Analyzing Feature Learning In Artificial Neural Networks And Neural Collapse

Artificial Neural Networks (ANNs) are commonly used for machine vision tasks such as object recognition. This is accomplished by taking a multi-layer network and using a training data set to configure the weights associated with each ‘neuron’. Due to the complexity of these ANNs for non-trivial data sets, it’s often hard to make heads or tails of what the network is actually matching in a given (non-training) input. A March 2024 study (preprint) in Science by [A. Radhakrishnan] and colleagues provides an approach to elucidate and diagnose this mystery somewhat, using what they call the average gradient outer product (AGOP).

Defined as the uncentered covariance matrix of the ANN’s input-output gradients, averaged over the training data set, the AGOP reveals which features of the data set the network actually uses for its predictions. These turn out to be strongly correlated with recurring cues, such as the presence of eyes when recognizing whether lipstick is being worn, or star patterns in a car-and-truck data set, rather than anything to do with the (highly variable) vehicles themselves. None of this was perhaps too surprising, but a number of the same researchers went on to use the AGOP to elucidate the mechanism behind neural collapse (NC) in ANNs.
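For the curious, the AGOP itself is simple to compute for a small model. Below is a minimal sketch in PyTorch; the toy model, data, and shapes are purely illustrative assumptions on our part, not the paper’s code.

    # Minimal AGOP sketch: average the outer product of the input-output
    # Jacobians over a set of inputs. Model and data are placeholders.
    import torch
    from torch.autograd.functional import jacobian

    def agop(model, inputs):
        # inputs: (n_samples, d) tensor; returns a (d, d) matrix
        d = inputs.shape[1]
        total = torch.zeros(d, d)
        for x in inputs:
            # Jacobian of the model output w.r.t. this single input: (n_outputs, d)
            J = jacobian(lambda z: model(z.unsqueeze(0)).squeeze(0), x)
            total += J.T @ J  # sum of outer products across output dimensions
        return total / len(inputs)  # average over the data set

    # Toy example: the top eigenvectors of M point along the input directions
    # (i.e. the features) that the trained network is actually sensitive to.
    model = torch.nn.Sequential(
        torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))
    M = agop(model, torch.randn(100, 10))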

NC occurs when an overparameterized ANN is trained well past the point of fitting its training data, at which point the last-layer features of each class collapse towards their class means. In the preprint paper by [D. Beaglehole] et al., the AGOP is used to provide evidence for the mechanism behind NC during feature learning. Perhaps the biggest take-away from these papers is that while ANNs can be useful, they’re also incredibly complex and poorly understood. The more we learn about their properties, the more appropriately we can use them.
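To make the ‘collapse’ a bit more concrete, one common diagnostic tracks the ratio of within-class to between-class scatter of the last-layer features, which heads toward zero as the features collapse onto their class means. Here is a minimal sketch, where the feature and label arrays are assumed placeholders for a trained network’s penultimate-layer activations:

    # Sketch of one neural-collapse diagnostic: as features collapse onto
    # their class means, the within/between scatter ratio approaches zero.
    import numpy as np

    def collapse_ratio(features, labels):
        # features: (n_samples, d) penultimate-layer activations; labels: (n_samples,)
        global_mean = features.mean(axis=0)
        within = between = 0.0
        for c in np.unique(labels):
            class_feats = features[labels == c]
            mu_c = class_feats.mean(axis=0)
            within += ((class_feats - mu_c) ** 2).sum()
            between += len(class_feats) * ((mu_c - global_mean) ** 2).sum()
        return within / between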


Mechanical Intelligence And Counterfeit Humanity

It would seem fair to say that the second half of the last century up to the present day has been firmly shaped by our relationship with technology, and with computers in particular. From the hulking behemoths at universities, to microcomputers at home, to today’s smartphones, smart homes and ever-looming compute cloud, we all have a relationship with computers in some form. One increasingly underappreciated aspect of computers, however, is that the less we see them as physical objects, the more inclined we seem to be to accept them as human. This is the point which [Harry R. Lewis] argues in a recent article in Harvard Magazine.

Born in 1947, [Harry R. Lewis] found himself at the forefront of what would become computer science and related disciplines, with some of his students being well-known to the average Hackaday reader, such as [Bill Gates] and [Mark Zuckerberg]. Suffice it to say, he has seen every attempt to ‘humanize’ computers, ranging from ELIZA to today’s ChatGPT. During this time, the line between humans and computers has become blurred, with computer systems becoming ever more competent at imitating human interactions even as they vanished into the background of daily life.

These counterfeit ‘humans’ are not capable of learning, feeling, and experiencing the way that humans can; they are at most a facsimile of a human, lacking exactly that which makes a human, often referred to as ‘the human experience’. More and more of us communicate these days via smartphone and computer screens, with little idea of, or regard for, whether we are talking to a real person. Ironically, it seems that by anthropomorphizing these counterfeit humans, we risk becoming less human in the process, while also opening the floodgates for blaming AI when the blame lies squarely with the humans behind it, as with the recent Air Canada chatbot case. Equally ridiculous, [Lewis] argues, is the notion that we could create a ‘superintelligence’ by training an ‘AI’ on nothing but data scraped off the internet, as there are many things in life which cannot be understood simply by reading about them.

Ultimately, the argument is made that humanistic learning should be the focal point of artificial intelligence, as only in this way could we create AIs that might truly be seen as our equals, and as beneficial to the future of all.

Cloudflare Adds Block For AI Scrapers And Similar Bots

It’s no big secret that a lot of internet traffic today consists of automated requests, ranging from innocent bots like search engine indexers to data-scraping bots run by LLM and similar generative AI companies. With enough customers who are less than amused by this flood of useless traffic, Cloudflare has announced that it is expanding its blocking feature for the latter category of scrapers. Initially this block only covered ‘poorly behaving’ scrapers, but now it apparently targets all such bots.

The block seems to be based on a range of characteristics, including the user agent string. According to Cloudflare’s data on its network, over 40% of identified AI bots came from ByteDance (Bytespider), followed by GPTBot at over 35% and ClaudeBot at 11%, plus a whole gaggle of smaller bots. Assuming Imperva’s claim that bots make up over half of today’s internet traffic is roughly correct, then even if these bots follow robots.txt, that is still a lot of bandwidth being drained, with the website owner effectively subsidizing the training of some company’s models. Unsurprisingly, Cloudflare notes that many website owners have already taken measures to block these bots in some fashion.
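For the bots that do honor it, robots.txt remains the cheapest countermeasure. A minimal example using the user-agent tokens mentioned above could look like the following; blocking the entire site is of course just one possible policy.

    # robots.txt: politely asks these crawlers to stay away entirely
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Bytespider
    Disallow: /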

Naturally, not all of these scraper bots are well-behaved. Spoofing the user agent is an obvious way to dodge blocks, but scraper bot activity has many tell-tale signs, which Cloudflare combines with statistical data from across its global network to compute a ‘bot score’ for each request. Although it remains to be seen whether false positives become an issue with Cloudflare’s approach, it’s definitely a sign of the times that more and more website owners are choosing to choke off unwanted, AI-related traffic.

Peering Into The Black Box Of Large Language Models

Large Language Models (LLMs) can produce extremely human-like communication, but their inner workings are something of a mystery. Not a mystery in the sense that we don’t know how an LLM works, but a mystery in the sense that the exact process of turning a particular input into a particular output is something of a black box.

This “black box” trait is common to neural networks in general, and LLMs are very deep neural networks. It is not really possible to explain precisely why a specific input produces a particular output, and not something else.

Why? Because neural networks are neither databases nor lookup tables. In a neural network, the discrete activations of individual neurons cannot be meaningfully mapped to specific concepts or words. The connections are complex, numerous, and multidimensional to the point that trying to tease out their relationships in any straightforward way simply does not make sense.

Continue reading “Peering Into The Black Box Of Large Language Models”


ChatGPT And Other LLMs Produce Bull Excrement, Not Hallucinations

In the communications surrounding LLMs and popular interfaces like ChatGPT, the term ‘hallucination’ is often used to refer to false statements in the output of these models. This implies that there is some coherency, and an attempt by the LLM to be cognizant of the truth while suffering occasional moments of (mild) insanity. The LLM is thus effectively treated like a young child, or a person suffering from a disorder like Alzheimer’s, and given agency in the process. That this is utter nonsense and patently incorrect is the subject of a treatise by [Michael Townsen Hicks] and colleagues, as published in Ethics and Information Technology.

Much of the distinction lies in the difference between a lie and bullshit, as so eloquently described in [Harry G. Frankfurt]’s 1986 essay and 2005 book On Bullshit. Whereas a lie is intended to deceive and to cover up the truth, bullshitting is done with no regard for, or connection to, the truth. Bullshit is intended only to serve the immediate situation, reminiscent of the worst of sound-bite culture.

Continue reading “ChatGPT And Other LLMs Produce Bull Excrement, Not Hallucinations”

Llama.ttf Is AI, In A Font

It’s a great joke, and like all great jokes it makes you think. [Søren Fuglede Jørgensen] managed to cram a 15-million-parameter large language model into a completely valid TrueType font: llama.ttf. Being an LLM-in-a-font means that it’ll do its magic across applications, in your photo editor as well as in your text editor.

What magic, we hear you ask? Say you have some text, written in some non-AI-enabled font. Highlight it, and swap over to llama.ttf. The first thing it does is change all “o” characters to “ø”s, just like [Søren]’s parents did with his name. But the real magic comes when you type a run of exclamation points. In any normal font, they’re just exclamation points, but llama.ttf replaces them with the output of the TinyStories LLM, run locally inside the font. Switching back to another font reveals them to be exclamation points after all. Bønkers!

This is all made possible by the HarfBuzz text shaping library. In the name of making custom ligatures and other text shaping possible, HarfBuzz allows fonts to contain WebAssembly code and runs it in a virtual machine at rendering time. This gives font designers the flexibility to render various Unicode combinations as unique glyphs, which is useful for languages like Persian. But it can just as well turn all “o”s into “ø”s, or run all exclamation points through an LLM.

Something screams mischief about running arbitrary WASM while you type, but we remind you that since PostScript, font rendering engines have been able to run code in order to help with the formatting problem. This ability was inherited by PDF, and has kept malicious PDFs in the top-10 infiltration vectors for the last fifteen years. [Citation needed.] So if you can model a CPU in PDF, why not an LLM in TTF? Or a Pokemon clone in an OpenType font?

We don’t think [Søren] was making a security point here; we think he was just having fun. You can see how much fun in his video demo embedded below.

Continue reading “Llama.ttf Is AI, In A Font”

Torment Poor Milton With Your Best Pixel Art

One of the great things about new tech tools is just having fun with them, like embracing your inner trickster god to mess with ‘Milton’, an AI trapped in an empty room.

“Milton is trapped in a room” is a pixel-art game with a simple premise: use a basic paint interface to add objects to the room, then watch and listen to Milton respond to them. That’s it? That’s it. The code is available in the GitHub repository, but there’s also a link to play it live without any kind of signup. Give it a try if you have a few spare minutes.

Under the hood, the basic loop is to let the user add something to the room, send the picture of the room (with its new contents) off for image recognition, then get Milton’s reaction to it. Milton is equal parts annoyed and jumpy, and his speech and reactions reflect this.
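In rough pseudocode the loop might look like the sketch below. Note that the actual game builds on Open Souls’ engine, so the OpenAI vision call, model name, and prompts here are stand-in assumptions purely for illustration.

    # Rough sketch of the Milton loop (our assumptions, not the project's code):
    # send the current room image to a vision-capable model, then ask for an
    # in-character reaction. The OpenAI client stands in for Open Souls' engine.
    import base64
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def milton_reacts(room_png_bytes):
        image_b64 = base64.b64encode(room_png_bytes).decode()
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "You are Milton: annoyed, jumpy, trapped in an empty room."},
                {"role": "user", "content": [
                    {"type": "text", "text": "Someone just drew this in your room. React."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ]},
            ],
        )
        return response.choices[0].message.content

    # Game loop (simplified): paint -> render room to PNG -> milton_reacts(png) -> show/speak.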

The game is a bit of a concept demo for Open Souls, whose “thing” is providing AIs with far more personality and relatable behaviors than one typically expects from large language models. Maybe this is just what’s needed for AI opponents in things like the putting game of Connect Fore! to level up their trash talk.