Peering Into The Black Box Of Large Language Models

July 3, 2024 by Donald Papp 35 Comments

Large Language Models (LLMs) can produce extremely human-like communication, but their inner workings are something of a mystery. Not a mystery in the sense that we don’t know how an LLM works, but a mystery in the sense that the exact process of turning a particular input into a particular output is something of a black box.

This “black box” trait is common to neural networks in general, and LLMs are very deep neural networks. It is not really possible to explain precisely why a specific input produces a particular output, and not something else.

Why? Because neural networks are neither databases, nor lookup tables. In a neural network, discrete activation of neurons cannot be meaningfully mapped to specific concepts or words. The connections are complex, numerous, and multidimensional to the point that trying to tease out their relationships in any straightforward way simply does not make sense.

Continue reading “Peering Into The Black Box Of Large Language Models” →

By kallerna - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=122952945

ChatGPT And Other LLMs Produce Bull Excrement, Not Hallucinations

July 1, 2024 by Maya Posch 99 Comments

In the communications surrounding LLMs and popular interfaces like ChatGPT the term ‘hallucination’ is often used to reference false statements made in the output of these models. This infers that there is some coherency and an attempt by the LLM to be both cognizant of the truth, while also suffering moments of (mild) insanity. The LLM thus effectively is treated like a young child or a person suffering from disorders like Alzheimer’s, giving it agency in the process. That this is utter nonsense and patently incorrect is the subject of a treatise by [Michael Townsen Hicks] and colleagues, as published in Ethics and Information Technology.

Much of the distinction lies in the difference between a lie and bullshit, as so eloquently described in [Harry G. Frankfurt]’s 1986 essay and 2005 book On Bullshit. Whereas a lie is intended to deceive and cover up the truth, bullshitting is done with no regard for, or connection with, the truth. The bullshitting is only intended to serve the immediate situation, reminiscent of the worst of sound bite culture.

Continue reading “ChatGPT And Other LLMs Produce Bull Excrement, Not Hallucinations” →

Llama.ttf Is AI, In A Font

June 26, 2024 by Elliot Williams 6 Comments

It’s a great joke, and like all great jokes it makes you think. [Søren Fuglede Jørgensen] managed to cram a 15 M parameter large language model into a completely valid TrueType font: llama.ttf. Being an LLM-in-a-font means that it’ll do its magic across applications – in your photo editor as well as in your text editor.

What magic, we hear you ask? Say you have some text, written in some non-AI-enabled font. Highlight that, and swap over to llama.ttf. The first thing it does is to change all “o” characters to “ø”s, just like [Søren]’s parents did with his name. But the real magic comes when you type a length of exclamation points. In any normal font, they’re just exclamation points, but llama.ttf replaces them with the output of the TinyStories LLM, run locally in the font. Switching back to another font reveals them to be exclamation points after all. Bønkers!

This is all made possible by the HarfBuzz font extensions library. In the name of making custom ligatures and other text shaping possible, HarfBuzz allows fonts to contain Web Assembly code and runs it in a virtual machine at rendering time. This gives font designers the flexibility to render various Unicode combinations as unique glyphs, which is useful for languages like Persian. But it can just as well turn all “o”s into “ø”s or run all exclamation points through an LLM.

Something screams mischief about running arbitrary WASM while you type, but we remind you that since PostScript, font rendering engines have been able to run code in order to help with the formatting problem. This ability was inherited by PDF, and has kept malicious PDFs in the top-10 infiltration vectors for the last fifteen years. [Citation needed.] So if you can model a CPU in PDF, why not an LLM in TTF? Or a Pokemon clone in an OpenType font?

We don’t think [Søren] was making a security point here, we think he was just having fun. You can see how much fun in his video demo embedded below.

Continue reading “Llama.ttf Is AI, In A Font” →

Torment Poor Milton With Your Best Pixel Art

June 25, 2024 by Donald Papp 7 Comments

One of the great things about new tech tools is just having fun with them, like embracing your inner trickster god to mess with ‘Milton’, an AI trapped in an empty room.

Milton is trapped in a room is a pixel-art game with a simple premise: use a basic paint interface to add objects to the room, then watch and listen to Milton respond to them. That’s it? That’s it. The code is available on the GitHub repository, but there’s also a link to play it live without any kind of signup or anything. Give it a try if you have a few spare minutes.

Under the hood, the basic loop is to let the user add something to the room, send the picture of the room (with its new contents) off for image recognition, then get Milton’s reaction to it. Milton is equal parts annoyed and jumpy, and his speech and reactions reflect this.

The game is a bit of a concept demo for Open Souls whose “thing” is providing AIs with far more personality and relatable behaviors than one typically expects from large language models. Maybe this is just what’s needed for AI opponents in things like the putting game of Connect Fore! to level up their trash talking.

Testing Large Language Models For Circuit Board Design Aid

June 24, 2024 by Maya Posch 27 Comments

Beyond bothering large language models (LLMs) with funny questions, there’s the general idea that they can act as supporting tools. Theoretically they should be able to assist with parsing and summarizing documents, while answering questions about e.g. electronic design. To test this assumption, [Duncan Haldane] employed three of the more highly praised LLMs to assist with circuit board design. These LLMs were GPT-4o (OpenAI), Claude 3 Opus (Anthropic) and Gemini 1.5 (Google).

The tasks ranged from ‘stupid questions’, like asking the delay per unit length of a trace on a PCB, to finding parts for a design, to designing an entire circuit. Of these tasks, only the ‘parsing datasheets’ task could be considered to be successful. This involved uploading the datasheet for a component (nRF5340) and asking the LLM to make a symbol and footprint, in this case for the text-centric JITX format but KiCad/Altium should be possible too. This did require a few passes, as there were glitches and omissions in the generated footprint.

When it came to picking components for a design, it’s clear that you’re out of luck here unless you’re trying to create a design that a million others have made before you in exactly the same way. This problem got worse when trying to design a circuit and ultimately spit out a netlist, with the best LLM (Claude 3 Opus) giving nonsensical suggestions for filter designs and mucking up even basic amplifier designs, including by sticking decoupling capacitors and random resistors just about everywhere.

Effectively, as a text searching tool it would seem that LLMs can have some use for engineers who are tired of digging through yet another few hundred pages of poorly formatted and non-indexed PDF datasheets, but you still need to be on your toes with every step of the way, as the output from the LLM will all too often be slightly to hilariously wrong.

Uncovering ChatGPT Usage In Academic Papers Through Excess Vocabulary

June 22, 2024 by Maya Posch 23 Comments

Frequencies of PubMed abstracts containing certain words. Black lines show counterfactual extrapolations from 2021–22 to 2023–24. The first six words are affected by ChatGPT; the last three relate to major events that influenced scientific writing and are shown for comparison. (Credit: Kobak et al., 2024)

That students these days love to use ChatGPT for assistance with reports and other writing tasks is hardly a secret, but in academics it’s becoming ever more prevalent as well. This raises the question of whether ChatGPT-assisted academic writings can be distinguished somehow. According to [Dmitry Kobak] and colleagues this is the case, with a strong sign of ChatGPT use being the presence of a lot of flowery excess vocabulary in the text. As detailed in their prepublication paper, the frequency of certain style words is a remarkable change in the used vocabulary of the published works examined.

For their study they looked at over 14 million biomedical abstracts from 2010 to 2024 obtained via PubMed. These abstracts were then analyzed for word usage and frequency, which shows both natural increases in word frequency (e.g. from the SARS-CoV-2 pandemic and Ebola outbreak), as well as massive spikes in excess vocabulary that coincide with the public availability of ChatGPT and similar LLM-based tools.

In total 774 unique excess words were annotated. Here ‘excess’ means ‘outside of the norm’, following the pattern of ‘excess mortality’ where mortality during one period noticeably deviates from patterns established during previous periods. In this regard the bump in words like respiratory are logical, but the surge in style words like intricate and notably would seem to be due to LLMs having a penchant for such flowery, overly dramatized language.

The researchers have made the analysis code available for those interested in giving it a try on another corpus. The main author also addressed the question of whether ChatGPT might be influencing people to write more like an LLM. At this point it’s still an open question of whether people would be more inclined to use ChatGPT-like vocabulary or actively seek to avoid sounding like an LLM.

McDonald’s Terminates Its Drive-Through Ordering AI Assistant

June 18, 2024 by Maya Posch 72 Comments

McDonald’s recently announced that it will be scrapping the voice-assistant which it has installed at over 100 of its drive-throughs after a two-year trial run. In the email that was sent to franchises, McDonald’s did say that they are still looking at voice ordering solutions for automated order taking (AOT), but it appears that for now the test was a disappointment. Judging by the many viral videos of customers struggling to place an order through the AOT system, it’s not hard to see why.

This AOT attempt began when in 2019 McDonald’s acquired AI company Apprente to create its McD Tech Labs, only to sell it again to IBM who then got contracted to create the technology for McDonald’s fast-food joints. When launched in 2021, it was expected that McDonald’s drive-through ordering lanes would eventually all be serviced by AOT, with an experience akin to the Alexa and Siri voice assistants that everyone knows and loves (to yell at).

With the demise of this test at McDonald’s, it would seem that the biggest change is likely to be in the wider automation of preparing fast-food instead, with robots doing the burger flipping and freedom frying rather than a human. That said, would you prefer the McD voice assistant when going through a Drive-Thru^® over a human voice?