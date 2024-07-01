In the communications surrounding LLMs and popular interfaces like ChatGPT, the term ‘hallucination’ is often used to refer to false statements in the output of these models. This implies that the LLM is coherent and attempting to be cognizant of the truth, while merely suffering moments of (mild) insanity. The LLM is thus effectively treated like a young child or a person suffering from a disorder like Alzheimer’s, giving it agency in the process. That this is utter nonsense and patently incorrect is the subject of a treatise by [Michael Townsen Hicks] and colleagues, as published in Ethics and Information Technology.
Much of the distinction lies in the difference between a lie and bullshit, as so eloquently described in [Harry G. Frankfurt]’s 1986 essay and 2005 book On Bullshit. Whereas a lie is intended to deceive and cover up the truth, bullshitting is done with no regard for, or connection with, the truth. The bullshitting is only intended to serve the immediate situation, reminiscent of the worst of sound bite culture.
When we consider the way that LLMs work, with the input query used to compute a probability distribution over possible next tokens from the model’s trained weights, we can see that the generated output is effectively that of an oversized word prediction algorithm. This precludes any possibility of intelligence and thus cognitive awareness of ‘truth’. This means that even if there is no intent behind the LLM, it’s still bullshitting, albeit of the soft (unintentional) kind. When taking into account the agency and intentions of those who created the LLM, trained it, and created the interface (like ChatGPT), however, we enter into hard, intentional bullshit territory.
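To make the "word prediction" point concrete, here is a minimal Python sketch of a single prediction step. It is not how any particular model is implemented: real LLMs compute their scores with a large neural network over tokens, whereas the scores in `TOY_SCORES`, the prompt, and the function names here are all invented for illustration. What it shows is that the selection step only weighs probabilities and never checks whether the chosen word is true.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def next_word(scores, rng):
    """Sample one continuation in proportion to its probability."""
    probs = softmax(scores)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


# Hypothetical scores for continuing "The capital of Australia is".
# The numbers are made up for this example, not taken from a real model.
TOY_SCORES = {"Sydney": 2.0, "Canberra": 1.5, "Melbourne": 0.5}

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    print(softmax(TOY_SCORES))
    print(next_word(TOY_SCORES, rng))
```

With these invented scores the wrong answer is the most probable one, and nothing in the procedure is able to notice.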
It is incidentally this same bullshitting that has led to LLMs being partially phased out already, with Retrieval Augmented Generation (RAG) turning a word prediction algorithm into more of a fancy search machine. Even venture capitalists can only take so much bullshit, after all.
7 thoughts on “ChatGPT And Other LLMs Produce Bull Excrement, Not Hallucinations”
Worth pointing out that a RAG still uses an LLM. The only difference is the RAG uses a vector database to pull relevant documents and feed them into the LLM alongside your prompt as additional context.
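A rough Python sketch of that flow, under loud assumptions: `embed` is a bag-of-words stand-in for a learned embedding model, the `DOCS` list stands in for a vector database, and all function names are hypothetical. The call to the LLM itself is left out; the sketch stops at the assembled prompt that would be passed to it.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: word counts (real systems use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved documents to the user's question as context."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Toy document store standing in for a vector database.
DOCS = [
    "Canberra is the capital of Australia.",
    "Stable Diffusion generates images from text prompts.",
    "Harry G. Frankfurt wrote the essay On Bullshit.",
]

if __name__ == "__main__":
    print(build_prompt("What is the capital of Australia?", DOCS))
```

The LLM still generates the final answer word by word; the retrieval step only changes what text it is conditioned on.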
“The LLM is treated like a young child or a person suffering from Alzheimer’s”
That’s an apt comparison in more ways than one. We constantly excuse toddlers’ nonsense because we value them without question, and we’re never going to take them back to the store for a refund.
But with LLMs, it’s still an open question whether they’re any use in the first place, and pompous chat about “hallucinations” aims to skip right over that discussion. It’s what hucksters call “talking past the sale”: you pretend the mark already agreed to buy the car, and get them talking about what they’ll do once they own it, until they forget they never actually said yes.
It’s the exact same reason AI grifters love to make noise about The Danger of When AI Rules the World. Because that “When” avoids the question of “if”.
That’s indeed a very good point. All this talk about how ChatGPT may or may not go Skynet on us tomorrow and AIs will soon crush us underneath their metal exoskeletons helps to drown out the absolute stream of flying excrement that is the actual use of LLMs and diffusion models in reality.
We’ll probably have real artificial intelligence some day, but today we just have a lot of natural & artificial idiocy.
The more appropriate term would be ‘confabulation’. While I lack any expertise in this area, here’s my understanding of what we seem to be discovering:
LLMs excel in creative writing but can mislead in technical tasks. Their extensive vocabulary and linguistic abilities are valuable for rewriting provided texts, but when tasked with coding, they often make significant errors.
With LLMs, the key lies in the specificity of the question. By presenting a text or code to evaluate, you’re posing a clearer query. Vague, general questions that leave everything to the model can lead it astray, resulting in expected failures.
For those struggling, exploring tutorials on ‘Prompt Engineering’ can be helpful. The one available at W3 Schools serves as a solid starting point: https://www.w3schools.com/gen_ai/chatgpt-3-5/index.php
It’s important not to hastily underestimate such potent technology if it initially falls short of expectations. While far from omnipotent, it does have specific applications and usefulness.
Surely, this cannot be what classifies as “prompt engineering”, right? I know it’s supposed to be a starting point, but it’s just instructions in English?
I thought it was more about optimising the available token space to fit in as many tokens as possible, even if it’s just keywords separated by commas to pack in as much context as possible. I also vaguely remember that in stable-diffusion there were optimizations where you could increase the weight given to a particular token so that keyword had more influence on the generated image.
I may be mistaken, I have no idea about prompt engineering. It’s new to me.
What do you expect from an interrogation interface? Of course it’s going to make things up if it’s forced to reply.
Just like humans, oddly enough.
