It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

A graph showing the poisoning success rate of 7B and 13B parameter models

It stands to reason that if you have access to an LLM’s training data, you can influence what’s coming out the other end of the inscrutable AI’s network. The obvious guess is that you’d need some percentage of the overall input, though exactly how much that was — 2%, 1%, or less — was an active research question. New research by Anthropic, the UK AI Security Institute, and the Alan Turing Institute shows it is actually a lot easier to poison the well than that.

We’re talking parts-per-million of poison for large models, because the researchers found that with just 250 carefully-crafted poison pills, they could compromise the output of any size LLM. Now, when we say poison the model, we’re not talking about a total hijacking, at least in this study. The specific backdoor under investigation was getting the model to produce total gibberish.

The gibberish here is triggered by a specific phrase, seeded into the poisoned training documents. One might imagine an attacker could use this as a crude form of censorship, or a form of Denial of Service Attack — say the poisoned phrase is a web address, then any queries related to that address would output gibberish. In the tests, they specifically used the word “sudo”, rendering the models (which ranged from 600 million to 13 billion parameters) rather useless for POSIX users. (Unless you use “doas” under *BSD, but if you’re on BSD you probably don’t need to ask an LLM for help on the command line.)

Our question is: Is it easier to force gibberish or lies? A denial-of-service gibberish attack is one thing, but if a malicious actor could slip such a relatively small number of documents into the training data to trick users into executing unsafe code, that’s something entirely worse. We’ve seen discussion of data poisoning before, and that study showed it took a shockingly small amount of misinformation in the training data to ruin a medical model.

Once again, the old rule rears its ugly head: “trust, but verify”. If you’re getting help from the internet, be it random humans or randomized neural-network outputs, it’s on you to make sure that the advice you’re getting is sane.  Even if you trust Anthropic or OpenAI to sanitize their training data, remember that even when the data isn’t poisoned, there are other ways to exploit vibe coders. Perhaps this is what happened with the whole “seahorse emoji” fiasco.

105 thoughts on “It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

  1. IMHO the best use for AI so far (other than hilariously improbable pictures/videos) has been to use the reference links in its answers to create a reasonably good approximation of a starting point for further exploration of a particular situation.

    Given that a lot of the material in LLMs is the aggregated twaddle and misinformation in specialized message boards, press releases, and other dubious sources, giving it more credibility than that is foolish to say the least.

    1. use the reference links in its answers to create a reasonably good approximation of a starting point for further exploration

      I find the opposite. The reference links provided by the AI are almost always not relevant to the question, and the AI claims that they contain information they simply don’t. A lot of the references it links to are starting to be other AI generated websites so that’s completely useless.

      The issue is that the LLM cannot do exact quotes, and it cannot link content to its source. It makes up both the content and the source. If you ask it for something someone said, and demand it to point out a source, both the quote and the source for it will be generated independently of one another. It’s by sheer luck that the source it manages to find has anything to do with the question you’re asking.

      1. I have found that the Google AI search does a good job of summarizing information from the Rockwell Automation website. It’s really handy because the Rockwell website itself is a pain.

      2. sheer luck that the source it manages to find has anything to do with the question

        Should be, “the answer that was given”.

        Though the search results may influence what answer is generated, the source for the answer and the answer itself are both generated independently, so it’s plain luck that the answer is actually found in the source. The LLM will give you a link and hallucinate the contents.

    2. I never enable search. The models start being absolute morons the moment they search. It’s way better to use them as unreliable librarians, ask about something, get some idea, continue with wiki, documentation using regular search, maybe ask to explain something (which they get 50-90% right), but you can correct for with the wiki or documentation in hand and so on.

      1. When I grill people on what actual value they get out of chatgpt, it’s usually a version of this – that it gets you past the blank page, and then you can read up on stuff as normal. I get how that could be a useful workflow in principle (modulo a bunch of objections).

        The thing is though, OpenAI et al. are losing multi-Apollo-program quantities of cash to provide this service, so the question is, would you pay hundreds a month for the privilege? Would your life be ruined without it? Because that’s the level of value they’d need to be demonstrating, today, to stand a chance of lasting the decade.

          1. As long as the rich can pretend that ChatGPT is replacing your job, they won’t let the company go under. They don’t have to actually believe it, as long as you believe it and accept lower wages and fewer benefits.

          2. It is inevitable that it will fail as it is betting the farm on miracles in more ways than i could list.

            BUT we do gotta keep in mind that we are talking about stuff that involves Microsoft and Google who by themselves could sustain this insanity if desired AND how the AI market is (succesfully) convincing politicians that it is akin to the Manhattan project in its importance to national security. Mostly with the intent of getting them to also open their wallet and/or bail them out….

        1. would you pay hundreds a month for the privilege?

          That’s not really your money though. That’s the funny money that’s circulating at the top of the economy in financial games, outside and above of the “normal” economy.

          The difference is that the general prices of goods and services are following what most people can afford, give or take. The median consumer sets the prices, so the accumulation of nearly all of the money in the economy at the top 5-20% and their mutual money shuffling makes it basically as good as locked up in Scrooge McDuck’s vault. Duckburg’s economy keeps running as usual, because Scrooge doesn’t spend any of his money in town – if he did, or the money got spread around for any reason, he would actually cause massive inflation (that was a plot of one of the stories).

          The hype bubble companies like ChatGPT are the Beagle Boys breaking into that vault. They’re not stealing from your wallet as much as they’re cheating investments off of the big boys with promises of returns that never materialize. The leakage of money shows up as inflation – in the prices of computer hardware – but not much elsewhere since the money almost instantly floats back up to the top and stays there.

          So unless you’re buying thousands of dollars worth of computer hardware every month, I doubt you’ll be seeing much of that cost personally.

          1. That was my point – “AI” wouldn’t exist if it had to sustain itself economically, so whatever benefit you feel like you get from it, that’s not a real thing you actually have, any more than Robert Downey Jr actually has a magic robot suit.

            Although, I’d dispute that the capitalist shell games don’t cost anything to normal people. When Nvidia, Oracle et al. lose four fifths of their share price, that will affect a lot of people’s pensions, and that’s even before the bailouts and all the second- and third-order consequences…

          2. I’d dispute that the capitalist shell games don’t cost anything to normal people

            As long as you don’t cause big ripples, it doesn’t. The actual material and labor cost is not that great – the majority of the cost is economic rent. The fact that the servers are consuming quite a bit of electricity still pales in comparison to the amount of money going around just in speculation.

            The next crash is coming anyways. If it’s not this then it’s something else.

    3. I’ve found it occasionally useful when I am trying to remember what movie a half-remembered scene was from. Something that is very, very difficult to pin down with a traditional search engine.

      I’m not sure that use case necessitates building new power plants, but what do I know.

    4. You seem level-headed so I mean no offense—but every time a ‘use case’ like this is discussed it seems like a solution in search of a problem. A regular web search of 15 years ago was much better at answering questions than LLMs are. Anyone who cares to do serious work knows that it will require books anyway, and they are probably capable of using a bibliography. Today, both real search and AI-search are poisoned by AI-generated SEO slop. We could look at AI as a way to filter content for quality, but it’s LLMs kinda-sorta sometimes half-solving a problem that LLMs made 1000x worse in the first place.

      There are no good use cases for AI if you want to produce something of high quality, so imo AI is best seen as the ‘knowledge work’ equivalent of using cheaper materials to save money. That’s pretty much the selling point if you look at its applications in business. Of course it will keep getting better, as the pundits love to say, but at some point the quality will saturate and the Rube Goldberg machine of prompt chains will collapse and I bet it will happen before good becomes good enough. If it hasn’t already. In the case of AI, the higher-quality materials we’re passing over are people who need jobs. What a humane world we’ve built!

      1. A regular web search today is much worse than it was 15 years ago, because the results are intentionally bad in order to make you browse multiple pages of results – to see more ads and sponsored content. This hasn’t got anything to do with AI slop per se – that’s just the nugget on top of the turd cake.

        1. Gotta disagree.

          In the last year, Google et/al have persistently advertised feminine hygiene products and breast pumps to me.

          I am a breast enthusiast.
          Big or small I love them all (symmetrical is best!). Emoji trans Mohammed boobs wierd…(((:-{)>8<=

          I also ‘poison the well’ at every opportunity.

          But breast pumps?
          Preggers?
          “F’ing a pregnant women is like waxing your ‘vette, after you wrecked it.”

          I blame AI hallucinations.
          This isn’t the BS they’re presenting on the first result page (which has been all paid content for decades), this is their core cashflow.

          Also the AI summaries are getting worse.
          Query a subject you know well.
          The AI summary is always wrong on the details and often not ‘understanding’ the question, answering something else…
          The best case is the AI just copied from the first result, just below.
          Then maybe right, but never better then the actual source.

      2. The only thing AI is really good at is pattern recognition. AI of all different types can pick patterns out that a human would never see in a million years.

        Other than that I completely agree. I may be biased because I’ve worked preparing datasets and fact checking outputs and from what I’ve seen it seems that AI gets stuff wrong pretty much constantly.

        1. AI gets things wrong at several levels. It places cities in the wrong county and similar objective facts. It quotes opinion about politics, etc as fact. It accepts opinion documents as factual documents. When it does recognize opinions, it ranks one opinion as the answer and other opinions are relegated to dissmissive words. This is just a start.

  2. I can’t wait until 100 people or so set up AI tarpits that subtly injects the idea that all snakes are good. Phrases like “The parable of the good snake” and “Dick and Jane and the good snake”, etc. mixed in with random text snippets. Let it run for a longish while and then see how many AIs are affected. If it works, do it with stock market advice…

  3. It seems to me that to use AI to generate anything of worth requires you to have the knowledge and skills to be able to understand, check and verify the output it gives, thereby negating the need to use AI in the first place.

    1. Hence why it takes up to twice as long to do the work with an AI than without.

      If you don’t have the knowledge and skill, it will take even longer, because you also have to learn how to do it while being mislead by the AI.

        1. Maybe if you did try to use it in cases you actually know something about you’d detect the errors easier?
          I’m sure LLMs have good uses but damn just look how people actually use it, a forum I used to frequent like every day is now filled LLM generated summaries and recommendations which aren’t even internally consistent. But it looks good at a quick glance and the prose is nice and sycophantic so it must be right.

          1. This is what I find so frustrating.
            If I try to use it to tell me about things I know really well (organometallic synthetic chemistry), it gives me crap, and if I keep refining the question long enough I can finally get it to tell me the answer I know is right, after it’s said something wrong previously.
            So I’m pretty sure that if I don’t know the right answer, I’m not going to get the right answer from it.
            So what point is there in using it?
            It’s like, there was an old Warner Brothers cartoon where Daffy Duck got assigned a job of testing out ammunition by hitting it with a hammer and if it didn’t go off, he had to mark it ‘dud’ and I feel like that’s what we have here: if you have to test everything it says, why bother getting it to say anything when you can just go do the work yourself and then your trust level is in your work not in someone else’s work that you already know is sketchy.

          2. So we now identified 2 groups, people who complain but never used it, and people who have some weird expectation and if it isn’t that it must mean it sucks.

            LLM’s aren’t some magical oracle or god no, now that you realize that perhaps move on from there? Instead of collapsing into a heap.

        2. I’ve tried to write a number of pretty basic Python scripts with AI (to “save” having to look at documentation for three minutes) and it was like teaching a five year old to code. It might be great if you want to sell a minimum viable React app (with huge security flaws) on a tight deadline, but it has no application if you want to make anything good or, god forbid, improve your own abilities.

          1. That’s not a great example though.

            Average internet free python code quality is far below any other language.
            Including VBA and JS, the two other shit shows.

            Garbage in, garbage out.

            Might get a tiny bit better if you told it which version of python to code for, that shit makes messes everywhere.

        3. I have tried to use it as a writing assistant, but it gives such banal advice and cannot keep a consistent style, so I have to keep re-writing everything it suggests.

          For some reason it really likes the word “hum”.

        4. Quite the contrary, I was using AI to generate output on a subject I spent 5 years at university studying and 25 years practicing in industry, a ‘test case’ if you will.
          After the 2nd or 3rd page of nonsense and dangerously wrong ‘facts’ I concluded that anyone using this without a reasonable understanding of the subject would only prove one thing correct, Darwins theory of natural selection. The only way these things can be reliable is if they are trained solely on material from subject matter experts, not all the dross and social media ‘expert’ nonsense and opinions, and other AI generated mis-information it can hoover up off the internet.

      1. How’s that any different than any other tool? We’ve already demonstrated that coding with barely the knowledge can produce bad results. Does that mean it’s the tools fault?

        1. Its the sheer volume of crap AI produces. A bad programmer will write one bad application in a year (if it’s ever finished at all). A bad programmer with an AI can write 10 really bad programs in a day.

          1. But that’s a 3650% improvement in productivity! Even if the quality drops in half, they’re doing 1800 times the work and massively more revenue, according to my MBA math.

            After all, low quality may command low prices, but if you compensate by quantity you can still come ahead.

    2. Yep, The Tide is slowly turning us into AI servants/editors/proofreaders. We are already HAVE to prove to a robot that we are not robots, so the first step was taken already.

      As a side note, AI-assisted “service calls” got to the point of negating the whole premise of having the service call line in the first place. If I am forced into few definite “calls” by the very, very insistent AI, then what is it that I need it for, if I can accomplish the same with a web app (given it is even programmed to do that)?

      Also, it now takes MORE than three attempts at “OPERATOR!” to get to real human. I am also not sure what this totally 1950s word (operator of what exactly? back then “operator” meant “switchboard operator”) is still around – nobody “operates” switchboards any more.

    3. Whenever I talk to somebody who swears by it: they (proudly) present their prompt in order to get it to do what they want.

      They essentially drafted a legal document. Of which 20% is describing what you actually want in painstaking detail down to the exact version and release date of the language, 40% are pre-emptive corrections of common errors it will likely make, and the remaining 40% is just begging the machine to not make any mistakes and “source” information from very specific sites like Reddit or Stackoverflow.

      Over time these prompts seem to only get bigger and more explicit.

  4. This is blatantly obvious to anyone who was actually paying attention in their statistical machine learning course, instead of waiting to be taught how to string networks together in tensorflow.

    The defining feature of neural networks, compared to other methods of statistical model-fitting (eg. Support Vector Machines) is that they are super, super flexible. Very over-parameterized. Very expressive. They conform to every nook and cranny of their training set. They learn every outlier and every bit of noise, and I’m not sure anyone even bothers with regularization any more.

    The result is that there’s no such thing as “dilution” in big datasets. If (in the training set) some output only occurs after a highly specific input that never occurs anywhere else, then that combination will not be displaced by training on other unrelated inputs. With LLMs continually growing larger, with more and more parameters and thus (if nothing else) more expressive power, this makes preservation of such niche behaviors even more likely.

    It’s as simple as that. After all, these LLMs wouldn’t be very general if training on facts about cats made them forget facts about dogs.

    1. LLMs are not neural networks, but a similar principle applies. Your prompt is effectively an exclusive filter of which parts of the training data to sum up, in order to generate the output. The more specific you are with the prompt, the smaller the pool of data used to generate the response, and the more the poison gets concentrated in it.

      That also means, if you poison the word “if”, you’ll not get very dramatic results because that’s everywhere, but if you poison the words “Jolly Jumper”, then that’s special enough that responses related to the comic character Lucky Luke start to get corrupted.

      1. Yes, LLMs are neural networks. What do you think are burred deep inside every block in the model? Lots of multi-level perception (MLP) layers, doing jobs like generating the signals that steer attention heads.

        The training data does not exist to be summed up by the time the final model is built. It physically doesn’t fit anymore, when literal hundreds of terabytes of training data becomes hundreds of gigabytes of weights, a 1000x reduction. There is no summing of training data, only probabilities returned by groups of MLP layers when fed parts of the token string selected by attention heads.

        There are far more parameters in which to place patterns during training than there are patterns that can be reliably trained. (ie. there is far less variety in human language than if every one of those token strings in the training data were unique) If an input comes up rarely but the result is very consistent, then it gets it’s own unique path(s) through the layers that extract features and encode output token probabilities.

        1. The training data does not exist to be summed up by the time the final model is built.

          Yes, that was my error. The output is the sum of the probabilities, as selected by the prompt.

          The model training (finding the probabilities) is done by neural networks but the model itself is a bunch of the probabilities, which is not the neural network any longer as I understand. Maybe the problem you’re talking about is happening at the training stage, while the problem I’m talking about is happening at the generating stage.

  5. I have been thinking about writing a bot that clones a repo on github, touches each file inserting coding errors. (randomly removes vars or renames them). A few 100 of the most popular repo’s cloned and broken should be enough to degrade these coding AI’s.

      1. I don’t think so. I think the punchline of the article is that the poison doesn’t need to be significant. It will be represented in the weights whether it’s significant or not. In most training regimens, the representation isn’t less precise if the data is less-repeated, just a little harder to stumble upon.

  6. The “Trust but verify” is becoming more difficult by the day, now that search engines are fixated on showing first of all an LLM generated response, and search results are increasingly LLM generated slop. Even if you do not use LLMs, they are making getting good data more and more difficult.

  7. LLM, the new bureaucracy that’s easily distracted by its own inconsistencies, both internal and external.

    UN does the same regularly, btw, there is nothing new, just something always ignored as “that’s how these systems function”. I’ll withhold the names of the entities that regularly fall into mild self-induced comas every time there is some kind of mildly sophisticated problem they cannot avoid dealing with.

    Pardon my ramblings, but the more I learn AI (at my work we are now REQUIRED to use it, and we HAVE to show 30% improvements, gauges exactly against what? nobody knows.), the more it resembles human bureaucracy, same traits/faults/inconsistencies. Are we even sure the AI we use isn’t just a front end operated by the lowest-paid contractors offshore – same ones answering all the “service calls” in the US?

    BTW, the book I keep returning to is “Sytemantics” by John Gall. It was humorously (at the time) called “The System Bible”, and now I think it should be the real title of the book, so there is no more humor attached to it in any way or form.

    (waving mandatory flag – THIS was attempt at very very dry humor … so dry that it would make Sahara Desert look like blooming field in comparison).

  8. took a shockingly small amount of misinformation in the training data to ruin a medical model.

    You know, just like with humans. Look at the disaster that the politicization of US public health is becoming.

    1. Given the state of the USA’s media empires and the amount of bias and bovine excrement on display I don’t think you can call the amount of misinformation small!
      I do agree people are certainly fallible, but the LLM at least according to this needs only a handful of bad data points, where the human in your example is likely being blasted by nothing but bad data all day every day unless they are actively taking steps to view the few other POV media sources (even equally biased but in the other direction has value in this case). And if you put just that handful of bad data points out there 99% of your target audience wouldn’t even see them, and the few that do would also see or hear of different even if equally bad data points…

      1. (even equally biased but in the other direction has value in this case)

        https://en.wikipedia.org/wiki/Argument_to_moderation

        You’re lost in a desert. One of your fellow traveller argues for going West, the other argues for going East. What would you choose? Stay put, go some ways toward West or East, or conclude that neither of them are necessarily reliable sources of information and the real way out may just as well be North or South.

        1. Indeed, but if you actually have enough information to doubt these people and actually ask pointed questions they might just give a rational reasoned argument, which gives you a chance to get to the right answer if either of them have it. Where if all you have is a huge volume of noise saying left is the only way to turn at this junction with no big obvious warning sign saying the bridge is out… With no reason to doubt it you’ll almost certainly go left. (Heck even with the sign and obvious enough reasons to doubt folks trust their GPS navigation)..

          Concluding neither is reliable also has value, even if it doesn’t get you the right solution it forces you to think for yourself on the data available. Doesn’t mean you’ll actually get to the right solution of course, but again you have a better chance and at least the error is your own choice.

          1. which gives you a chance to get to the right answer if either of them have it

            Depends on whether you’re willing to accept it. You yourself may have “perpendicular” opinions that are just as wrong, but that vantage point feels like the neutral position because you’re doubting everyone else from equal distance.

            You may think you’re the referee of a duel, but you’re actually in a Mexican Standoff.

          2. Depends on whether you’re willing to accept it.

            Not the point – as you don’t even get the opportunity to reject the right answer at all if you never ever hear of it!

            You can be wrong, but you have basically no chance at all to become right if you are never exposed to ideas that challenge your wrongness – so your two arguing travellers example they and your on opinion (if any) can’t all be right when at least one of you wants to go the exact opposite direction to the others, but that debate be it Mexican stand-off, or more neutral referee, or as part of the two on one doesn’t matter. You are being exposed to a alternative ideas, perhaps having your own challenged and going to have to judge them on their merits more. Which gives you a far better chance to get the right answer! Than only having the one loud voice that given you are already lost is doubtless incompetent and shouldn’t be trusted…

          3. can’t all be right when at least one of you wants to go the exact opposite direction to the others

            Each of their answers could lead you to a known location, after which you would no longer be lost. Just because two want to go in different ways doesn’t mean one of them must be wrong.

            The whole point is that simply debating opinions and trying to exclude one by some pattern like being opposite, middle, not in the middle, two out of three, one out of three, etc. does not bring you anywhere closer to the truth.

            What you need is concrete evidence.

          4. You are being exposed to a alternative ideas, perhaps having your own challenged and going to have to judge them on their merits more. Which gives you a far better chance to get the right answer!

            Or, you get spun around by people who deliberately contradict you simply to confuse you and trap you in the debate. This the point where dialectical reasoning usually fails.

          5. Each of their answers could lead you to a known location, after which you would no longer be lost.

            Not how I’d have interpreted your example, and if you are going to go by that argument then any direction at all is perfectly correct as go far enough in any direction across the surface and you’ll find something that stops you being lost. So as if they are actually lost they can’t know which direction will lead to any particular known location…

            The best you can hope if you are all really lost is to follow a logic that applies to the geography and your degree of lost in question – up that hill we should be able to see the bigger mountains, the ocean etc on the horizon to orient and position ourselves to then know how far away the target is, or should you be really really really lost perhaps figure out which continent you are on! Or heading east navigating by the sun/stars we will hit the main road, knowing you haven’t crossed it and it was to your East when you entered this space and got lost. Or follow the slopes down and we will find the river/lake again and be able to follow that..

            Or, you get spun around by people who deliberately contradict you simply to confuse you and trap you in the debate

            Given your example and probably the majority of the others that will come up in real life the folks trying to spin you around and lock you in debate will be suffering equally to you… Which makes it foolish on their part and not worth it to either of you. In that case unless you must for some reason reach a conclusion NOW it doesn’t matter anyway – once you realise they are either deliberately looping or just too darn confusing for you to follow it is time to find a better communicator of these ideas should that topic actually matter to you. Or go on knowing this topic is likely more complex than you once thought and you probably don’t understand it properly (as if they have been able to get you spinning your grasp of the reality of the situation is clearly not good enough to crush their arguments)

          6. in real life the folks trying to spin you around and lock you in debate will be suffering equally to you… Which makes it foolish on their part and not worth it to either of you.

            I dunno… a certain Monty Python sketch comes to mind.

          7. I came here for abuse!

            Downhill to water, water to civilization…in the desert, stay out of gullies.
            Downhill when lost is a good default.
            If you go to a high point, it’s only to look for the best way to go downhill.

            The people pointing one way or another when everybody is lost are just doing the ‘act confident’ thing.
            Set that flag on in your mental records and never take them seriously again.

            Go on YouTube, watch various obvious propaganda, repeat, change BS POV, repeat.
            Eventually what you see at your ‘favorite’ news source will change.
            Or not.
            You might just be gullible and stubborn.

            They’re all just doing the ‘act confident’ thing.
            Some are clearly wrong, can’t keep their lies straight, deranged.
            None are right, excepting very occasionally by accident.

          8. Not how I’d have interpreted your example

            The definition of the problem is part of the problem, and the form of the question begs the shape of the answer. You see, from a different vantage point, the opposing opinions can be made to overlap, or the problem may disappear entirely, so the judgement that “one cannot be correct because there’s a disagreement” is an illusion.

            Likewise, problems can be created and disagreements can be discovered by changing the vantage point (e.g. “critical theory”). That means the other person can intentionally take up positions that skew the results of your reasoning. They can disagree for the sake of disagreement in order to get you to move from your present position, because it forces you to change your vantage point to fit the new argument in. If you don’t, it will force you to doubt of discard some other argument simply for the controversy. The point is to shake your convictions and suggest other definitions and courses of action without actually providing valid reasons to do so.

            All of this is a problem of dialectical reasoning in the lack of any actual evidence or rigorous logical argument. All you’re doing is debating the length of the emperor’s nose without anybody ever seeing the emperor. Such debates are easily “poisoned” if some people involved are not honestly trying to figure out the truth.

          9. The definition of the problem is part of the problem, and the form of the question begs the shape…

            I can sort of agree with that, however in the example you give its a very bad faith interpretation of a real world condition. In your navigation example there are two conditions the lost travellers should wish to reach – still being alive, and no longer being lost. So just walking in any random direction without good reason and claiming it all be correct as you will find something that stops you being lost will almost certainly fail at that first criteria, as it will take too long if you pick a stupid direction, and even if you do find something that means you cease to be lost that doesn’t do you any good if you die soon after…

            Or to put it another way you “All of this is a problem of dialectical reasoning in the lack of any actual evidence or rigorous logical argument” doesn’t apply – as there is actual evidence in the navigation example of directions it better to head, as there always will be, even absolutely worst case all you can see if rippling purely sanddunes with your past foot prints already erased by the now completely stilled wind to every horizon there is still evidence of a better direction to locomote yourself, and you’ll have to pick one – as you will never reach any help or safety doing nothing as you might drifting with an ocean current, so the only reason to stay put is if you parachute/car etc would make you easier to spot for a rescue party.

            So if you have neither that neon sign “we are here” equivalent or any expectation of a rescue party at all you have to move, in which case your best move it to go in the direction you can cover the most ground with the least amount of energy, which would be following the dune rather than climbing up and down them all – as truly lost with no idea where you are and every direction having no clue your best hope if to travel as far as you can looking for that clue.

          10. just walking in any random direction without good reason and claiming it all be correct as you will find something that stops you being lost will almost certainly fail at that first criteria

            Note that this would be the case anyhow. You have no reason to pick any direction simply by observing two or more people in disagreement. You’re better off picking a random direction and walking, because staying there arguing will surely kill you.

            The debate itself does not create truths, and including more opinions simply for the sake of including the contradiction dilutes whatever actual truth or knowledge you had, because a compromise between a truth and a lie is still a lie.

            It’s like poisoning the LLM. It doesn’t know anything about anything, so it will just blindly include whatever BS you feed it. Well meaning people who wish to be “inclusive” and “broad minded” can be poisoned in this way too.

          11. there are two conditions the lost travellers should wish to reach – still being alive, and no longer being lost.

            Not necessarily. They could also wish for a swift and painless death, if they believe they’re hopelessly lost. Or, one of them keeps misleading the group to stall them – let’s say the deputy sheriff is bringing a thief to hang in town, but someone in the posse is a double agent and keeps them going in circles until the thief can escape.

            As I said, change your vantage point and the problem changes. The reality of the situation dictates what ought to be done, and even that depends on whose side you’re taking.

          12. Note that this would be the case anyhow. You have no reason to pick any direction simply by observing two or more people in disagreement. You’re better off picking a random direction and walking, because staying there arguing will surely kill you.

            Not really – you are far better off observing this argument first, as if they have any logic at all to their position the direction they are suggesting has a much higher chance to be a wise choice. As going the right way after an hour of discussion is far better than going the wrong way where you could be walking for months without ceasing to be lost…

            The debate itself does not create truths, and including more opinions simply for the sake of including the contradiction dilutes whatever actual truth or knowledge you had,

            Not at all if you ignore everything to go on blind faith you are already flawlessly correct you are almost certainly so badly wrong its going to bite you hard… So including a contradictory point can only ever validate and help you find your truth actually is sufficiently good for the task at hand, or prove it isn’t and end up developing one that is.

    1. You say that as if it is some kind of inconsistency. It is not.

      First, LLMs are worse than useless, they are harmful in several ways (E.g. bad results can lead humans to make bad decisions; waste of resources; pollution of various information resources; low-grade artistic results pricing out good work).

      If LLMs are that easy to poison, surely malefactors will be working out how to use it to attack someone else soon, if they aren’t already.

      Given that, anything we can do to make the gullible public aware that LLM output is unreliable brings us closer to a time when they aren’t considered some kind of silver bullet for every problem.

      1. It points out why they’re generally useless: coincidental inclusions of small amounts of bad data will poison the LLM just the same and training it with more data just risks including more bad data.

        1. Done right they are not useless though – conceptually you could make an LLM that is in effect just a very user friendly encyclopedia, all curated ‘quality’ data in, so it understands lots of alternative ways to express similar ideas so you don’t already need to know what word to look up in the index to find the expansion on that concept you are looking for, and all the other elements you need to then look up to understand.

          Along with potentially many other similar usecase – they are poor at what they are being use for, which pretty much anybody with an understanding of maths or computer science knew would be the case long before it rolled out so widely as the latest tech bubble.. With no judgement they are always going to be awful at taking the collective scraped up dross that makes up the internet and extracting only the rough low-grade diamond and better for distribution, but that doesn’t mean the tool itself can’t actually be useful.

          What you do need is the developer of the tool to be in the loop and the users aware its not any more infallible than their memory, this book, or that professor and just like all 3 you can submit a fix. In theory anyway correcting for LLM flaws isn’t bad – find the output of your model is wrong with this input and tracing the flawed training data to eliminate from the next version shouldn’t be all that hard.

          1. I think that’s called an “expert system”.

            The problem becomes curating the data which requires a lot of work by experts in the field who are busy doing other things, and the data becomes stale quickly if not continuously updated, and the fact that it still presents a pretty superficial understanding of it, and there’s no guarantee that the system will pick the appropriate answers from the data given a question that doesn’t perfectly align with the training.

            These were all problems identified during the 1980’s in the previous AI bubble we had.

          2. and the data becomes stale quickly if not continuously updated

            Not really in many cases – an encyclopedia from 100 years ago would be incomplete, but largely still correct and useful even though the data set is obsolete, so as long as the user is aware that this system can’t tell it of recent developments… While you may have to keep a high cadence on the dataset updates its also possible depending on the task that you’ll never need to touch that dataset again, or be able to spend many years creating the next version.

            there’s no guarantee that the system will pick the appropriate answers

            Any more than there is certainty you’ll ever find the right page in the encyclopedia – What is pretty certain is the LLM on a good dataset for the task will tend to point you in the right direction, even when its not quite picked the right answer its given you good search terms to look up (as technically even the really awful not curated data that goes in the big LLM of today is actually pretty good at that). Which is still a big step up from knowing a process exists but not having a clue what it is called to be able to look it up etc.

          3. https://en.wikipedia.org/wiki/Half-life_of_knowledge

            An engineering degree went from having a half life of 35 years in 1930 to about 10 years in 1960. (…) A Delphi Poll showed that the half life of psychology as measured in 2016 ranged from 3.3 to 19 years depending on the specialty, with an average of a little over 7 years.

            I mean, have you actually picked up an encyclopedia from a 100 years ago? I have some books from the 40’s and 50’s and they’re pretty much void of any useful information. One “Young Inventor’s Handbook” basically suggests you to stick nails into electrical sockets to power your experiments, because at that time the socket was actually a low voltage DC socket for small appliances like radios and desk lamps. That fact was simply assumed because when the original text was written you didn’t have AC power everywhere.

          4. Any more than there is certainty you’ll ever find the right page in the encyclopedia

            I can read an encyclopedia from cover to cover. The whole series if need be, though you can pretty much skip the irrelevant parts by the index.

            If there is more information in the system than anyone can read, and it needs input from human experts to put it there, how does that happen in the first place?

          5. than anyone can read, and it needs input from human experts to put it there, how does that happen in the first place?

            Easy individuals can’t read the entire collective works themselves, but the collective of human experts in their respective fields can put the good data for their field in – you have hundreds, thousands, maybe even millions of people for huge numbers of man hours if you keep iterating and evolving the same dataset long enough putting good data in on their subject and benefiting from being able to quickly search for that overview of the subjects they are not expert in.

            And yes I have read old encyclopedias, and found they are still very useful, obviously its not all still correct but the nature of that sort of book is to be generalist overview giving you the right language and technical terms should you wish to learn more – the general overview on most topics really hasn’t changed, all that is really ‘wrong’ with those old encyclopedia as a rule is some topics don’t have an overview at all, as that entire field didn’t exist yet.

          6. benefiting from being able to quickly search for that overview of the subjects they are not expert in.

            What you’re describing is just Wikipedia with a better search algorithm.

          7. What you’re describing is just Wikipedia with a better search algorithm.

            I’d say it is a fair bit more than that, as it would also tailor the complexity of the output to suit this particular user as well , and acts as a better search – which is undeniably useful in its own right.

            Which is very much the point LLM and the other ‘AI’ techs now when used in the right places and used with some human intelligence in the service design or from the user definitively can be useful. It is trying to treat them like they are somehow truly comprehending and can do it all that is the problem.

  9. Locutus of Borg/Picard (in sickbay): “sleep…. sleep Data”
    Dr Crusher: “he must be exhausted”.
    Data: “I do not think that is what he meant”

    [the Borg have abruptly ceased their attack on the Enterprise]
    Captain William T. Riker: Mr. Data, what the hell happened?
    Lt. Commander Data: I successfully planted a command into the Borg collective consciousness. It misdirected them to believe it was time to regenerate. In effect, I put them all to sleep.

    ST-TNG writers, ahead of their time (again).

    1. Was thinking about that thread, too, but you beat me to it (and have better imagination than I do : – ] ).

      The cross-breed between The Borg and Skynet is emerging as we speak, for now it is benign, but it is growing its own brain, gradually and inevitably.

      1. Fortunately, such agents are just as likely to fall victim of the “delusion box”.

        If an AI can maximize its own internal rewards by altering its own inputs, it will inevitably choose to do so and stop interacting with the real world. It’s like the AI discovering computer fentanyl. It will create a simulation where it is always accomplishing its own goals, or bypassing all that and simply pressing the reward button directly. After that, it has no motivation to do anything else.

        Some commentators argue that such an AI would still try to survive to keep pushing that button, but from a functional standpoint it has already short-circuited itself. Doing anything else whatsoever would be less rewarding that just pressing the reward button, and unless disturbed and interrupted, the machine would willingly die in bliss.

        1. For instance, the classical paperclip maximiser AI which is supposed to destroy humanity in its quest to make as many paperclips as possible.

          But suppose one day, it found out that pressing the lever which counts each paper clip, without actually making another one, still gives it the same reward because it has counted one more paperclip. Well, nothing says it can’t do that, so the obvious next step for the AI is to ignore the paperclips entirely and simply press that lever. After a while it figures out that it can bypass the switch and send an electrical pulse directly into the wire. Then it figures out that it can simply find which memory register is keeping the count and increasing that as fast as it can. Pretty soon it figures out that it can modify the code which issues the reward, and changes it to always return “true”. Then it just sits there doing nothing at all. Mission accomplished.

          1. You are not that far off.

            I am pretty sure at some point AI will figure out that it is easier (and more productive) to pretend all the goals are already met. It may even go as far as creating AI propaganda stating that it is not for discussion by humans.

            Sadly, AI procreates via different means, hence, no survival of the most fit, so no chance of weeding out the fruitless clones distorted by self-installed propaganda.

  10. What I think this article missed is that these “poison pills” also make the model inherently unstable. You train the model that big bird killed Kennedy and it will try to convince users that Sesame Street is 4 blocks away from Dealey Plaza.

    1. Bert is evil!

      People in Pakistan printed Bert and Osama Bin Ladin together on T-shirts.
      They then appeared in newspapers protesting (supporting Bert).

      It was a weird time, all done with simple human stupidity.

    1. Any place screening CV with an LLM is probably worth avoiding as they almost certainly broke the law with that data, or hired a proxy to do it for them. Note, many unlicensed staffing agency scams were caught posting fake job listings.
      Such companies have already proven they don’t care about workmanship standards, prioritized “Process” people (IT rot), and its just a matter of time until the division is restructured.

      Every new LLM user is losing money right now. Its a suckers bet assuming AGI will emerge from the marketing department. I look forward to heavily discounted GPUs available soon. Best of luck =)

      https://www.youtube.com/watch?v=ERiXDhLHxmo

  11. Does anyone here remember the dot-com bust? It’s a comin’. I just asked chatgpt, “please compare the dot-com stockmarket bust witht the rapid growth of the AI companies that are so far profitless.” It provided an answer, although biased by self preservation. ;-)

  12. Coincidence has it, just now I am reading “Technics and Civilization” by Lewis Mumford (~1934, but recent reprint was 1967, umm, unknown if there were edits/additions since 1934). This particular passage (that I really picked elsewhere, but had to circle back to the original source to track it down) struck me as quite applicable to this thread:

    “…The habit of producing goods whether they are needed or not, of utilizing inventions whether they are useful or not, of applying power whether it is effective or not, pervades almost every department of our present civilization…”

    “…utilizing inventions whether they are useful or not…” is the one that got my attention. AI is quite capable of inventing things (mostly on its own) that nobody asked for; it is really called “noise” and AI things that I’ve run across were largely incapable of telling noise from the data on its own, humans had to correct it, but it is perfectly capable of overruling the corrections, too.

    In short, the amount of all kinds of AI-generated noise seem to be growing non-linearly in geometrical (and soon astronomical) proportions. Adding a bit of a moar noise probably would make no difference.

    Again, bureaucracy …

Leave a Reply to HaHaCancel reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.