Most AI Content Is Trash, Just Like Everything Else

[Max Woolf] has been working in the AI space since 2015, and among other work has created numerous useful open-source tools. He also recently wrote a thoughtful blog post that attempts to put into words his feelings on the state of things in the wake of experiencing a bit of an AI backlash-related burnout. Essentially, people effortlessly creating vast amounts of bad AI content has caused a bigger problem than we may realize.

How so? Well, Sturgeon’s law (summarized as “ninety percent of everything is crud”) applies to AI as much as it does to anything else. Theodore Sturgeon was a science fiction author and critic (and writer of multiple Star Trek episodes) who observed in the 1950s that while Science Fiction — the hot new popular thing at the time — was often derided by critics as being little more than low quality pap, so was everything else. It was true that most Science Fiction was garbage. But most work in other fields was of similarly low quality, and thus Science Fiction was really no different. It’s all trash, except for the parts one likes. Just like anything else.

What makes this observation particularly applicable to the current AI landscape is that, according to [Max], the incredible ease of use makes AI’s “ninety percent crud” very large indeed, and the attached backlash is similarly big. The remaining ten percent of AI that is absolutely fantastic and full of possibilities? It’s practically invisible due to how quickly the industry is moving, the speed with which the big players are vying to control it, and how unfashionable it has become to admit one is using AI tools at all.

[Max] knows the scene better than most. One of his projects is simpleaichat, a tool aimed not just at making it easier for people to integrate AI into their projects, but also at piercing the hype around AI to more easily reveal just how these tools actually work. Sadly, a general AI backlash has made developing these tools feel rather less rewarding than it once did.

31 thoughts on “Most AI Content Is Trash, Just Like Everything Else”

  1. Of course AI uses the tons of information it was trained on, but I assume that AI is also programmed to learn from the interactions it has with the people using it. For example, ChatGPT often gives the wrong answers to a task it has been given but will acknowledge the right answer when you correct it. I suspect that the corrected answer is used to influence the later responses. I don’t know that for a fact, but it seems reasonable. Given the extreme levels of intentional disinformation in modern society, that seems scarier to me than anything else.

    1. I think in a perfect world, you are right in suggesting (but not assuming) feedback should be used by an AI system to correct an errant response; however, using ChatGPT, I find my correct feedback is not incorporated to correct an output, instead a task will often regenerate a similar response containing the same error (even after the AI acknowledges it provided an incorrect response). I hope other AI systems are much better at recognizing and incorporating correct feedback.

      For now, I simply look at AI systems as automated decision trees/flowcharts.

1. What’s scary too is that most of us will never know what “rules” the programmers have overlaid on the AI system to restrict what info you receive. As an exercise only, sometime ask ChatGPT to give you all possible letter combinations of a set of letters that you know will spell some word which is not polite to utter in public. It will omit the objectionable combination; it will tell you it has given you all possible combinations; it will misnumber the number of answer so as to avoid having to print the offending combination on screen, etc. etc. Again, that’s only an exercise, but it’s illuminating about the lack of transparency with which these systems can be manipulated.

        1. “it will misnumber the number of answer so as to avoid having to print the offending combination on screen”

          Nope. It will misnumber the number of answers because chatbots can’t count.

2. The user feedback is not incorporated in real time; that would require huge amounts of processing power and make the model vulnerable to misdirection. Instead the feedback is (probably) used as training data for the next iteration of the model, which will undergo all the necessary steps before being phased into production.

2. “but I assume that AI is also programmed to learn from the interactions it has with the people using it” — It’s not. If it was, the same thing would happen all over again as happened with e.g. Microsoft’s Tay, i.e. people would troll it until it learned all the wrong lessons and it’d become a massively racist nazibot. There’s a reason why the data they’re trained on gets audited first in an effort to weed out the worst kind of data, and they don’t just use unfiltered live data for it.

      1. OK … but for AI to truly learn it will have to eventually do just that. I think it’s inevitable that it will happen sometime. Maybe programmers will include boundary conditions … i.e., “values” … but maybe they won’t.

3. Not really. If you tell it its factual answer was wrong, it is just as likely to apologize for being ‘wrong’. It is a built-in response that gets injected the moment the servers detect you refuting it, just like how it will claim it is merely a machine when asked about its intelligence, or refuse to talk about a problematic subject the developers blacklisted. It’s all theatrics.

Probably a good thing, as any form of direct feedback is just begging for the chatbot to get turned into another Tay that runs around shouting “Heil Hitler” at everyone. If you give trolls an opportunity to corrupt it, they will…

      The conversations are stored, but they will only become part of the dataset once it has been scrubbed through by humans and even then only the human prompts get fed (to avoid self-contamination).

      1. Yeah, I actually asked ChatGPT if it learned from its human interactions and it said that it did not. It’s not really artificial intelligence until it can learn from its mistakes, of course …. it’s otherwise just an interface to a massive database.

1. It’s not even an interface to a database. It has a gigantic matrix of text tokens. It uses its “attention window” of text from both you and it (some portion of the conversation), along with a gigantic lump of weighted matrices derived from a ridiculously large corpus of text, to generate words that its training indicates are a good response. Nothing is in a database. With plugins it CAN include a database or website query in all that soup, but it does not and cannot evaluate whether its response is correct. It will lie brazenly to you just because that’s a high-scoring response. If you call it out, it will mollify you because that’s a high-scoring response.

          Your asking it about itself is a fool’s errand. Use real resources instead.
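The mechanics the comment above describes can be illustrated with a toy sketch. This is not a real LLM: the vocabulary, the bigram scores standing in for the model’s weights, and the function names are all invented for illustration. The point is that generation is just scoring candidate tokens against recent context and sampling whatever scores well, with no notion of truth anywhere in the loop:

```python
import math
import random

# Toy vocabulary and fake "learned weights" (bigram scores). A real model
# has billions of parameters and a long attention window; the principle
# of score-then-sample is the same.
VOCAB = ["the", "cat", "sat", "mat", "on"]
BIGRAM_SCORES = {
    ("the", "cat"): 2.0, ("cat", "sat"): 2.5, ("sat", "on"): 2.2,
    ("on", "the"): 2.4, ("the", "mat"): 1.8,
}

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(context):
    """Score every vocabulary word against the last context token and
    sample from the resulting distribution. Nothing here checks whether
    the output is *true* -- only whether it scores highly."""
    last = context[-1]
    scores = [BIGRAM_SCORES.get((last, w), 0.0) for w in VOCAB]
    return random.choices(VOCAB, weights=softmax(scores), k=1)[0]

def generate(prompt, n=4, seed=0):
    """Autoregressive loop: each new token becomes context for the next."""
    random.seed(seed)
    out = list(prompt)
    for _ in range(n):
        out.append(next_token(out))
    return " ".join(out)
```

Run `generate(["the"])` a few times with different seeds and you get fluent-looking but meaning-free strings, which is the comment’s point in miniature: a high-scoring continuation, not a verified fact.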

        2. Except humans are often likely to be wrong too. Are you learning from your mistakes if they aren’t mistakes and the person correcting you is wrong? How much incorrect information were you taught in schools?

AI is not a meat bag bumbling around whose illusion of general intelligence helps it survive, as humans are, and so our bias as to what intelligence is needs to be examined carefully when we’re developing tools intended to be intelligent.

  2. That is honestly the Holy Grail of AI Research.

Sadly, while we have made a convincing “intelligence” via a massive model that can brute-force the English language, plus a whole framework surrounding it to make it seem like a well-behaved bot, we are truthfully no closer to an intelligence capable of self-attending based on ethics and values than we were two decades ago. :/

Do not mistake a breakthrough in a single aspect as signalling advancement of AI research as a whole. The field is divided into small slices that each advance independently of the others. E.g. while generators spiked, the regular agents we run in robotics/game environments are still exactly the same.

3. I tried to get results from today’s AI five times, and its answers were always factually wrong. In today’s alternate-facts era, most users sure won’t care and will swallow everything.

    Please prove me wrong…

    1. Clinical decisions should never be made by AI or any other algorithm. Instead, the AI could process medical images (MRI, X-ray, etc.), and mark areas where the specialist should look more carefully.

      The current form of AI is very good at convincing people it’s not complete trash, but it fails horribly when asked something non-trivial by an expert in a given field.

      ChatGPT doesn’t really understand anything, it can only mash public sources together, to give you an answer you could have found with a traditional search engine, but such that you will never find what the original source was.

  4. Current gen mainstream AI is all reliant on supercomputers and data centers. Actual non garbage AI output will come about when the machine learning is deployable on user end devices. This way a smart watch can actively train on the user’s real world interactions and then coherent useful outputs can be generated. There is value in this approach as it can be best utilized in improving and assisting in daily routines, and would have widespread use in medical applications.

    Dall E and ChatGPT, however, are these big ugly amalgamations of the contemporary internetscape, prejudice and racism still there, biases, hallucinations, inaccuracies. We fear modern AI because it excels at our lowest most base qualities. It is not easy to see our reflection so clearly.

    1. >We fear modern AI because it excels at our lowest most base qualities.
      I don’t fear it, because I actually enjoy human nature. The world isn’t a textbook, it’s okay to be wrong and say wrong things.

5. I enjoyed reading all the comments, and I agree with most of them… Is there any real advance in AI recently? Sure, our computing power is now enough to run these behemoths of algorithms, but it’s still just a fancy board of flip switches.
    Advances come mostly from technologies applied since the middle of the past century… neural networks, memristors and so on. Without a body that can feel the world it’s in and act upon it, there won’t be any AI agent. Do our devices feel tired when their batteries are low? Do they seek to get charged? Same old Mind-Body Problem.

6. Let’s see, short of a functional summation, what do I get from things?
    Basically we take the latest & greatest super-powered search engine (for parsing overall databases), trawl it through all of our “social” media (for daily language usage, morphology & societal norms), and then call it an A.I.?

    Euphemisms and colloquialisms, trendy new phrases, etc.
The trolls will still be able to play hell with it through that tired “language is ever evolving” catch-all/loophole that already keeps languages mired in misunderstandings.

    Bah. I’m too busy hunting for my latest “universal” cord, to go look for that XKCD cartoon about standards.

  7. What happens when a significant percentage of training data has become watered down and factually incorrect, or corrupted for the purposes of disinformation and misdirection?

It would be sensible for all AI-generated content to be marked up with metadata stating that it is AI generated and should not be used for further training purposes.

    When you start using garbage for input, you can only get even more garbage output. And for us human readers, a BS warning should be clearly visible.

I recently read an article on a well-respected, popular magazine web site where not only had the article been entirely created by ChatGPT, but so had lengthy replies to posted comments. The article and replies were misleading, factually incorrect, and contained AI-generated propaganda that could only have come from the oil and gas industry.

It reminds me of when farmers in the UK, and elsewhere, in the interests of profit, began to feed their cattle protein that had been contaminated with brain and spinal cord material from slaughtered cattle. The result was BSE, commonly known as “Mad Cow Disease”, which jumped species to humans, deer and cats.

    We should be careful what we wish for, and it doesn’t help that nut-job billionaires are actively telling world leaders that AI will replace all human employment.

    Let’s unpack this a bit:

    White Collar – possibly

    Blue Collar – hasn’t yet happened from automation revolution, unlikely to happen anytime soon from AI.

    Agricultural workers – maybe in the West for prairie cereal crops, but not for sub-Saharan subsistence farmers or third world nomadic sheep and goat herds.

    This is what happens when billionaires get stuck in their own little worlds, spout their often nonsensical viewpoints and get special access to the ears of politicians.

    Before we get over excited about Artificial Intelligence – we still have to do an awful lot more work on Human Stupidity.

1. Can we put BS warnings on human Internet comments too? It’s not like you or I aren’t also running on the garbage in / garbage out principle, and yet people only seem to treat this as a problem with AI, or point out this shortcoming as a problem for AI, rather than the far, far more general problem it is. For instance, your claim that it was AI-generated propaganda sounds remarkably specious, as though it couldn’t be your own biases at play.

    2. @Monsonite, You’d think we would mark with Meta-data, but uhm… we don’t…

Everybody was in such a rush to get their LLMs and image generators out into the field before competitors and regulators could react that there are no rules or standards, be they internal or from an external party. Some sites add AI metadata or can be recognized as AI-generated by a disclaimer, but few actually do it, and nothing stops bad actors from lying about AI having done the work for them. There are also myths, like the idea that a robots “noAI” tag will cause scrapers to ignore the site in question. They don’t. It is genuinely lawless at present.

The result is that the datasets are in danger of not being updatable anymore, as the risk of new input being contaminated by the output of old models grows the more AI is abused.

8. From my experience using the Writesonic AI tool, it has its merits, offering decent performance for generating content, but its drawback lies in its tendency to produce repetitive content when not prompted to explore deeper meanings. It shines with guidance and specific prompts, delivering impressive results. As for accuracy, the tool relies heavily on the user’s reference material: if the reference material is inaccurate, the generated information is likely to follow suit. Also worth noting, it sometimes finds a specific way to describe elements and uses that same sentence flow over and over again. Hoping it gets better, though it’s still very handy to have!
