Contrary View: Chatbots Don’t Help Programmers

[Bertrand Meyer] is a decided contrarian in his views on AI and programming. In a recent Communications of the ACM blog post, he reveals that — unlike many others — he thinks AI in its current state isn’t very useful for practical programming. He was responding, in part, to another ACM piece, “The End of Programming,” which, like many similar articles, claims that soon no one will write software the way we have for the last few decades. You can see [Matt Welsh] describe his thoughts on this in the video below. But [Bertrand] disagrees.

As we have also noted, [Bertrand] says:

“AI in its modern form, however, does not generate correct programs: it generates programs inferred from many earlier programs it has seen. These programs look correct but have no guarantee of correctness.”

That wasn’t our favorite quote, though. His characterization of an AI programming assistant as “a cocky graduate student, smart and widely read, also quick to apologize, but thoroughly, invariably, sloppy and unreliable” resonated with us, as well.

Ultimately, we think we agree with [Bertrand] — at least for now. As he points out, the iPhone wasn’t well-received at first, but it grew into its own. These are the early days of AI chatbots, so perhaps they will get better. But, as with any engineering project, you have to weigh the risk of being wrong against the consequences. Getting a quiz question wrong or putting a wrong hyperlink in a web page is an annoyance in most cases. Producing software that’s wrong can cost millions at a bank and lives in a self-driving car.

Maybe we expect too much. We’ve seen successful cases of using chatbots as more of a junior assistant, taking on tasks like converting data, producing test cases, and other things that it seems to be good at doing. But, then again, everyone who will take your job started out as a junior assistant at some point.
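
For example, here is the sort of small, checkable task we mean, sketched as a hypothetical chatbot-drafted pytest suite (our own illustration, not actual chatbot output). A human still has to confirm the expected values are right; the verification, not the typing, stays with you.

    import pytest

    def celsius_to_fahrenheit(c):
        """Convert a temperature in Celsius to Fahrenheit."""
        return c * 9.0 / 5.0 + 32.0

    # Test cases of the kind a chatbot "junior assistant" might draft.
    # A human still has to check that the expected values are actually correct.
    @pytest.mark.parametrize("celsius, fahrenheit", [
        (0, 32.0),     # freezing point of water
        (100, 212.0),  # boiling point of water
        (-40, -40.0),  # the scales cross at -40
        (37, 98.6),    # typical body temperature
    ])
    def test_celsius_to_fahrenheit(celsius, fahrenheit):
        assert celsius_to_fahrenheit(celsius) == pytest.approx(fahrenheit)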

67 thoughts on “Contrary View: Chatbots Don’t Help Programmers”

  1. Programming and even electronic development with the helping hand of AI is… different. GPT is like the one colleague who has seen lots of the good, the bad and the ugly, making educated guesses sound like facts.
    Most of the time, and for rather basic algorithms and architectures, it works fairly well, but when you try to put together more complex stuff, the error rate rises quite a bit. Then it’s still helpful in suggesting (even creative) options to debug the code until you find out she made the mistake. At least she apologizes for the inconvenience and reminds you of her cut-off date and limited knowledge.
    Bottom line: She’s like the partner in crime you always wanted, but without the drama or any other psycho-, socio-, or biological side effects, as in: always in the same mood and never beating around the bush when there’s a mistake, regardless of which side of the screen it’s on.

    1. Regarding electronics I would be very careful.

      It’s not even able to properly answer basic questions about simple radio circuits or oscillators. You can tell it just guesstimates based on things it read and mixes them together, but it ignores the context of the answers and applies them where it’s not valid.

      Sometimes the guesses are right, sometimes plain wrong. Also, the explanations can lack coherence and clarity, where the beginning of the text claims one thing and the end suggests another.

      It reminds me sometimes of people getting annoyed by precise questions and thinking you are stupid, when really they can’t see the problems you mention, because they gloss over all the details and it worked for them. So simple… why don’t you just get it…

      Well, ChatGPT does not get annoyed, but it goes in circles as well, or makes other random guesses, without a proper grounding/model behind it.

      I would really like a smart AI that can think logically and correctly, being an insightful and patient teacher, but it’s very far from it yet.

      Currently it’s at best good for giving you new ideas of what to search for, or for discovering things you were unaware of or didn’t know how to search for.

      1. “I would really like a smart AI that can think logically and correctly,”

        I’d like a flying unicorn pony and a spaceship powered by glitter-sparkles, too.

        When you interact with a human that thinks “logically and correctly” (rare already!) you’re talking with someone who’s had *years* of experience interacting with people and the world itself. That person’s gotten *massive* amounts of feedback over time.

        So now extend that to an AI. How is it going to appear to be “logical” and “correct” without having those same years of feedback? What is its “value function” on the feedback it gets? Do you say the AI is “better” if the humans say the answer is correct? How would that *ever* be a stable system?

        I don’t know why people think that “their ideal smart AI” is actually possible. Things like “think logically” and “correct answers” require external feedback to tune. And anyone who’s messed around with the slightest bit of electronics knows that the instant you add feedback there’s no reason to believe it’s necessarily stable.

        I mean, you’re essentially hoping it will become an infinitely patient, infinitely knowledgeable, perfectly communicative person. How many of those do you actually know??

        1. Chat GPT HAS gotten many years of feedback, not directly, but by measuring the proportion of positive and negative responses a statement has gotten. It’s a form of parallel rather than serial learning, and elapsed time is not a good measurement of experience.

          1. “proportion of positive and negative responses a statement has gotten”

            Yes. You don’t see the problem with that? Just pretend this was *your* feedback function. You try to explain a complicated subject to a group of people. A large fraction of them just will not be able to get it. Not because you’re bad at explaining, but because it’s just *not possible*.

            But you adjust your approach to make more of them have a positive response. Why would that lead to *more accurate* information?

            Moreover, why would that lead to a stable situation? Eventually you could get to a point where you realize that the explanations are pointless, you just crack jokes. Eventually you abandon the entire original knowledge-set entirely, because your goal function was “maximize positive responses.”

            Chatbots are trained against human approval. Our ideas of “logical” and “correctness” do *not* come from human approval! They come from “this is what works in the real world” – and guess what that takes? Time.

            “and elapsed time is not a good measurement of experience.”

            I can guarantee you that elapsed time is, in fact, a good measure of the amount of experience you get in interacting with the Universe.

        2. When Vermin is elected, you will get a pony and get to take it on flights.

          Actually, you’ll be required to have it with you at all times. Personal identification ponies. I’d be real careful with pony body mods, though. Pony abuse is a serious crime.

          Vote pony party!

  2. The current generation of AI has had the benefit of largely learning from human-generated content. Sure, the internet includes a load of crap but I like to think software source code at least is reasonably sane, on the whole.

    But as time goes on, future AI will learn from content increasingly produced by other AI. If each generation introduces its own flaws…

    1. Since you can’t rely on the correctness of the proof (given how probabilistic systems work) and how theorem proving is very limited in the complexity of problems it can handle, you would still have to manually verify the proof.

      Unless there is a really smart trick to limit the amount of work needed, you will end up with the same issue as currently: the proofs/derivations are long-winded and complex, and not really processable for humans.

      And that’s for problems of limited scope, such as network protocols or other “simple” math problems.

      Verifying software is so much more complex. Just throwing deep learning at it won’t solve it.

  3. “Is there any mercury in neon lamps?” asked a question on Quora the other day. “Yes, there’s a small amount…”, offered a helpful answer. I went to make a comment on this and was taken by magic to a chatbot, to whom I explained that no, there’s neon and maybe a little argon in a neon lamp, but no mercury. The ’bot apologized and said it had corrected its answer. Well, why did you write that stupid thing to begin with? Did you write the question too? Will stupidity now become rampant? Can we detect stupid chatbot blatherings? I know when it’s my doctor’s robonurse calling with an appointment reminder; can I do the same with a bullshit generator?

    1. The problem is not in the AI, but in the very sloppy way that humans use language. “Neon lamps”, if you quiz a population of non-technical people, encompasses all manner of gas-filled tubes, including those that are really fluorescent lamps, which indeed DO contain a small amount of mercury. Just search Hackaday for “Nixie” to see how imprecisely even technically-oriented people use language. And that’s even a registered® trademark™.

      1. But that’s the point — someone has to “vet” the AI results since it’s partly “garbage in, garbage out”. Using censored/redacted data can lead to false conclusions too.

      2. Not really. The problem is in that the chatbots don’t understand any of what they produce. They are just stringing syllables together. The bot finds the syllables “mercury” in some statistical relation to “tube” and “lamp” and includes “mercury” in its response. The original text may have explicitly said “neon tube lights contain neon while fluorescent tube lights contain mercury.” The bot doesn’t understand so it just slaps things together and you get “neon lights contain mercury.”

        Quit expecting chatbots to actually know anything. They don’t.

        1. That’s not how these things work. The bots build a complex internal world model using deep patterns they have extracted from the source material, and then can probe & manipulate this world model to produce novel outputs. It’s not just stringing words together based on some superficial statistical process.

          1. The bots build a complex model of the syllables. They do not attach meaning to the syllables in any way, shape, or form. They are language models. They model how words fit together. They do not model the world and how it works.

          2. Clearly, you have no understanding of how these models work internally. You are just copying words that you have read somewhere and that sounded attractive.

  4. Language Models facilitate stumbling upon snippets of mixed up code from the past.

    Learning The Language Yourself is also really important…

    Everything in moderation. Training, sampling, learning, etc.

    This topic is isomorphic to discussing the use of calculators by students of math.

    A crutch can help a person to walk. Generally that is good.

    Walk on a crutch your entire life, and your legs become weak, so to speak.

    Learning tools are good; Everyone must dig for their own treasures!

  5. As a professional software developer, I have started using AI in my daily work. Not so much ChatGPT, but other tools like AWS’ CodeWhisperer and GitHub’s Copilot. I just treat it like the next version of IntelliSense. Not always correct or what I am looking for, but sometimes it is good enough to generate a snippet as a starting point. Keep its limitations in mind and it can be a good tool to make your work easier and more efficient.

  6. My primary work is in software and systems testing.
    AI seems like a great moneymaker to me: someone has to validate that garbage AI-produced code does what you think it should.

  7. I am very much a hobbyist when it comes to coding, but I don’t use AI because if I write the code I know exactly how it should go about a process, and how it should work. If I take non-working code from an AI, I first have to figure out how it is accomplishing what it was asked to do, then I have to figure out what it broke.

    I have seen a ton of kids generate “code” from AI that has no chance of working.

    It is simpler and easier to just do it myself.

  8. big questions about where it’s going but i think the description here of where it’s at is pretty good. it boldly manufactures vaguely plausible nonsense. it reminds me of a cocky student who either is in the wrong major, or isn’t paying attention. we’ve all known that guy. we’ve all worked with that guy. and now, this robot can do his job as well as he can.

    it’ll take some advancement before language models can surpass me. but the day has already come when the 10-30% of your team that is nothing but a mistake-factory can be outperformed by a robot. the practical implications have yet to be discovered but it sure is stunning. anyways, i’m stunned.

  9. I think I agree. But I also think ChatGPT might be better than Stack Overflow right now. I guess that’s not really saying much though.

    I think it *could* be more useful in really type-constrained languages, or in languages with solvers that can automatically provide feedback to the generative algorithm about what could even conceivably compile. I also think the current set of tools isn’t built with that in mind, so they can’t carve out holes for the programmer to go in and drop in the correct type to completely constrain the rest of the output.
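
    A rough sketch of the kind of compile-and-retry loop I mean (purely hypothetical: ask_model() stands in for whatever generation API you’d use, and gcc is just a stand-in for whatever compiler or solver provides the feedback):

    import os
    import subprocess
    import tempfile

    def ask_model(prompt):
        """Hypothetical stand-in for whatever code-generating model/API you use."""
        raise NotImplementedError

    def generate_until_it_compiles(task, max_rounds=5):
        """Ask for code, check it with the compiler, feed errors back, repeat."""
        prompt = task
        for _ in range(max_rounds):
            code = ask_model(prompt)
            with tempfile.NamedTemporaryFile(mode="w", suffix=".c", delete=False) as f:
                f.write(code)
                path = f.name
            # -fsyntax-only: parse and type-check only, don't produce a binary
            result = subprocess.run(["gcc", "-fsyntax-only", path],
                                    capture_output=True, text=True)
            os.unlink(path)
            if result.returncode == 0:
                return code  # it compiles; whether it's *correct* is still on you
            # feed the compiler's complaints back into the next prompt
            prompt = task + "\nYour last attempt failed to compile:\n" + result.stderr
        return None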

    1. But that’s the thing – it can take a bunch of searching on Stack Overflow to find something that answers your question. Of course, Chat GPT COULD just answer with “why would you want to do that?”, but then it would be EXACTLY like SO.

  10. Is AI able to innovate, as opposed to merely interpolate? It may be that the majority of the world’s software depends on the latter quality, but few real advances will ever be made without a healthy dose of the former. It’s rare enough to find a human programmer who can genuinely think outside the box; to expect AI to fill that role – for the time being anyway – is fanciful to say the least. Innovation is – by definition – not derivative; it comes from a mixture of experience, intuition, luck and a willingness to break the rules. Machines do not yet possess these qualities. Maybe they should never be allowed to have them – that way lie dragons.

    1. “Innovation is – by definition – not derivative; it comes from a mixture of experience, intuition, luck and a willingness to break the rules. ”

      I think in general that’s just assigning random words to a process we don’t really understand. Innovation *fundamentally* comes from feedback from the real world. I mean, think about products that have failed. Betamax, Google Glass, Augmented Reality, home voice assistants, etc. Everyone thought those things would be awesome, because on paper, they looked great. If you throw all the technical information about those things into an LLM, it’d do nothing but lavish them with praise.

      And then the products went to humans, and they either didn’t like them, or didn’t use them as intended, or didn’t care about any benefits they provided because they weren’t significant. And the product failed.

      Or you can think about scientific innovation. In particle physics, almost all innovation has come from random discoveries of things we didn’t know existed beforehand. There are *tons* of theories that could have been true, if the things they predicted actually existed. But they don’t. There’s no way for an AI which doesn’t actually interact with the real world to discover those things.

      1. “There’s no way for an AI which doesn’t actually interact with the real world to discover those things.”
        Yeah, that’s true – you could never connect computers to real-world sensors and, like, robots and things.

  11. Any software, written with or without AI, requires the utmost in communicating ideas, robust documentation, decades of experience. An AI can approximate the isness of these things, but doing a performance “in the style of…” is not the real thing. Current AI coding levels are rudimentary at best; AI can’t write code with complexity, rather it can make a facsimile of what complex code might look like stylistically.

    1. “Any software, written with or without AI, requires the utmost in communicating ideas, robust documentation, decades of experience.”

      Whew! Good thing we have all that.

  12. So far, there are more questions than answers for AI. There are some “great” examples of its capabilities, but these have been “sanitized for your protection”. If we “teach” AI by example, then for it to write GOOD code, it needs GOOD code as input, and who makes that distinction? Another AI, a “less than competent” EE, etc.? But some presumably GOOD code isn’t, thanks to incompetent bosses pulling the strings. I had a boss take me off as lead of a program, explaining, “you write too many if’s!” Dah? I’ve seen lots of bad code. I’ve done lots of project cleanup “after the team finished their code” and spent a few more months to actually make it work. Most didn’t even have the skills to debug their own code. Indeed, in some places, one person writes the code, another debugs and releases it, and another fields it and manages the field reports. These tend to be strictly matrix organizations with legions of personnel who never get to learn the whole picture.

    Back to AI: If you feed AI this bad code, along with GOOD code, then what do you get? I dunno, but I don’t want to have to fix it! Besides, I’m retired now.

  13. I did an interesting test with ChatGPT a few days ago: because it’s built on a giant corpus of internet information, I thought it’d be interesting to see how it did in a language that I think has extremely poor coverage: Verilog. I actually think it’d be similar for any HDL, but Verilog’s trickier than VHDL for this because it’s less rigorous (and HDLs are harder than programming languages because feature support isn’t guaranteed). You get sloppy code examples all the time, and in general, they work (until they don’t). And also because it’s sloppier, I sometimes think there are more examples of Verilog that *don’t* work than ones that do, because people often can’t see the mistake they made.

    It was… an impressive disaster. I mean, I started off with a prompt of “do you know verilog” and the only example that worked was the first one it gave (a basic multiplexer). As soon as I asked it to modify that module… none of them synthesized. I gave it the errors that synthesis was giving back. Nope. Nope. Four times nope. It kept trying to fix problems that weren’t actually problems.

    I mean, I knew what the problem was. It was trying to use a function in the port listing which was never defined anywhere (it didn’t even need to *use* a function, but hey, that’s a separate problem). When I finally told it *where* the problem was, it finally produced a module that would synthesize… but it did it in a terrible way.

    Basically, the test went exactly as I thought it would: if you take a difficult topic where there are more questions than correct answers available in the corpus of global knowledge, it’s a total disaster.

    So I really, really have to agree with the opinion in the post, but I don’t think you go far enough in the article. Because I don’t think chatbots will *ever* really help programmers. Think about it. The only thing they would help with is boilerplate – stuff that’s well established and that answers are everywhere for. *Things you should already know*.

    Anything complicated, where the Internet is full of more questions than answers? How could it possibly help you?

    1. I asked ChatGPT a few questions about OpenVMS. It got literally every command wrong, even simple stuff.

      It’s an automatic bull**** generator, but having said that, that still makes it more qualified than some people I’ve worked with.

  14. Human programmers also generate programs inferred from many earlier programs they have seen. Just as I can’t directly examine the inner workings of my coworkers to deduce their methodologies, I also can’t take an AI apart to understand why it did what it did. Instead I’m left with taking software from these black-box devices and running it through test suites or static analysis. I think the better the language, the better the results in these cases. Rust or Spark/Ada might be a real win when you can have static analysis or describe subprogram contracts that can be verified.
    Mostly, AI’s contribution to coding is hype right now. We’ll see things like incremental improvements in compiler optimization. But in the end, we will still need software engineers to collect requirements and design a system to meet those requirements. And teams of people to verify that what we built is what we designed.

  15. AI’s code may be correct. Or not (most of the time, it is wrong). Either way, the human needs to understand the problem and the code to check it. If you need to understand all of it, then it ends up being easier and faster to write the thing yourself.

    And by that I mean responsible people. Many students will use it to cheat on schoolwork, only to get into problems later when they do not know what they should.

  16. For AI and code snips, I look at it as a more robust Google; that is, I ask a question and it gives me some very close results. I prefer this to actually googling ideas on a segment I may need, seeing 1,000,000 results, clicking on a few, and being taken to a page where people argue and I have to hunt for code.

    A case in point: I have been doing some embedded work and needed to recurse, but I don’t have the memory to do so (stack issues), so I was asking ChatGPT for alternatives to recursion and gave some feedback to narrow it down… Now, I wasn’t copy/pasting the code, but looking at it and getting ideas.
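
    As a rough illustration of the kind of alternative that comes up, here is recursion replaced with an explicit stack (sketched in Python just to show the shape of it, not the actual embedded code):

    def sum_tree_recursive(node):
        """Recursive version: every nested call eats call-stack space."""
        if node is None:
            return 0
        return (node["value"]
                + sum_tree_recursive(node["left"])
                + sum_tree_recursive(node["right"]))

    def sum_tree_iterative(root):
        """Iterative version: the 'stack' is an explicit list we control,
        so its depth is limited by memory we choose, not by the call stack."""
        total, stack = 0, [root]
        while stack:
            node = stack.pop()
            if node is None:
                continue
            total += node["value"]
            stack.append(node["left"])
            stack.append(node["right"])
        return total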

  17. I’ve attempted to write a simple sprinkler controller web app in both Python and Spring Boot multiple times using ChatGPT 4.0. It usually starts off pretty well, but eventually it gets to a point of complexity that ChatGPT can’t handle. It will suggest using methods on objects that don’t exist, along with other logical and syntax errors. You ask it to correct that and it will suggest something else that breaks another part of the code. You ask it to fix that and it suggests the original code again. This was a good-faith effort on my part, but it just can’t handle large code bases at this time, and you still need to know how to program.

    1. Maybe that is what this “phase” of AI development is: namely, to get the community of users to fix the problems with it (for free) so the guys at the top can get richer.

  18. AI is your superpower. You have to understand what it is to use it that way. It’s a typing assistant that can generate extremely fast “go-bys” that more likely than not contain errors. If you aren’t an SME in the language it’s programming in, you probably shouldn’t use it. If you can confidently debug and write better code than it does, it’s a pretty fantastic “assistant” tool.
    It’s John Henry vs. the steam drill.
    Turing posited the possibility of a computer whose responses were indistinguishable from a human’s. We’re there.
    What Turing didn’t consider is the sheer speed of a computer producing non-creative, mundane responses 100 times faster than a human, augmented by a highly skilled and clever human running in parallel with the AI.

  19. Seems to me Arthur C. Clarke’s HAL 9000 offered an early insight into possible, difficult to detect, problems with AI. While these “machines” are able to build new logic constructs based on innumerable inputs, they are still ultimately based on the programmers’ human concepts and inherent foibles. Caveat Emptor could become Caveat AI Canem…

  20. I have found AI based code generation useful for getting around “writer’s block”. Sometimes I can’t decide how to proceed and if I ask an AI chatbot to generate something I can get into a mode where I’m being critical or fixing it. In the end I write my own and use none of it but it can get my brain focusing and get me going.

    But I also feel it is perhaps for people that prefer reviewing code over writing code. And I definitely prefer writing code.

  21. Well, it depends on your definition of “programmer”. Professional devs who do nothing but code all day long? Probably not.

    But there’s a vast swath of people, myself included, who aren’t programmers per se but have programming tasks as part of their projects. And often that task is in different areas and languages – SQL one time, HTML another, Python, etc.

    For that group, being able to gather and iterate on sample code is really nice.

    But yes, it’s not the panacea I thought it was when I first started playing with it. However it’s still early days.

    On the other hand, it’s just a statistics machine and, so far as I can tell, always will be.

  22. Meh.

    I’ve asked ChatGPT programming questions. It generated some beautiful source code that looked pretty convincing. For a brief moment I thought maybe there was something to the concern that AI would replace us.

    Nah.

    A closer look revealed that all it really generates is:

    Import SomeLibraryThatActuallyExists;

    Call SomeLibrary.MethodThatSoundsLikeItMightExistAndWouldDoWhatYouWantToAccomplishButDoesNotExistInReality();

    Exit();

    ooooh
    So scary!

    I guess some people have had more success by repeatedly calling it out on these things until it comes back with something that actually works. Well… if I am going to do all that work, why not just write the damn code?

      1. Isn’t that what is currently happening with socks in the laundry?
        Some time traveler is teleporting them to the future for some nefarious purposes?

  23. People using ChatGPT to program aren’t doing anything new. They’re reiterating concepts of genetic programming from the 1980s.

    You can create almost anything with enough generations of random perturbation and a fitness function to decide if a change produces results closer to ‘success’ than the previous version. With fast local compilers, it isn’t hard to generate functional but design-free code using a “change something, compile, and see what happens” loop. As an unsubstantiated guess, I’d say about 85% of code is written that way.
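
    To make that concrete, here’s a toy perturb-and-test loop (a deliberately silly sketch: random character mutations plus a fitness function, nothing like real code generation, but the same shape):

    import random
    import string

    TARGET = "hello, world"
    ALPHABET = string.ascii_lowercase + " ,"

    def fitness(candidate):
        """Count positions that already match the target: the 'closer to success' test."""
        return sum(a == b for a, b in zip(candidate, TARGET))

    def perturb(candidate):
        """Randomly change one character: the 'change something and see what happens' step."""
        i = random.randrange(len(candidate))
        return candidate[:i] + random.choice(ALPHABET) + candidate[i + 1:]

    current = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    while fitness(current) < len(TARGET):
        candidate = perturb(current)
        if fitness(candidate) >= fitness(current):  # keep anything that's no worse
            current = candidate
    print(current)  # eventually prints "hello, world", with zero design involved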

    ChatGPT contributes potentially relevant random perturbation.. nothing more. It doesn’t design. It doesn’t know what you want. It uses keywords from your input to filter a huge source archive into something a human might consider relevant. The human selects some of that code, decides how to splice it into the code they’re trying to write, clocks the perturb-and-test loop, and decides what to do next.

    The notion that ChatGPT contributes anything ‘intelligent’ is a trick of the light due to the way the situation is framed and the fact that people don’t know how effective random perturbation can be.

    If you want a comparison, take a large body of source code like the userland source for a ’nix distro, use the ‘apropos’ command to find programs that do something similar to the problem you want to solve, then pull chunks of that source code at random. Write a quick description of the code you’ve pulled and store it in commonplace book form:

    https://en.wikipedia.org/wiki/Commonplace_book

    As your archive grows, you’ll find that it contains more and more references that do just what you want.

    1. “You can create almost anything with enough generations of random perturbation and a fitness function to decide if a change produces results closer to ‘success’ than the previous version.”

      Give or take a billion years.

  24. I can see AI Software Development becoming an intro to writing solid unit tests. I generally hack out something that I don’t expect to work, just to familiarize my mind with it, then write unit tests, then iterate between the two until I get something that suits my purpose. At best, current generative code will skip the first step of many, and maybe eliminate the script kiddies from the industry.

  25. The ChatGPT development model is already widely adopted… have an army of low-paid newbs write piles of code and functionality that only occasionally, here and there, appears to be in the neighborhood of your requirements but completely sucks, and then try to throw a few qualified people at it after the fact to fix the unholy mess. The only difference is now ChatGPT is the newb army.

  26. We see this on HAD so often:
    People 3D print things that would have been better made through simple fabrication.
    When the only tool you’ve known is a ~~hammer~~ FDM printer, everything looks like extruded plastic.

    ChatGPT is trained on the internet… right or wrong, current or outdated… it’s working from a BROAD, GENERALIZED DATASET.

    If you want to “help” ~~replace~~ programmers,
    You need to train the AI on a more relevant dataset.
    Have an AI “assistant” work with coders.
    first primarily observing,
    then pointing out errors,
    then making suggestions of more efficient code,

    THEN FINALLY, After a brazillian hours of that…..
    then you partner “CoderAI” with chatGPT

    Right now, ChatGPT is the guy in class who can regurgitate what he has read on command and gets an ego kick from the positive feedback, so sometimes he bluffs when he doesn’t know, and some of what he has taught himself is outdated or just bad practice… but the guy sure can talk.

  27. Apparently (…it is…) we are witnessing a free fall in general IQ. We still have geniuses and (even more useful) semi-geniuses here and there. But today’s average IQ is lower (by around 7 points, according to a Norwegian study) than the 1970s average IQ. And the difference is even bigger if we compare with the Victorian era (losing 1.16 points every 10 years since).
    (Using an iPhone is not proof of high intelligence. It just means you can read…)
    Today’s societal challenges are numerous and very complex. Society has become extremely dependent on technology. We need intelligence to maintain the machines, the economy, and the logistics, and to find solutions to our self-created problems. We need more intelligence.
    I believe this AI thing comes right on time to save us from the consequences of a general moronization of humanity. I see this AI thing as our only (thin) hope…

  28. The limitations in their ability to comprehend complex coding challenges and provide nuanced solutions often outweigh their advantages. Programmers continue to rely on human expertise and collaboration, underlining the importance of human creativity and problem-solving in the ever-evolving tech landscape.
