How To Use LLMs For Programming Tasks

[Simon Willison] has put together a list of how, exactly, one goes about using a large language model (LLM) to help write code. If you have wondered just what the workflow and techniques look like, give it a read. It’s full of examples, strategies, and useful tips for effectively using AI assistants like ChatGPT, Claude, and others to do useful programming work.

It’s a very practical document, with [Simon] emphasizing realistic expectations and the importance of managing context: both in the sense of giving the LLM clear direction, and in the sense of being mindful of how much the model can fit in its ‘head’ at once. It is useful to picture an LLM as a capable and obedient but over-confident programming intern or assistant, albeit one that never gets bored or annoyed. Useful work can be done, but testing is crucial and human oversight simply cannot be automated away.
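
For a flavor of what that context management looks like in practice, here is a minimal sketch of handing a model one relevant source file along with a focused question. This is not lifted from [Simon]’s post; the endpoint, model name, and helper function are placeholder assumptions standing in for whichever chat-style LLM API you actually use.

```typescript
// Minimal sketch of "managing context": send the model only the code it needs,
// plus a focused instruction. Endpoint and model name are placeholders; any
// chat-style LLM API works the same way.
import { readFile } from "node:fs/promises";

async function askAboutFile(path: string, question: string): Promise<string> {
  const source = await readFile(path, "utf8"); // the context we choose to provide
  const resp = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model name
      messages: [
        { role: "system", content: "You are a careful programming assistant." },
        { role: "user", content: `${question}\n\n---\n${source}` },
      ],
    }),
  });
  const data = await resp.json();
  return data.choices[0].message.content;
}
```

The model only ever sees what you send it, so picking the right file (and leaving out the irrelevant ones) is most of the battle.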

Even if one has no interest in using LLMs to help in writing production code, there’s still a lot of useful work they can do to speed up the process of software development in general, especially when learning. They can help research options, interactively explore unfamiliar codebases, or prototype ideas quickly. [Simon] provides useful strategies for all these, and more.

If you have wondered how exactly glorified chatbots can meaningfully help with software development, [Simon]’s writeup hopefully gives you some new ideas. And if this is all leaving you curious about how exactly LLMs work, in the time it takes to enjoy a warm coffee you can learn how they do what they do, no math required.

59 thoughts on “How To Use LLMs For Programming Tasks”

  1. Beware, the problem might be the following:
    These “chatbots” do take information from reliable human sources, such as coding forums.
    But if fewer and fewer human coders remain, the quality of the information sources will degrade over time.
    Eventually, results from “chatbots” will be taken as source of information by other “chatbots”.
    And the wonderful, magic dream of AIs doing programmer’s work will eventually go “poof”.
    Humanity doesn’t seem to realize that LLMs are a mirror of society, sort of.
    They aren’t creative or deep thinking, but more like highly advanced filter stages.

      1. To be fair, that’s also true of humanity and society, as well as the internet in general: the enshittification of everything is in no small part due to self-reinforcing bubbles feeding on their own output.

    1. corollary to this that’s applicable right now: LLMs work way better with javascript and python than they do with, say, embedded C or verilog. brand new programming language? brand new embedded device? the LLMs don’t know anything, and won’t for a couple of years, at least.

      1. Yep. Consistent API surface (e.g., mostly browsers and node)? Lots of boilerplate to train on. Mess of ad hoc dependencies, custom make scripts, bespoke includes, twenty ways to do the same stuff? Less consistent training data; trust less, as you may get a mix of stuff.

        Not saying JS/python is better than C/C++, but they are way easier to reason around.

    2. Not true. The human uses the LLM to help write some code. Human tests the code and corrects it. Human uploads the corrected code to GitHub or similar. LLM revises its model from GitHub.

      1. “The human uses the LLM to help write some code.”

        Then it’s a different outcome, sure.

        What I meant to address was the idea that LLMs replace human coders,
        because everyone thinks that LLMs make them superfluous.

        This idea is at least a fad right now:
        Many laymen and employers may think that an LLM does the code writing for them,
        so hiring developers or learning to program seems unnecessary to them.
        – Just like news agencies and magazines may think that articles can be easily written by LLMs, so they can fire their employees.

        However, if this thinking becomes more popular, the coding forums might lose their user base.
        And with them, the LLMs might run out of good source information over time, too.

        That’s what I meant to say, mainly.
        LLMs can be useful, but they may need “fresh” input by real living beings from time to time.

        1. “because everyone thinks that LLMs make them superfluous. This idea is at least a fad right now”

          Some of Corporate ‘Merica seems to think that. I see many (not most) companies going the route of replacing their ‘coders’ (Architects, Systems Engineers and Software Engineers). Those companies are in for a huge surprise. AIs don’t think; they’re mathematical formulas and statistics, just really fast. Useful as a tool, but not as a replacement.

          It is my understanding that some of these AI models are experiencing some form of dementia (too much internet news ;-) ).

      2. “Human tests the code and corrects it.”
        Corrects it until it barely works, yes. Corrects it until it’s good and suitable to form the basis of more code? Probably not. I’m not a programmer though – maybe someone could step in here and explain the difference between “code that works” and “code that’s worth training an LLM on”.

    3. I like that a lot. LLMs are a reflection of society at this moment. I’ve had shower thoughts about this.
      As they grow and get smarter, they hunger for information, which can only be satisfied by feeding their own data back to them. Intentionally or unintentionally.
      This will probably stagnate them to the point of irrelevance at some point. Hopefully something better will come along by then to take their place.

    4. it took me a while to accept that this seems to be 100% true, that there is nothing even remotely adjacent to original thinking in them. they can, to some extent, mash together 2 ideas they’ve seen before but it’s just an ephemeral thing…they invented that third idea for the purpose of this conversation but it doesn’t compound, they can’t then invent a fourth or fifth idea. they don’t ‘understand’ it and it’s easy to observe that for yourself.

      but the neat implication…the reason i fell for it at first is i asked it to reinvent something i had invented myself. i figured, if i made something novel in the world, and it can do so through just the briefest description, then it must have an impressive extrapolative ability! and it about 80% succeeded, with some bugs that made me wonder if it understood the concept. and after a bit of talking with it, it could be convinced to fix the bugs. but it became clear it didn’t understand…then how did it invent something new???

      it didn’t! it was telling me that someone else had invented it too and even though they didn’t write it up in ACM SIGACT, they had put the code itself on github or something where our novel idea sat around as just a footnote to an abandoned niche project, until openai scraped it and added its little tricks to its global knowledgebase.

      and that is pretty cool i think. not as directly useful as we might hope but if chatgpt can answer your questions then it’s a strong indication that there’s prior art there waiting to be discovered

      1. It doesn’t really matter how unoriginal something like code writing is — it’s a means to an end. The real source of originality when using an LLM is in the prompt. I love how cheerful ChatGPT is when I ask it to do something most people would consider “weird” (someone on StackOverflow would surely demand “why would you ever want to do that??!!?”) — it does what you ask.

        For example, I needed an overlay on a block of HTML text: an entirely separate semi-transparent layer positioned relative to the letters underneath. I had no idea how this would be done, so I explained in detail what I needed, and ChatGPT turned me on to a whole part of JavaScript that I’d never heard of. That is the power of LLMs.
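
        The comment doesn’t say which corner of JavaScript ChatGPT pointed to, but one plausible way to build that kind of letter-aligned overlay is the DOM Range API. A rough sketch, purely to illustrate the idea (the function name is made up):

        ```typescript
        // Overlay a translucent box on each character of rendered text, using
        // document.createRange() plus getBoundingClientRect(). This is a guess at
        // the sort of technique the comment alludes to, not the actual ChatGPT answer.
        function overlayCharacters(textNode: Text, container: HTMLElement): void {
          const range = document.createRange();
          for (let i = 0; i < textNode.length; i++) {
            range.setStart(textNode, i);
            range.setEnd(textNode, i + 1);
            const rect = range.getBoundingClientRect(); // where this character is drawn
            const box = document.createElement("div");
            box.style.position = "absolute";
            box.style.left = `${rect.left + window.scrollX}px`;
            box.style.top = `${rect.top + window.scrollY}px`;
            box.style.width = `${rect.width}px`;
            box.style.height = `${rect.height}px`;
            box.style.background = "rgba(255, 200, 0, 0.3)"; // semi-transparent layer
            box.style.pointerEvents = "none"; // don't block the text underneath
            container.appendChild(box);
          }
        }
        ```

        Called as overlayCharacters(paragraph.firstChild as Text, document.body), it drops one box per letter on top of the rendered text, which is roughly the effect described above.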

  2. The concerns about chatbot-produced code fail to take into consideration this reality: all human software developers occasionally write bad code. By this I mean code with errors or code that goes about doing what it does in a sub-optimal manner. This is why we test our code, or, better still, have others test our code. If all a developer did was submit code written by a chatbot, this code (assuming the developer described their needs in effective prompts) would be pretty good most of the time, and, better still, it would be produced quickly. But even if it were terrible, testing would filter out the bad stuff. So, because of these selective pressures, concerns about chatbots producing bad code are a little like worrying that creatures from a given environment might not all be perfect for that environment. Darwinian forces fix both problems, and we can look back on all the code produced in wonder, forgetting the survivor bias between our present and the past. ChatGPT doesn’t have to be right 100% of the time to be extremely useful. Anyone who has tried the alternatives (StackOverflow) can see the value in an extremely knowledgeable, cheerful, and, above all, fast assistant.

    1. AI is best treated as a better Google search. Use it to find answers, if you can, but don’t trust it to write huge blocks of code, or even small ones, by itself.
      It can be a better documentation search, but it will also waste your time by making up answers that are incorrect and presenting them with absolute certainty.

    2. haha it’s 2025 and you still believe it matters if code works. i love your faith in humanity! i get demoralized by the millions of lines of buggy javascript that it takes to display a typical commercial website these days. i find it hard to hold that perspective when even media companies that spend billions of dollars on content are able to charge people $20/mo to view their content through an infuriatingly-buggy app.

      but after all, i think you’re largely right. the marginal reward for working code isn’t as all-powerful as i want it to be, but it surely still exists. things that just don’t work eventually experience user-retention problems that do pressure them to make adjustments, even if the response mechanism is all-but-obscured through a bunch of financial meddling.

      the thing is, you’re assuming the testing still operates. it saved us the development effort but we’re still bearing the burden of testing. that’s in the optimistic world. obviously a lot of groups will continue to forgo testing.

      i am actually not impressed with a machine that generates code easily and then leaves debugging as the only arduous task, because that’s already how i work! i can generate 1000 lines of code in a day, but it may take me months to properly debug it, depending. upgrading to generating 1000000 lines of code in a day doesn’t address the bottleneck in my process. amdahl’s law.

      there’s an awful lot of uninteresting code that is written that doesn’t take much debugging because it doesn’t do anything at all subtle. but vastly increasing the quantity of that kind of code just buries the deep problems deeper. i’d be most interested in an AI tool to reduce the quantity of code through refactoring but unfortunately the difficulty when refactoring is a lack of thoroughness, and the AI that i’ve met is even less meticulous than i am! it makes mistakes too much like human mistakes; it doesn’t have the magic.

  3. It sure seems to me a ‘lazy’ way to code. I’d rather be creative (use my own brain) to come up with a solution rather than have something else do it for me. Seems like you’d lose your ‘edge’, get ‘lazy’, get ‘sloppy’ about it and be mashing things together to get a maybe-working system. No thanks. Nothing wrong with doing searches with standard search engines for examples if you are really stuck (just like you do today), though. LLMs seem to be the ‘new math’ of computing coming our way :rolleyes: to our coding discipline.

    1. For a 10x coder this might be the case. Anyone who can write code line by line probably doesn’t need the diluted code that an llm offers. But us regular humans who forget if it’s += or =+ might need an llm from time to time. =P
      I’ll be honest, I’ve found answers (specifically with error messages) faster with chatgpt than trying to dig through the endless stackoverflow pages. It’s a tool, just like any other. Used within reason, it can help.

    2. The whole purpose of writing code is to create a tool so I can increase my laziness… It is about increased efficiency. Willfully ignoring an efficiency tool goes against what our coding forefathers, well, not really fought for but uhh.. had headaches and lost sleep and hair for?

      LLMs are a productivity tool. They help make things more efficient. Maybe I will lose my edge a bit in that I won’t have to constantly waste space in my brain remembering if that function is needle, haystack or haystack, needle. But the fact that for simpler, more tedious, tasks I only have to write a function name and 70% of the time copilot gets the rest of the function spot on in a second without the tedium of me having to write my 7258th CRUD function, yeah, I’ll take that absolutely any day my edge be damned. The app is still mine, the code is still mine, nothing is happening without me putting it all together. The creativity is still there. The tools simply enable me to do the exact same work only faster and without constant references to BS that doesn’t deserve real estate in my brain.

      1. This is my experience as well, and as a self-taught programmer, chatgpt is a kinder and more helpful mentor than any human I’ve met, including friends and family.

        I’ve got quite a lot of stuff floating around in my brain and as I get older I can recognize things that I used to know but now are too fuzzy to be used without a reference. AI fills that gap beautifully.

        We built AI to mimic human intelligence, but there’s a big elephant in the room and most people are standing in the corner boasting about how they’ve never seen an elephant and they never will. The thing about the elephant in this particular room is that human intelligence isn’t perfect and people always make stuff up and state it with way too much confidence. How is this so shocking when AI does it? We modeled it after us!

    3. heh i have the same feeling but it reminds me of a case where i resorted to blindly mashing code together. just an anecdote from the trenches

      i had a conditional where there were like 10 mode flags going into it, including one where it was pass==1 vs pass==2. and the other mode flags all had a bunch of related but untouchable complicated logic that decided whether to do a fraction of the work in pass 1 or in pass 2. and i had known it was wrong for 5 years already, but its effect added redundancy that you only notice if you look for it.

      and one day i just sat down and made the exhaustive test case for all the reasonable permutations of the modes, and then i simply inserted and removed conditionals one at a time until it passed the test case. no attempt at all to understand why, just one at a time, found a mismatch in the test results and fixed it in a blunt way without regard for the whole of it. at first it was one step forward for two steps backwards but after only about a dozen iterations, it converged and i haven’t touched it or thought about it since. that was 2006
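
      for anyone who wants to mechanize that brute-force approach, here is a sketch of the idea. doWork and referenceWork are hypothetical stand-ins for the real code under test and for whatever encodes the expected results; the loop just enumerates every flag combination and both passes, and reports mismatches to fix one at a time:

      ```typescript
      // Sketch of the exhaustive-permutation test described above. doWork and
      // referenceWork are hypothetical stand-ins for the real code under test
      // and for whatever encodes the expected results.
      type Modes = { flags: boolean[]; pass: 1 | 2 };

      function* allModes(flagCount: number): Generator<Modes> {
        for (let bits = 0; bits < 1 << flagCount; bits++) {
          const flags = Array.from({ length: flagCount }, (_, i) => Boolean(bits & (1 << i)));
          for (const pass of [1, 2] as const) {
            yield { flags, pass };
          }
        }
      }

      function findMismatches(
        doWork: (m: Modes) => unknown,
        referenceWork: (m: Modes) => unknown,
        flagCount: number,
      ): Modes[] {
        const bad: Modes[] = [];
        for (const m of allModes(flagCount)) {
          if (JSON.stringify(doWork(m)) !== JSON.stringify(referenceWork(m))) {
            bad.push(m); // fix one mismatch, re-run, repeat until this list is empty
          }
        }
        return bad;
      }
      ```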

  4. Having experimented with these large language models a bit over the last year or so, I find they are good at some things but awful at others. The issue is that they are stupid and, like all stupidity, they are always absolutely sure they are correct. They will give you an answer most of the time, but you can never trust them to give you the correct answer. Precise wording can sometimes mitigate some of the inaccuracies, but not all of them. I’m not sure what I fear more: the day these AI models get it right, or the day we stop questioning whether they are getting it wrong.

  5. Or, how about not blatantly stealing other people’s work?

    There are zero, none, nada, zilch, no useful models in existence that aren’t full to the gills with “non-consensual” training data.

    Anything they produce is poisoned, and can’t be used legally or ethically, except for personal entertainment. And even that stretches the ethics.

    I don’t care how far down the rabbit hole we are.
    I don’t care how much work would need to be done to do it “the right way”.

    We have to stop acting like this is okay, just because it is a clever little toy, and some executive wants to keep justifying their quarterly bonus.

    LLMs and generative “AI” are theft, and a violation of consent.

      1. Utter nonsense. I buy books, pay to watch television, movies, theatre performances, etc.; the data is licensed to me.

        I am not allowed to post that data as my own or create derivative works from it without prior consent unless the license explicitly allows it.

        I also consume free and open source data, for which there is almost always a license which may or may not permit me to modify and reuse it, with or without credit to the original creator.

        Said licenses may also explicitly forbid the use for AI/LLM training.

        It’s not a grey area, LLMs and AIs trained on data which wasn’t licensed to them are stealing and violating consent.

        1. This feels like cope.
          There are so many other sources you can get data from that are free to see (sometimes unintentionally so) and that can still influence you. Any music you’ve heard someone else playing on a device might as well be stolen by this logic, and anything you create after hearing it might as well be stolen content.

          This also would only be a defense for big companies getting you to pay to view something. The people who worry about AI being a giant theft machine are usually concerned about smaller artists who might share their work for viewing but not for anything else; this line of thought does nothing to protect them, because viewing creates inspiration that will influence the people creating new works.

          1. nah it’s jealousy

            openai et al really have scraped an enormous quantity, much more than any individual

            and they really have scraped a lot of databases that it would have been considered illegal for me to scrape. they engaged in massive, willful, and effective piracy to train it.

            i don’t have a moral objection to either of those two things but it’s a totally different feat than all i’ve learned in my life that i can copy from, and the copying it enables is totally different as well

      2. The difference is that an LLM can record much more completely, and more quickly, than the human brain. We retain concepts and ideas and general patterns, not complete works. And we have far less exposure to other people’s data, simply because it takes a long time to read and understand.

        LLMs will happily replicate other people’s work verbatim, with no regard for copyright or licensing restrictions. We can’t do that – at least not in high volume.

    1. I don’t particularly like generative AI either, but the publishing industry is taking advantage of this moment to try to massively expand the concept of intellectual property, and we have to fight tooth and nail against it.

      First, and this was totally uncontroversial 10 years ago, none of this has anything to do with “theft”. Copyright infringement is a civil matter. It isn’t “stealing” in any way, not even metaphorically. Intellectual property rights are already too strong and they must not be strengthened any further.

      Second, being inspired by something is very explicitly not copyright infringement. A piece of writing which contains no copyrighted material is not copyright infringement. Writing or drawing in someone else’s style is not copyright infringement.

      It’s possible that the weights of an LLM itself could be considered derivative works. Their output is absolutely not.

    1. totally spitballing here. hackaday is actually pretty solid for me and their javascript seems relatively unsophisticated. not much room for truly unpredictable behavior

      but i used to have a bunch of random faults like this on different websites and i eventually figured out it’s because my fiber modem was randomly killing DNS requests, up to 50% of them when it was really upset. and since most websites have enormous cross site scripting vulnerabilities (i will never use that phrase without the word at the end), it meant a random subset of the javascript wasn’t loading. a different random subset each time. turns out rebooting the modem was a good fix.

      (that’s in addition to the regular faults from the websites working as designed)

      1. I read HaD using Desktop, smartphone, multiple browsers, multiple providers so I don’t think any modem is to blame.
        The reply links are always visible; it’s only the “Leave a reply” form that is missing from time to time. Like …right now (Edge / desktop / corp backbone).

        Just checked:
        Div #wp-temp-form-div has a style “display:none;”. Why bother with masking it? For old posts?

      2. “relatively unsophisticated”

        We’ll take that as the compliment that it’s clearly intended to be! :)

        Seriously, though, we’ve had people experience problems with the Reply and also with comments not linking to the parent when responding. So far, it’s always been down to blocking JS on the client side.

    1. I expect more. I expect computer vision to be able to reverse-engineer a whole PCB from photos (and why not X-ray shots), spit out a netlist and a (partial) BOM, and fetch all relevant datasheets.
      And AR vision overlaying pins and functions for live debugging of a PCB.

  6. Computers were once touted and relied upon to be precise and accurate in their processing of data. It’s more than a little ironic to me that people are now spending enormous amounts of money and energy to turn computers into consummate bull$hit generators.

    1. AI outputs bull$hit because people output bull$hit and then we feed that to AI. Ever try to find an answer to anything by asking more than one person? How many of those people seemed so certain of their answer, then another person came along and “well acshully”ed the previous person? Maybe we should train AI on facts instead of the consensus of the general population, because, let’s be honest, the average person is an imbecile.

      1. Kinda, kinda not.

        LLMs put out random words one after the other, and if the probabilities that they’re generated from are reasonable, then you get “reasonable” stuff out most of the time. But not all of the time. Depending on what you think is reasonable.

        The issue of the quality of the training set still seems to be relevant, at least if you want to make smaller or more efficient models. But for the super-giant, throw everything at the wall and see what sticks models, you’d have to conclude that they work pretty well, IMO. So it can’t be that the training data is all that bad either.
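
        To make the “random words from probabilities” point concrete, the core of next-token generation is just a weighted draw from a distribution. A toy sketch (the vocabulary and probabilities here are invented for the example):

        ```typescript
        // Toy illustration of sampling "the next word" from a probability
        // distribution, as an LLM does at each step. Raising each probability to
        // 1/temperature and renormalizing sharpens the distribution when
        // temperature < 1 and flattens it when temperature > 1.
        function sampleNextToken(probs: Map<string, number>, temperature = 1): string {
          const entries = [...probs.entries()];
          const weights = entries.map(([, p]) => Math.pow(p, 1 / temperature));
          const total = weights.reduce((a, b) => a + b, 0);
          let r = Math.random() * total;
          for (let i = 0; i < entries.length; i++) {
            r -= weights[i];
            if (r <= 0) return entries[i][0];
          }
          return entries[entries.length - 1][0]; // guard against rounding error
        }

        // Made-up distribution: usually "cat", sometimes "dog", rarely "axolotl".
        console.log(sampleNextToken(new Map([["cat", 0.7], ["dog", 0.25], ["axolotl", 0.05]])));
        ```

        If those probabilities are “reasonable” you mostly get reasonable words out; when an unlikely token gets picked anyway, you get the not-all-of-the-time cases mentioned above.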

  7. What I worry about is not the immediate effects of AI-assisted coding in the industry. Let’s say companies start replacing junior devs with auto-generated code. Yeah we will need some of them and some more senior devs and architects but generally we will have fewer developers coming up in the ranks. So 10 maybe 15 years go by and we won’t have any more senior devs or architects. They retired and no one learned how to actually code since there is no job market for it. That is how skynet gets us. It’s not a big bang of robots taking over in a quick war. It’s a long war of attrition where humans forget how to use their brains and one day we wake up too stupid to do anything but serve our overlord masters.

  8. LLMs are proficient at single lines of code. However, for larger or more complex code they are less proficient than a novice programmer using Google to learn a language from scratch. Anyone who has actually looked at AI-generated code should be able to see that humans are far better at coding. Forget that humans make mistakes; this is an advantage, since code is refactored and rewritten often by human-led teams. AI code is, 90% of the time, based on old information about languages. Gemini spits out Python 2 syntax when it’s asked to write for Python 3, plus incorrect and fake function calls. It’s worthless.

    1. For languages that don’t completely change themselves with an increase in version number (like Javascript and PHP, though not Python) and also languages with terrible documentation like Azure Pipelines, I find ChatGPT at least an order of magnitude more helpful than StackOverflow. I pretty much never visit StackOverflow anymore. You masochistic Luddites are welcome to continue using it though! A huge problem with Python is its insistence on using whitespace as a logical part of the language, which makes cutting and pasting from examples a generally-frustrating experience. That alone makes me HATE Python.

  9. I earned my living as a programmer for over 30 years. C/C++/C#/PowerBuilder/Python/dBase II and derivatives/x86 assembly/PL-SQL/Transact-SQL/etc. I have largely dismissed AI assist for experienced developers, but I recently revisited a couple of my .NET Android apps and the GitHub Copilot feature in VS2022 has changed my mind. There is a dearth of documentation and online example code for .NET android and it can be very difficult to translate native Android Java code to its C# equivalent but Copilot suggestions saved me hours of fruitless Googling. I still don’t think you should ask AI to build your app for you, but as a tool to augment your real-world experience… I’m sold.

    1. I had a similar experience using Copilot to learn a new UI library / app framework (Uno Platform) over the last several weekends. It’s been quite helpful, even for a relatively low-resource platform.

      The tricky thing is that the autocomplete makes a few bad suggestions for every good one. I’ve been at this long enough to just keep typing, barely breaking my stride, but I could see this leading new developers completely astray if they don’t have a healthy amount of skepticism.

  10. what power sources does ai plan to use in the future to power its data centers?

    AI Overview.

    To power the future’s AI data centers, the industry is exploring a mix of power sources,
    including renewable energy (solar, wind, and hydro), natural gas with carbon capture, and even nuclear energy.

    “Exclusive: Andreessen Horowitz seeks to raise $20 billion megafund amid global
    interest in US AI startups.”

    “An Overwhelmingly Negative and Demoralizing Force (aftermath.site).”

    does ai overtake humans hope to power data centers with low-power low-cost risc computers?

    AI Overview.

    While the hope for powering data centers with low-power, low-cost RISC computers is a promising avenue for energy efficiency in the face of AI’s growing energy demands, it’s not a simple solution to the problem of AI’s energy consumption, as AI workloads are expected to continue to surge.
