An Animated Walkthrough Of How Large Language Models Work

If you wonder how Large Language Models (LLMs) work and aren’t afraid of getting a bit technical, don’t miss [Brendan Bycroft]’s LLM Visualization. It is an interactively-animated step-by-step walk-through of a GPT large language model complete with animated and interactive 3D block diagram of everything going on under the hood. Check it out!

nano-gpt has only around 85,000 parameters, but the operating principles are all the same as for larger models.

The demonstration walks through a simple task and shows every step. The task is this: using the nano-gpt model, take a sequence of six letters and put them into alphabetical order.

A GPT model is a highly complex prediction engine, so the whole process begins with tokenizing the input (breaking up words and assigning numerical values to the chunks) and ends with choosing an appropriate output from a list of probabilities. There are of course many more steps in between, and different ways to adjust the model’s behavior. All of these are made quite clear by [Brendan]’s process breakdown.

We’ve previously covered how LLMs work, explained without math which eschews gritty technical details in favor of focusing on functionality, but it’s also nice to see an approach like this one, which embraces the technical elements of exactly what is going on.

We’ve also seen a much higher-level peek at how a modern AI model like Anthropic’s Claude works when it processes requests, extracting human-understandable concepts that illustrate what’s going on under the hood.

AI Face Anonymizer Masks Human Identity In Images

We’re all pretty familiar with AI’s ability to create realistic-looking images of people that don’t exist, but here’s an unusual implementation of using that technology for a different purpose: masking people’s identity without altering the substance of the image itself. The result is the photo’s content and “purpose” (for lack of a better term) of the image remains unchanged, while at the same time becoming impossible to identify the actual person in it. This invites some interesting privacy-related applications.

Originals on left, anonymized versions on the right. The substance of the images has not changed.

The paper for Face Anonymization Made Simple has all the details, but the method boils down to using diffusion models to take an input image, automatically pick out identity-related features, and alter them in a way that looks more or less natural. For this purpose, identity-related features essentially means key parts of a human face. Other elements of the photo (background, expression, pose, clothing) are left unchanged. As a concept it’s been explored before, but researchers show that this versatile method is both simpler and better-performing than others.

Diffusion models are the essence of AI image generators like Stable Diffusion. The fact that they can be run locally on personal hardware has opened the doors to all kinds of interesting experimentation, like this haunted mirror and other interactive experiments. Forget tweaking dull sliders like “brightness” and “contrast” for an image. How about altering the level of “moss”, “fire”, or “cookie” instead?

Here’s Code For That AI-Generated Minecraft Clone

A little while ago Oasis was showcased on social media, billing itself as the world’s first playable “AI video game” that responds to complex user input in real-time. Code is available on GitHub for a down-scaled local version if you’d like to take a look. There’s a bit more detail and background in the accompanying project write-up, which talks about both the potential as well as the numerous limitations.

We suspect the focus on supporting complex user input (such as mouse look and an item inventory) is what the creators feel distinguishes it meaningfully from AI-generated DOOM. The latter was a concept that demonstrated AI image generators could (kinda) function as real-time game engines.

Image generators are, in a sense, prediction machines. The idea is that by providing a trained model with a short history of what just happened plus the user’s input as context, it can generate a pretty usable prediction of what should happen next, and do it quickly enough to be interactive. Run that in a loop, and you get some pretty impressive clips to put on social media.

It is a neat idea, and we certainly applaud the creativity of bending an image generator to this kind of application, but we can’t help but really notice the limitations. Sit and stare at something, or walk through dark or repetitive areas, and the system loses its grip and things rapidly go in a downward spiral we can only describe as “dreamily broken”.

It may be more a demonstration of a concept than a properly functioning game, but it’s still a very clever way to leverage image generation technology. Although, if you’d prefer AI to keep the game itself untouched take a look at neural networks trained to use the DOOM level creator tools.

Assessing Developer Productivity When Using AI Coding Assistants

We have all seen the advertisements and glossy flyers for coding assistants like GitHub Copilot, which promised to use ‘AI’ to make you write code and complete programming tasks faster than ever, yet how much of that has worked out since Copilot’s introduction in 2021? According to a recent report by code analysis firm Uplevel there are no significant benefits, while GitHub Copilot also introduced 41% more bugs. Commentary from development teams suggests that while the coding assistant makes for faster writing of code, debugging or maintaining the code is often not realistic.

None of this should be a surprise, of course, as this mirrors what we already found when covering this topic back in 2021. With GitHub Copilot and kin being effectively Large Language Models (LLMs) that are trained on codebases, they are best considered to be massive autocomplete systems targeting code. Much like with autocomplete on e.g. a smartphone, the experience is often jarring and full of errors. Perhaps the most fair assessment of GitHub Copilot is that it can be helpful when writing repetitive, braindead code that requires very little understanding of the code to get right, while it’s bound to helpfully carry in a bundle of sticks and a dead rodent like an overly enthusiastic dog when all you wanted was for it to grab that spanner.

Until Copilot and kin develop actual intelligence, it would seem that software developer jobs are still perfectly safe from being taken over by our robotic overlords.

Hackaday Links Column Banner

Hackaday Links: October 13, 2024

So far, food for astronauts hasn’t exactly been haute cuisine. Freeze-dried cereal cubes, squeezable tubes filled with what amounts to baby food, and meals reconstituted with water from a fuel cell don’t seem like meals to write home about. And from the sound of research into turning asteroids into astronaut food, things aren’t going to get better with space food anytime soon. The work comes from Western University in Canada and proposes that carbonaceous asteroids like the recently explored Bennu be converted into edible biomass by bacteria. The exact bugs go unmentioned, but when fed simulated asteroid bits are said to produce a material similar in texture and appearance to a “caramel milkshake.” Having grown hundreds of liters of bacterial cultures in the lab, we agree that liquid cultures spun down in a centrifuge look tasty, but if the smell is any indication, the taste probably won’t live up to expectations. Still, when a 500-meter-wide chunk of asteroid can produce enough nutritionally complete food to sustain between 600 and 17,000 astronauts for a year without having to ship it up the gravity well, concessions will likely be made. We expect that this won’t apply to the nascent space tourism industry, which for the foreseeable future will probably build its customer base on deep-pocketed thrill-seekers, a group that’s not known for its ability to compromise on creature comforts.

Continue reading “Hackaday Links: October 13, 2024”

Creating Video Games With AI: A Mario Example

Artificial intelligence (AI) seems to be doing everything these days. Making images, making videos, and replacing most of us real human writers if you believe the hype. Maybe it’s all over! And yet, we persist, to write about yet another job taken over by AI: creating video games.

The research paper is entitled “Video Game Generation: A Practical Study using Mario.” The basic idea is whether a generative AI model can create an interactive video game by first training it on an existing game.

MarioVGG, as it is called, is a “text-to-video model.” It hasn’t built the Mario game that you’re familiar with, though. It takes player commands as text inputs—such as “run, or “jump”—and then outputs video frames showing the result in the ‘game.’ The model was trained on a dataset of frame-by-frame Super Mario Brothers game play, combined with data on user inputs at the time. The model shows an ability to generate believable video output for given player inputs, including basic game physics, item interactions, and collisions. It’s able to do this in a chained way, so that it can reasonably simulate a player making multiple actions and moving through a level of the game.

It’s not like playing a real Mario game yet, by any means. Regardless, the AI model has shown an ability to replicate the world of the game in a way that behaves relatively consistently with its established rules. If you’re in the field of video game development, though, you probably don’t have a lot to worry about just yet—you probably moved past making basic Mario clones years ago, so you’ve got quite an edge for now!

Hackaday Links Column Banner

Hackaday Links: September 29, 2024

There was movement in the “AM Radio in Every Vehicle Act” last week, with the bill advancing out of the US House of Representatives Energy and Commerce Committee and heading to a full floor vote. For those not playing along at home, auto manufacturers have been making moves toward deleting AM radios from cars because they’re too sensitive to all the RF interference generated by modern vehicles. The trouble with that is that the government has spent a lot of effort on making AM broadcasters the centerpiece of a robust and survivable emergency communications system that reaches 90% of the US population.

The bill would require cars and trucks manufactured or sold in the US to be equipped to receive AM broadcasts without further fees or subscriptions, and seems to enjoy bipartisan support in both the House and the Senate. Critics of the bill will likely point out that while the AM broadcast system is a fantastic resource for emergency communications, if nobody is listening to it when an event happens, what’s the point? That’s fair, but short-sighted; emergency communications isn’t just about warning people that something is going to happen, but coordinating the response after the fact. We imagine Hurricane Helene’s path of devastation from Florida to Pennsylvania this week and the subsequent emergency response might bring that fact into focus a bit.

Continue reading “Hackaday Links: September 29, 2024”