An Animated Walkthrough Of How Large Language Models Work

If you wonder how Large Language Models (LLMs) work and aren’t afraid of getting a bit technical, don’t miss [Brendan Bycroft]’s LLM Visualization. It is a step-by-step walkthrough of a GPT large language model, complete with an animated and interactive 3D block diagram of everything going on under the hood. Check it out!

The model shown, nano-gpt, has only around 85,000 parameters, but the operating principles are the same as for larger models.

The demonstration walks through a simple task and shows every step. The task is this: using the nano-gpt model, take a sequence of six letters and put them into alphabetical order.

A GPT model is a highly complex prediction engine, so the whole process begins with tokenizing the input (breaking up words and assigning numerical values to the chunks) and ends with choosing an appropriate output from a list of probabilities. There are of course many more steps in between, and different ways to adjust the model’s behavior. All of these are made quite clear by [Brendan]’s process breakdown.
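The first and last steps described above can be sketched in a few lines of Python. This is only an illustration, not code from the visualization: the three-letter vocabulary, the function names, and the logit values are all made up for the example, and the real model computes its logits through many transformer layers that are skipped here.

```python
import math

# Assumed toy vocabulary: one token per letter (hypothetical, for illustration).
VOCAB = ["A", "B", "C"]
TOKEN_ID = {ch: i for i, ch in enumerate(VOCAB)}

def tokenize(text):
    """Break the input into chunks (single letters here) and map each to a number."""
    return [TOKEN_ID[ch] for ch in text]

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature adjusts how peaked they are."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_greedy(probs):
    """Choose the index of the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

tokens = tokenize("CBABBC")
logits = [2.0, 0.5, -1.0]  # made-up scores standing in for the model's output
probs = softmax(logits)
next_token = VOCAB[pick_greedy(probs)]
```

Raising the temperature flattens the probabilities and lowering it sharpens them, which is one of the knobs for adjusting a model's behavior when the output is sampled rather than picked greedily.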

We’ve previously covered an explanation of how LLMs work without math, which eschews gritty technical details in favor of focusing on functionality. It’s also nice to see an approach like this one, which embraces the technical details of exactly what is going on.

We’ve also seen a much higher-level peek at how a modern AI model like Anthropic’s Claude works when it processes requests, extracting human-understandable concepts that illustrate what’s going on under the hood.

Exploring The Gakken FX Micro-Computer

Early computer kits aimed at learning took all sorts of forms, from full-fledged computer kits like the Altair 8800 to the ready-made MicroBee Computer-In-A-Book. For those just wanting to dip their toes in the computing world, many low-cost computer “trainers” were released, and Japan had some awesome ones. [Jason Jacques] shows off his Gakken Micro-Computer FX-System (or is it the FX-Computer? Or maybe the FX-Micom? It seems like they couldn’t make up their minds). In any event, it was a combination microcomputer and I/O building blocks system running a custom version of the Texas Instruments TMS1100 microprocessor. Specifically designed to introduce users to the world of computing, the included guide is very detailed and includes 100 example programs and lots of information on how all the opcodes work.

This 4-bit system is similar to the Kenbak computer, with a very simple instruction set and limited address space. However, adding electronic components in plastic blocks brings this machine to a new level of interactivity. Connections can be made to and from the microcomputer block, as well as to the on-board speaker and simple input/output pins. The example circuit displayed on the front cover of the box enables the microcontroller to connect to the speaker and allows a switch to light up a small incandescent bulb. We can imagine many users wiring up all sorts of extra components to their FX-Computers, and with the advent of 3D printing, it wouldn’t be difficult to create new blocks to insert into the grid.



Register Renaming: The Art Of Parallel Processing

In the quest for faster computing, modern CPUs have turned to innovative techniques to optimize instruction execution. One such technique, register renaming, is a crucial component that helps modern processors execute many instructions in parallel. If you’re keen on hacking or tinkering with how CPUs manage tasks, this is one concept you’ll want to understand. Here’s a breakdown of how it works, and you can watch the video below.

In a nutshell, register renaming allows CPUs to bypass the restrictions imposed by a limited number of registers. Consider a scenario where two operations need to use the same register at once: without renaming, the CPU would be stuck, having to wait for one operation to complete before starting the other. Enter the renaming trick: registers are reassigned on the fly, so different operations can name the same logical register while their values physically reside in different slots. This drastically reduces idle time and boosts parallel execution. Of course, you also have to ensure that the register you are using has the correct contents at the time you are using it, but there are many ways to solve that problem. The basic technique dates back to some IBM System/360 computers and other high-performance mainframes.
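The mapping from logical to physical registers can be sketched as a small table that is updated as instructions arrive. The Python below is a simplified model under stated assumptions, not a description of any particular CPU: the instruction format, the register names, and the `rename` function are invented for the example, and it never recycles physical registers, which real hardware must do.

```python
def rename(instructions, num_arch_regs=4):
    """Rename (dest, src1, src2) instructions from logical to physical registers.

    Hypothetical model: every write gets a brand-new physical register,
    so later writes to the same logical register no longer collide.
    """
    # Each logical register starts out mapped to its own physical register.
    mapping = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}
    next_phys = num_arch_regs
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read whichever physical register currently holds the value.
        s1, s2 = mapping[src1], mapping[src2]
        # The destination is given a fresh physical register.
        new_dest = f"p{next_phys}"
        next_phys += 1
        mapping[dest] = new_dest
        renamed.append((new_dest, s1, s2))
    return renamed

program = [
    ("r1", "r2", "r3"),  # writes r1
    ("r2", "r1", "r3"),  # truly depends on the first instruction's r1
    ("r1", "r0", "r3"),  # reuses r1 for an unrelated value
]
renamed = rename(program)
```

After renaming, the third instruction writes to a different physical register than the first, so it no longer has to wait for the second instruction to finish reading the old value. The true dependency between the first two instructions is preserved, because the second one reads the physical register the first one writes.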

Register renaming isn’t the only way to solve this problem. There’s a lot that goes into a superscalar CPU.


Nix + Automated Fuzz Testing Finds Bug In PDF Parser

[Michael Lynch]’s adventures in configuring Nix to automate fuzz testing are a lot of things all rolled into one. The write-up is not only a primer on fuzz testing (a method of finding bugs) but also a how-to on automating the setup using Nix (which is a lot of things, including a kind of package manager), as well as useful info on effectively automating software processes.

[Michael] not only walks through how he got it all up and running in a simplified and usefully-portable way, but he actually found a buffer overflow in pdftotext in the process! (Turns out someone else had reported the same bug a few weeks before he found it, but it demonstrates everything regardless.)

[Michael] chose fuzz testing because, while using it to find security vulnerabilities is conceptually simple, actually doing it tends to require setting up a test environment with a complex workflow and a lot of dependencies. The result has a high degree of task specificity, and isn’t very portable or reusable. Nix allowed him to really simplify the process while also making it more adaptable. Be sure to check out part two, which goes into detail about how exactly one goes from discovering an input that crashes a program to tracking down (and patching) the reason it happened.
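To show why fuzz testing is conceptually simple, here is a toy mutation fuzzer in Python. It is a sketch only and has nothing to do with [Michael]’s Nix setup or with pdftotext: the `parse_record` target and its bug are invented for the example, and an `IndexError` stands in for the kind of out-of-bounds access that would be a buffer overflow in C.

```python
import random

def parse_record(data: bytes) -> bytes:
    """Toy parser (hypothetical): the first byte is the payload length."""
    length = data[0]
    payload = bytearray()
    for i in range(length):
        # Bug: trusts the length byte without checking the input size.
        payload.append(data[1 + i])
    return bytes(payload)

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    data = bytearray(seed)
    pos = rng.randrange(len(data))
    data[pos] = rng.randrange(256)
    return bytes(data)

def fuzz(target, seed: bytes, iterations: int = 1000, rng_seed: int = 0):
    """Feed mutated inputs to the target and collect the ones that crash it."""
    rng = random.Random(rng_seed)  # fixed seed keeps runs repeatable
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception as exc:
            crashes.append((candidate, exc))
    return crashes

seed_input = bytes([3]) + b"abc"  # a valid record: length 3, payload "abc"
crashes = fuzz(parse_record, seed_input)
```

Every crashing input here has a length byte larger than the payload that follows it, which points straight at the missing bounds check. Real fuzzers add coverage feedback and smarter mutations, which is where the complex workflow and dependencies come in.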

Making fuzz testing easier (and in a sense, cheaper) is something people have been interested in for a long time, even going so far as to see whether pressing a stack of single-board computers into service as dedicated fuzz testers made economic sense.

Hardware-in-the-Loop Continuous Integration

How can you tell if your software is doing what it’s supposed to? Write some tests and run them every time you change anything. But what if you’re making hardware? [deqing] has your back with the Automatic Hardware Testing rig. And just as you’d expect in the software-only world, you can fire off the system every time you update the firmware in your GitHub repository.

A Raspberry Pi compiles the firmware in question and flashes the device under test. The cool part is the custom rig that simulates button presses and reads the resulting values out. No actual LEDs are blinked, but the test rig looks for voltages on the appropriate pins, and a test passes when the timing is between 0.95 and 1.05 seconds for the highs and lows. Firing this entire procedure off at every git check-in ensures that all the example code is working.
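The pass criterion for the blink test can be expressed as a short check. The Python below is a hedged sketch, not [deqing]’s code: it assumes the rig samples the pin at a fixed interval, and the function names, the 10 ms sample period, and the decision to ignore the partial runs at the edges of the capture are all choices made for the example.

```python
def measure_runs(samples, sample_period_ms):
    """Collapse equally spaced logic samples into (level, duration_ms) runs."""
    runs = []
    current, count = samples[0], 0
    for level in samples:
        if level == current:
            count += 1
        else:
            runs.append((current, count * sample_period_ms))
            current, count = level, 1
    runs.append((current, count * sample_period_ms))
    return runs

def blink_test_passes(runs, low_ms=950, high_ms=1050):
    """Pass if every complete high or low lasts between 0.95 and 1.05 seconds."""
    # The first and last runs may be cut off by the capture window, so skip them.
    complete = runs[1:-1]
    return bool(complete) and all(low_ms <= d <= high_ms for _, d in complete)

# Simulated capture at an assumed 10 ms sample period.
good_capture = [0] * 30 + [1] * 100 + [0] * 101 + [1] * 99 + [0] * 20
bad_capture = [0] * 30 + [1] * 120 + [0] * 100 + [1] * 10

good_result = blink_test_passes(measure_runs(good_capture, 10))
bad_result = blink_test_passes(measure_runs(bad_capture, 10))
```

Working in integer milliseconds avoids floating-point rounding right at the 0.95 and 1.05 second limits.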

So far, we can only see how the test rig would work with easily simulated peripherals. If your real application involved speaking to a DAC over I2C, for instance, you’d probably want to integrate that into the test rig, but the principle would be the same.

Are any of you doing this kind of mock-up hardware testing on your projects? It sounds like it could catch bad mistakes before they get out of the house.

Lock-In Thermography On A Cheap IR Camera

Seeing the unseen is one of the great things about using an infrared (IR) camera, and even the cheap-ish ones that plug into a smartphone can dramatically improve your hardware debugging game. But even fancy and expensive IR cameras have their limits, and may miss subtle temperature changes that indicate a problem. Luckily, there’s a trick that improves the thermal resolution of even the lowliest IR camera, and all it takes is a little tweak to the device under test and some simple math.

According to [Dmytro], “lock-in thermography” is so simple that his exploration of the topic was just a side quest in a larger project that delved into the innards of a Xinfrared Xtherm II T2S+ camera. The idea is to periodically modulate the heat produced by the device under test, typically by ramping the power supply voltage up and down. IR images are taken in sync with the modulation, with each frame having a sine and cosine scaling factor applied to each pixel. The frames are averaged together over an integration period to create both in-phase and out-of-phase images, which can reveal thermal details that were previously unseen.
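The “simple math” amounts to multiplying each frame by a sine and a cosine at the modulation frequency and averaging. The following Python is a minimal sketch of that idea, not [Dmytro]’s code: the `lock_in` function, the two-pixel frames, and the simulated noise levels are assumptions made for the example, and it assumes the capture covers a whole number of modulation cycles.

```python
import math
import random

def lock_in(frames, frames_per_cycle):
    """Return (in_phase, out_of_phase) images from a list of flat pixel frames."""
    n_pixels = len(frames[0])
    in_phase = [0.0] * n_pixels
    out_of_phase = [0.0] * n_pixels
    for k, frame in enumerate(frames):
        angle = 2 * math.pi * k / frames_per_cycle
        s, c = math.sin(angle), math.cos(angle)
        for i, value in enumerate(frame):
            in_phase[i] += value * s       # sine-weighted sum
            out_of_phase[i] += value * c   # cosine-weighted sum
    scale = 2.0 / len(frames)  # normalizes so a sine of amplitude A yields A
    return [v * scale for v in in_phase], [v * scale for v in out_of_phase]

# Simulated capture: pixel 0 warms and cools by 0.2 units with the modulation,
# pixel 1 does not, and both are buried in noise of standard deviation 0.5.
rng = random.Random(0)
frames = []
for k in range(2000):  # 100 full cycles at 20 frames per cycle
    phase = 2 * math.pi * k / 20
    hot = 30.0 + 0.2 * math.sin(phase) + rng.gauss(0, 0.5)
    cold = 30.0 + rng.gauss(0, 0.5)
    frames.append([hot, cold])

i_img, q_img = lock_in(frames, 20)
amplitude = [math.hypot(i, q) for i, q in zip(i_img, q_img)]
```

The modulated pixel stands out clearly in `amplitude` even though its signal is smaller than the per-frame noise, because the constant background and the uncorrelated noise average toward zero over many cycles.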

With some primary literature in hand, [Dmytro] cobbled together some simple code to automate the entire lock-in process. His first test subject was a de-capped AD9042 ADC, with power to the chip modulated by a MOSFET attached to a Raspberry Pi Pico. Integrating the images over just ten seconds provided remarkably detailed images of the die of the chip, far more detailed than the live view. He also pointed the camera at the Pico itself, programmed it to blink the LED slowly, and was clearly able to see heating in the LED and onboard DC-DC converter.

The potential of lock-in thermography for die-level debugging is pretty exciting, especially given how accessible it seems to be. The process reminds us a little of other “seeing the unseeable” techniques, like those neat acoustic cameras that make diagnosing machine vibrations easier, or even measuring blood pressure by watching the subtle change in color of someone’s skin as the capillaries fill.

Seven New Street Fighter 2 Arcade Rom Hacks

[Sebastian Mihai] is a prolific programmer and hacker with a particular focus on retrocomputing and period games, and this latest hack, adding new gameplay elements to Capcom’s Street Fighter II – Champion Edition, is another great one. [Sebastian] was careful to resist changing the game physics, as that’s part of what makes this game ‘feel’ the way it does, but added some fun extra elements, such as the ability to catch birds, lob barrels at the other player, and dodge fire.

The title screen was updated for each of the different versions, so there is no doubt about which one is being played. This work builds on [Sebastian]’s previous hacks to Knights of the Round. Since both games share the same Capcom CPS-1 hardware, the existing 68000 toolchain could be reused, reducing the overhead for this new series of hacks.