Direct Memory Access: Data Transfer Without Micro-Management

In the simplest computer system architecture, all control lies with the CPU (Central Processing Unit). This means not only the execution of commands that affect the CPU’s internal register or cache state, but also the transfer of any bytes from memory to devices, such as storage and interfaces like serial, USB or Ethernet ports. This approach is called ‘Programmed Input/Output’, or PIO, and was used extensively into the early 1990s, for example for PATA storage devices, including ATA-1, ATA-2 and CompactFlash.

Obviously, if the CPU has to handle each memory transfer, this begins to impact system performance significantly. For each memory transfer request, the CPU has to interrupt other work it was doing, set up the transfer and execute it, and restore its previous state before it can continue. As storage and external interfaces began to get faster and faster, this became less acceptable. Instead of PIO taking up a few percent of the CPU’s cycles, a big transfer could take up most cycles, making the system grind to a halt until the transfer completed.
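To make the cost difference concrete, here is a toy model in Python. It is only a sketch: `pio_transfer`, `ToyDMAController` and the step counts are hypothetical names and numbers chosen for illustration, not measurements of any real hardware.

```python
def pio_transfer(source, device):
    """Programmed I/O: the CPU itself moves every single word."""
    cpu_steps = 0
    for word in source:
        device.append(word)  # the CPU writes one word to the device
        cpu_steps += 1
    return cpu_steps


class ToyDMAController:
    """Hypothetical controller: the CPU only sets up and acknowledges."""

    SETUP_STEPS = 3       # assumed: program source, destination, length
    COMPLETION_STEPS = 1  # assumed: handle one 'transfer done' interrupt

    def transfer(self, source, device):
        device.extend(source)  # the controller moves the data by itself
        return self.SETUP_STEPS + self.COMPLETION_STEPS


block = list(range(512))
pio_cost = pio_transfer(block, [])
dma_cost = ToyDMAController().transfer(block, [])
print(pio_cost, dma_cost)
```

In this model the PIO cost grows with the size of the block, while the CPU-side cost of the DMA transfer stays constant.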

DMA (Direct Memory Access) frees the CPU from these menial tasks. With DMA, peripheral devices do not have to ask the CPU to fetch some data for them, but can do it themselves. Unfortunately, this means multiple systems vying for the same memory pool’s content, which can cause problems. So let’s look at how DMA works, with an eye to figuring out how it can work for us.

PIC32 DMA Is A Weird Machine

Direct memory access (DMA) systems in computers are more powerful than you might think, and [Bruce Land] and [Joseph Primmer] have done some clever hacking to take full advantage of this on the PIC32 microcontrollers. This is a cool proof-of-concept hack — you can do general computing in the DMA subsystem without using the CPU at all if you don’t mind taking your time — but they also include two useful examples: a direct digital synthesis machine and a random number generator. Both of these run using exactly 0% CPU time.

How do they do it? DMA is a mechanism for shuttling data around in memory or between hardware peripherals without involving the CPU. Say you want to take a large block of memory containing music, and spit it out slowly to an I2S audio converter. A DMA subsystem could be configured to take an interrupt from the sound chip, pass it a chunk of data, increment the data pointer, and wait for the next interrupt.
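The audio example can be sketched as a small Python simulation. This is a toy model, not PIC32 code: `ToyDMAChannel` and `on_request` are hypothetical names, and the peripheral is just a list standing in for the converter's input.

```python
class ToyDMAChannel:
    """Toy DMA channel that feeds a peripheral one chunk per request."""

    def __init__(self, memory, start, length, chunk):
        self.memory = memory
        self.pointer = start      # current source address
        self.remaining = length   # words still to be sent
        self.chunk = chunk        # words delivered per request

    def on_request(self, peripheral):
        """Called once for each interrupt raised by the peripheral."""
        if self.remaining == 0:
            return False          # transfer finished, nothing left to send
        count = min(self.chunk, self.remaining)
        peripheral.extend(self.memory[self.pointer:self.pointer + count])
        self.pointer += count     # increment the data pointer
        self.remaining -= count
        return True


music = list(range(10))           # stand-in for a block of audio samples
converter = []
channel = ToyDMAChannel(music, start=0, length=len(music), chunk=4)
while channel.on_request(converter):
    pass                          # each iteration models one interrupt
```

At no point does a CPU-side loop copy the samples; the channel object does all the bookkeeping itself, which is the job the DMA hardware takes over.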

The gimmick, which goes back at least to [Rushanan] and [Checkoway]’s “Run DMA” paper, is that you can modify the memory source and destination addresses of one DMA service from another DMA service, and that some registers automatically perform mathematical operations on whatever data is put into them. Combine these together, and you’ve got transport-triggered programming.
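A minimal Python sketch of the idea, under loud assumptions: the memory layout, the `ADD_REG` address and the descriptor format below are all invented for illustration and do not correspond to the PIC32's real registers. Transfer descriptors live in the same memory they move data around in, so one transfer can rewrite the source address of a later one, and one special address adds whatever is written to it.

```python
ADD_REG = 15  # hypothetical register: values written here are accumulated


def write(memory, address, value):
    """Store a value, with the arithmetic side effect at ADD_REG."""
    if address == ADD_REG:
        memory[address] += value
    else:
        memory[address] = value


def run_transfers(memory, descriptor_base, count):
    """Run 'count' transfers; each descriptor is a (source, dest) pair."""
    for i in range(count):
        # Descriptors are read at run time, so earlier transfers can
        # patch the descriptors of later ones.
        source = memory[descriptor_base + 2 * i]
        dest = memory[descriptor_base + 2 * i + 1]
        write(memory, dest, memory[source])


def demo():
    memory = [0] * 16
    memory[0:3] = [10, 20, 30]     # data table at addresses 0..2
    memory[3] = 2                  # a pointer to the table entry at address 2
    memory[4:10] = [
        0, ADD_REG,                # transfer 0: add table[0] to the register
        3, 8,                      # transfer 1: copy the pointer into
                                   #             transfer 2's source field
        0, ADD_REG,                # transfer 2: source gets patched to 2
    ]
    run_transfers(memory, descriptor_base=4, count=3)
    return memory[ADD_REG]


print(demo())  # 10 + 30 = 40
```

The only operation is "move a word", yet the result depends on data (the pointer) and involves arithmetic (the add register), which is what makes it a form of transport-triggered programming.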

(An awesome side-note: our own [Al Williams] developed a one-instruction transport-triggered CPU way back in the day: the One Instruction Wonder.)

What is this good for? Writing simple helper applications that run independent of the CPU on a PIC32 microcontroller. [Land] and [Primmer]’s direct-digital synthesis example is a great one. But there are a lot of cases where you simply want to take in some new data and pre-process it a little bit before it enters the main program flow. While creating weird machines in the DMA engine might be a slower way to get it done, it keeps the CPU free for doing other stuff. We’re sure you’ll come up with something.

Bitluni Brings All The ESP-32 Multimedia Hacks To Supercon

Of all the people I was looking forward to meeting at Supercon, aside from my Hackaday colleagues with whom I had worked for five years without ever meeting, the one at the top of my list was a fellow from Germany named Matthias Balwierz. The name might not ring a bell, but he’ll certainly be familiar to Hackaday readers as Bitluni, the sometimes goofy but always entertaining and enlightening face of “Bitluni’s Lab” on YouTube.

I’d been covering Bitluni’s many ESP32 hacks over the years, and had struck up a correspondence with him, swapping ideas and asking for advice on the many projects I start but somehow never finish. Luckily for us, Bitluni is far better on follow-through than I am, and he brought that breadth and depth of experience to the Design Lab stage for that venue’s last talk of the 2019 Superconference, before the party moved next door for the badge-hacking presentations.


How Many LEDs Can You Drive?

Driving more than a handful of LEDs from a microcontroller is often a feat that takes tedious wiring, tricking the processor, or a lot of extra external hardware. Charlieplexing is perhaps the most notorious of these methods, and checks two of those three boxes. This library for the Teensy 4.0 checks all three, but it can also drive a truly staggering 32,000 LEDs at one time.

The TriantaduoWS2811 library is able to drive 32 channels of LEDs from a Teensy 4.0 using only three pins and minimal processor resources. It uses the FlexIO and DMA subsystems of the i.MX RT1062, the particular ARM processor on the Teensy, to drive four external shift registers. Together, the system is able to achieve 30 frames per second with 1,000 LEDs per channel, for a total of 32,000 LEDs. Whoa.
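Those numbers can be sanity-checked with a little arithmetic. The sketch below assumes standard WS2811 timing, which is not stated in the write-up: 24 bits per LED, an 800 kHz bit rate, and a reset gap of about 50 microseconds between frames.

```python
CHANNELS = 32
LEDS_PER_CHANNEL = 1000
BITS_PER_LED = 24          # assumed: 8 bits each for red, green, blue
BIT_RATE_HZ = 800_000      # assumed: WS2811 high-speed mode
RESET_SECONDS = 50e-6      # assumed: minimum latch/reset gap per frame

total_leds = CHANNELS * LEDS_PER_CHANNEL

# All 32 channels are clocked out in parallel, so the frame time is set
# by the length of a single channel, not by the total LED count.
frame_seconds = LEDS_PER_CHANNEL * BITS_PER_LED / BIT_RATE_HZ + RESET_SECONDS
max_fps = 1 / frame_seconds

print(total_leds, round(frame_seconds * 1000, 2), round(max_fps, 1))
```

Under these assumptions one frame takes roughly 30 ms, which puts the ceiling a little above 33 frames per second and is consistent with the quoted 30.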

[Ward] aka [wramsdell] wondered what one would do with all of the horsepower of a Teensy microcontroller when he first saw its specifications, and was able to build this project to take advantage of its features. What’s surprising, though, is that it doesn’t use nearly everything the processor is capable of, so you can do other tasks at the same time as driving that giant LED display.

The Multiyear Hunt For A Gameboy Game’s Bug

[Enddrift] had a real problem trying to run a classic game, Hello Kitty Collection: Miracle Fashion Maker, in a GBA (Gameboy Advance) emulator. During startup, the game would hit an endless loop waiting for a read from a non-existent memory location and thus wouldn’t start under the emulator. The problem is, the game works on real hardware even though that memory doesn’t exist there, either.

To further complicate things, a similar bug exists when loading a saved game under Sonic Pinball Party. Then a hack for Pokemon Emerald surfaced that helped break the case. The story is pretty interesting.


Memory Mapping Methods In The Super Nintendo

Not only is the Super Nintendo an all-around great platform, both during its prime in the 90s and now during the nostalgia craze, but its relative simplicity compared to modern systems makes it a lot more accessible from a computer science point-of-view. That means that we can get some in-depth discussion on how the Super Nintendo actually does what it does, and understand most of it, like this video from [Retro Game Mechanics Explained] which goes into an incredible amount of detail on the mechanics of the SNES’s memory system.

Two of the interesting memory systems the SNES uses are called DMA and HDMA. DMA stands for direct memory access, and is a way for the Super Nintendo to access memory independently of the CPU. The advantage is that it’s incredibly fast compared to more typical methods of accessing memory. This isn’t particularly unique, but the HDMA system is. It allows the SNES to do all kinds of interesting tricks with its video output, like changing color gradients and doing all kinds of masking effects.
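As a rough illustration of how a gradient can come out of HDMA, the sketch below models it as a table of (line count, value) pairs that is stepped through as the picture is drawn, with a video register being rewritten between scanlines. This is a deliberately simplified assumption: `expand_hdma_table` is a hypothetical helper, and the table here is not the SNES's actual byte-level format.

```python
def expand_hdma_table(table, scanlines):
    """Return the register value in effect on each scanline."""
    values = []
    current = None
    for line_count, value in table:
        current = value                      # register rewritten here
        values.extend([current] * line_count)
    # After the table ends, the register simply keeps its last value.
    values.extend([current] * (scanlines - len(values)))
    return values[:scanlines]


# A three-band 'gradient': the register changes twice during the frame.
gradient_table = [(2, 0x10), (3, 0x20), (1, 0x30)]
per_line = expand_hdma_table(gradient_table, scanlines=8)
print(per_line)
```

The CPU only has to prepare the table once per frame; the per-scanline register changes then happen without further CPU involvement.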

If you’re interested in the inner workings of classic consoles like the SNES, this video gets way down in the weeds of the system itself. It’s interesting to see how programmers were able to squeeze more capability from these limited (by modern standards) systems by manipulating memory the way the DMA and HDMA systems do. [Retro Game Mechanics Explained] is a great resource for exploring in-depth aspects of lots of classic games, like how speedrunners can execute arbitrary code in old Mario games.


34C3: Roll Your Own Network Driver In Four Simple Steps

Writing your own drivers is a special discipline. Drivers on the one hand work closely with external hardware and at the same time are deeply ingrained into the operating system. That’s two kinds of specialization in one problem. In recent years a lot of dedicated networking hardware has been replaced by software. [Paul Emmerich] is a researcher who works on improving the performance of these systems.

Making software act like network hardware requires drivers that can swiftly handle a lot of small packets, something that the standard APIs were not designed for. In his talk at this year’s Chaos Communication Congress [Paul] dissects the different approaches to writing this special flavor of drivers and explains the shortcomings of each.
