Here’s the thing about coding. When you’re working on embedded projects, it’s quite easy to run into hardware limitations, and quite suddenly, too. You find yourself desperately trying to find a way to speed things up, only… there are no clock cycles to spare. It’s at this point that you might reach for the magic of direct memory access (DMA). [Larry] is here to advocate for its use.
DMA isn’t just for the embedded world; it was once a big deal on computers, too. It’s just rarer these days due to security concerns and all that. Whichever platform you’re on, though, it’s a valuable tool to have in your arsenal. As [Larry] explains, DMA is a great way to move data from memory location to memory location, or from memory to peripherals and back, without involving the CPU. Basically, a special subsystem handles trucking data from A to B while the CPU gets on with whatever other calculations it had to do. It’s often a little more complicated in practice, but that’s what [Larry] takes pleasure in explaining.
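To make that concrete, here's a minimal sketch of a memory-to-memory transfer using the Raspberry Pi Pico SDK's DMA API; the buffer sizes and the blocking wait at the end are purely illustrative, and a real application would more likely take a completion interrupt instead of sitting in a wait.

#include "pico/stdlib.h"
#include "hardware/dma.h"

// Copy src[] to dst[] without the CPU touching each word.
uint32_t src[256], dst[256];

int main(void) {
    int chan = dma_claim_unused_channel(true);
    dma_channel_config cfg = dma_channel_get_default_config(chan);
    channel_config_set_transfer_data_size(&cfg, DMA_SIZE_32);
    channel_config_set_read_increment(&cfg, true);
    channel_config_set_write_increment(&cfg, true);

    // Kick off the transfer: destination, source, word count, start now.
    dma_channel_configure(chan, &cfg, dst, src, 256, true);

    // ... the CPU is free to do other work here ...

    dma_channel_wait_for_finish_blocking(chan);
}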
Indeed, back before I was a Hackaday writer, I was no stranger to DMA techniques myself—and I got my project published here! I put it to good use in speeding up an LCD library for the Arduino Due. It was the perfect application for DMA—my main code could handle updating the graphics buffer as needed, while the DMA subsystem handled trucking the buffer out to the LCD quicksmart.
If you’re struggling with updating a screen or LED strings, or you need to do something fancy with sound, DMA might just be the ticket. Meanwhile, if you’ve got your own speedy DMA tricks up your sleeve, don’t hesitate to let us know!
Ex-Amiga coder here. DMA, you say? That brings back some memories. Can we talk about the copper and the blitter, too? As you allude, the words “DMA” and “security” don’t co-exist entirely happily, but then, the Amiga had pretty much zilch in the way of memory protection anyway.
Ah, a more trusting era!
You needed a floppy to get a virus.
And you never got a virus if you kept a condom on the floppy.
Use the wrong kind and you get a sloppy floppy.
And there were big name magazines sold on the high street which provided viruses as a surprise bonus with their cover disks.
CU Amiga definitely and I’m pretty sure at least one other, maybe Amiga Format, had cover disks that were infected.
Anyone else remember Amiga Format’s infamous “typo” – for want of a better word – wherein they left placeholder text in the description of a cover-disk’s contents? It was “Type Some [rude word that rhymes with “Bit”] In Here Please”… above, of all things, the summary of a kids’ painting program.
Lorem Ipsum is there for a purpose, people :)
Don’t know what you’re talking about with DMA being a security issue. External FireWire/Thunderbolt/USB4 devices with unrestricted access to memory before the advent of IOMMUs? Sure. But DMA has been indispensable for decades. Hell, the Linux kernel community just had a massive spat over bindings for the DMA subsystem’s in-kernel API. And it’s now becoming increasingly common to see new applications for DMA, such as device-to-device transfers, like from a network card directly into the VRAM of a GPU with no system RAM involved. Traditionally, system RAM has been either the source or the destination of a DMA transfer.
In general, direct access to anything is bound to create some opportunity for exploitation. Security IS controlling access. It’s a tradeoff made after weighing risk and reward, and that was a different calculus back in the day.
“Direct memory access” doesn’t mean the device has free rein over system memory. I mean, it used to, but that was, like, decades ago now. (It was fun, though; you could do Sneaky Things.)
Nowadays (like M said) – that’s what IOMMUs were created for. They work exactly the same as an MMU does for normal process management: each device thinks it’s got full access to memory but its accesses are virtualized through the IOMMU.
There’s no security risk with DMA through an IOMMU. You’re just handing it a restricted range of memory. It can’t access anything outside that.
It starts with: DMA is managed by the OS. The user asks for part of a file, the file system decides it needs a block from disk, and the disk driver configures a DMA transfer with a disk buffer as the destination. Safe, secure (unless there are security bugs, in which case you’re screwed anyway).
Things get complicated when you want to allow a user process, say an X server to have access to the DMA capabilities of a device because of performance reasons. That’s when you start needing the IOMMU.
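For the kernel-managed case above, a rough sketch of what a driver does looks something like this. The register layout and the mydisk_read_block name are made up for illustration, but dma_alloc_coherent() is the real Linux DMA mapping API for getting a buffer the device is allowed to write into.

#include <linux/kernel.h>
#include <linux/dma-mapping.h>
#include <linux/io.h>

/* Illustrative only: allocate a buffer the device can DMA into,
 * then hand its bus address to a hypothetical disk controller. */
static int mydisk_read_block(struct device *dev, void __iomem *regs,
                             u32 lba, size_t len)
{
    dma_addr_t bus_addr;
    void *buf = dma_alloc_coherent(dev, len, &bus_addr, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;

    /* Hypothetical register layout: where to put the data, which
     * block to fetch, then let the controller run on its own. */
    writel(lower_32_bits(bus_addr), regs + 0x00); /* DMA dest, low  */
    writel(upper_32_bits(bus_addr), regs + 0x04); /* DMA dest, high */
    writel(lba,                     regs + 0x08); /* block number   */
    writel(1,                       regs + 0x0c); /* go             */

    /* A real driver waits for a completion interrupt here and frees
     * the buffer later with dma_free_coherent(). */
    return 0;
}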
Without an IOMMU you’re still vulnerable to malicious devices or malicious code on a device. It’s not really possible to protect the system from crashing or something due to a malicious device, but an IOMMU isolates what they have access to.
To be fully secure the system needs a way of isolating/logging the device before the OS comes up, which is where measured boot can help. Still a tough problem overall, although if the device can be unpowered before OS (trust) initialization that’d be enough.
@Pat Until we started exposing internal memory busses outside the computer, no “malicious devices” had ever had access to DMA capability. Generally speaking, YOU MUST TRUST the hardware inside your computer.
Measured boot will do nothing to help you there. It only ensures that the software in your computer is what you expect it to be. It does not ensure any devices or IOMMUs are configured correctly, nor does it mean that an on-device IOMMU functions correctly, nor does measured boot ensure DMA functions correctly. It also does nothing to help you with devices that are on the inside of the system’s main IOMMU.
“Until we started exposing internal memory busses outside the computer”
DMA attacks from malicious devices have been possible for over 30 years. I accidentally did the equivalent of one in the early 2000s, in early testing with an FPGA device directly connected to PCI. And unless the devices are literally hanging off of the I/O space, they’ve got access to memory.
“Generally speaking YOU MUST TRUST the hardware”
oh we are sadly so past that point
“Measured boot will do nothing to help you there”
Yeah, it does, because it measures execution time and an early DMA attack will violate that and fail the hash check (there are good articles by security guys out there talking about this). To be clear there are flaws in all of the current measured boot implementations that would allow you to get around this, but that’s an implementation problem (as if the major vendors actually care about it, lol).
This isn’t a theory thing either, there have been demonstrated NIC card firmware attacks for years, and the early ones literally have “use an IOMMU” as one of the countermeasures.
That’s not what “direct access” means. It has NOTHING to do with security permissions.
Direct access means that the hardware device or a separate hardware copying engine can perform copies autonomously, without having to make the CPU run code in a loop that loads each byte into a CPU register and then writes it back out somewhere else.
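The difference is easy to show in code. With programmed I/O the CPU babysits every byte; with DMA it programs the engine and walks away. The register addresses below are hypothetical, purely to illustrate the shape of the two approaches.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical memory-mapped registers, for illustration only. */
#define UART_DATA   (*(volatile uint8_t  *)0x40001000)
#define DMA_SRC     (*(volatile uint32_t *)0x40002000)
#define DMA_DST     (*(volatile uint32_t *)0x40002004)
#define DMA_COUNT   (*(volatile uint32_t *)0x40002008)
#define DMA_CTRL    (*(volatile uint32_t *)0x4000200c)
#define DMA_START   1u

/* Programmed I/O: the CPU runs a loop, pushing one byte at a time. */
void send_pio(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        UART_DATA = buf[i];   /* CPU is stuck here for every byte */
}

/* DMA: the CPU programs source, destination, and count, then moves on. */
void send_dma(const uint8_t *buf, size_t len)
{
    DMA_SRC   = (uint32_t)(uintptr_t)buf;
    DMA_DST   = (uint32_t)(uintptr_t)&UART_DATA;
    DMA_COUNT = (uint32_t)len;
    DMA_CTRL  = DMA_START;    /* the DMA engine takes it from here */
}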
Yeah, I just about spat out my coffee when I saw “rarer these days.” Um. No. Not even remotely close. The exact opposite, in fact.
There are only two ways to transfer data from a device to memory: either the CPU does it, or the device does it itself. If the CPU does it, it’s usually been called PIO or programmed I/O, and if the device does it, it’s direct memory access, or DMA.
The goal of a modern computer design is to have virtually nothing be PIO, because PIO is terrible since – at least for reads – it slows down the CPU to the speed of the peripheral, and the CPU is the fastest thing in the system and is busy.
DMA is so ubiquitous it’s possible the author thought it was ‘rare’ because it’s everywhere. You don’t have “dedicated DMA channels” because everything has DMA modes. Devices generally have a tiny set of registers and then a bunch of places for you to write DMA descriptors that hand them large regions of memory to dump data into.
Heck, the entirety of message-signaled interrupts on devices (where instead of asserting just a physical line you get a message from a device when something happens) is DMA. That’s how it works.
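To make the descriptor idea concrete, here's roughly what that looks like from the driver's side. The layout below is invented for illustration and isn't any particular device's format.

#include <stdint.h>

/* A made-up receive descriptor, loosely modelled on how many NICs work. */
struct rx_descriptor {
    uint64_t buffer_addr;   /* bus address of a buffer the device may fill */
    uint16_t buffer_len;    /* how much room the device has */
    uint16_t status;        /* device sets DESC_DONE once it has written data */
    uint32_t reserved;
};

#define DESC_DONE 0x0001

/* The driver fills a ring of these, hands the ring's base address to the
 * device via a register, and then just watches the status fields (or takes
 * an interrupt) as the device DMAs packets into the buffers on its own. */
struct rx_descriptor rx_ring[256];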
DMA is very powerful, but also remember it has overhead to set up, etc. Transferring a few bytes from a sensor may not be efficient; for dumping huge swaths of display data, the overhead is likely worth it.
On a microcontroller this is definitely true because it’s usually driven by an external engine and so the CPU has to set it up. On a modern PC DMA is so built-in to the overall protocols that (as I mentioned above) it realistically should be used for almost anything except stuff like configuration/status where stuff like ordering or latency matters.
It’s a little different when you talk about writes to a device because then you’re not really slowing the system down. PCI hosts, for instance, typically buffer (and sometimes reorder, grr) write transactions from CPU to PCI so DMA from host to target is way less important than the reverse when talking about bandwidth (obviously not when talking about CPU usage).
Yeah there’s certainly truth to that. Also, any DMA-related hacks would generally require that the person has direct physical access to the device. That is, they can physically plug in a peripheral that sends unexpected data.
It’s been said many times; apparently it needs to be said again –
If an attacker has physical control of the device, they have control of the device. You aren’t going to stop a knowledgeable attacker who has the ability to literally replace your chips with their own. You can not secure against tampering by someone who can change the physical circuit. Never going to happen.
Someone who has physical control of the device might want to circumvent the DRM, and a DMA-related hack might be used to overcome that DRM. That’s a DRM issue, not a security issue. Of course, DRM has never been very effective and never will be, so not using DMA in order to try to make your DRM “bulletproof” is a fool’s errand. I worked on some DRM-related things in the 1990s and early 2000s. It didn’t work then, and after 30 years of trying by huge companies that spent millions of dollars trying to develop effective DRM, it still doesn’t work today. It has always been, in the end, just a way to throw money away.
“Also, any DMA-related hacks would generally require that the person has direct physical access to the device. ”
Nope. Programmable hardware = new attack surface. Obviously you need a way to get malicious code on the device, but attack the vendor’s website, replace firmware, Bob’s your uncle.
Generally speaking only kernel drivers have access to programmable functionality, including the IOMMU. Unless A) you’re talking about external devices over firewire or thunderbolt or USB4 or B) you’re talking about a supply chain attack.
And if you’re worried about buying devices that are pre-compromised, you have a lot more to worry about. The many, many megabytes of microcode that is burned into your CPU from the factory, for example. Or the firmware of the dozens and dozens of little cores littered around inside modern CPUs, chipsets, and every little interface chip, power controller, etc etc. Code is everywhere.
“Generally speaking only kernel drivers have access to programmable functionality, including the IOMMU.”
oooh you need to look into the “can you trust your network card” article. That was one of the earliest ones, like 15 years ago.
https://cyber.gouv.fr/sites/default/files/IMG/pdf/csw-trustnetworkcard.pdf
“hi we’re going to put a programmable processor which parses and interprets incoming network data while having access to the system bus, this can’t possibly go wrong”
I don’t think most modern people care, as they will use a Pico and Python just to blink an LED.
That’s for beginners. Look at more mature projects and you’ll see plenty of C, assembler, and DMA happening in the Pico scene. There is more going on than blinking an LED, although that is the “hello world” of electronics.
Everyone was new once – I’m sure your first micro or electronics project was nothing to write home about either.
Lots of people doing stuff with strings of WS2812 type LEDs – DMA makes a huge difference to how many LEDs you can address and what you do with them. The PIO module coupled with DMA on the RP2040 saves a lot of headaches…
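For anyone curious, the usual RP2040 trick is to pace a DMA channel off the PIO state machine's data-request line so it refills the TX FIFO exactly as fast as the LEDs can take the data. A minimal sketch with the Pico SDK, assuming a WS2812 program is already loaded on pio0 state machine 0 and the pixel data is already packed:

#include "hardware/dma.h"
#include "hardware/pio.h"

// One 32-bit word of packed GRB data per LED, assumed already formatted.
#define NUM_LEDS 300
static uint32_t pixel_buffer[NUM_LEDS];

void start_led_dma(void) {
    int chan = dma_claim_unused_channel(true);
    dma_channel_config cfg = dma_channel_get_default_config(chan);
    channel_config_set_transfer_data_size(&cfg, DMA_SIZE_32);
    channel_config_set_read_increment(&cfg, true);
    channel_config_set_write_increment(&cfg, false);
    // Pace the transfer off the PIO TX FIFO's data-request line.
    channel_config_set_dreq(&cfg, pio_get_dreq(pio0, 0, true));

    // Destination is the state machine's TX FIFO; source is the pixel buffer.
    dma_channel_configure(chan, &cfg, &pio0->txf[0], pixel_buffer,
                          NUM_LEDS, true);
    // The CPU can now go render the next frame while this one clocks out.
}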
Hello, I only use DMA for sensor and storage communication; otherwise I use function pointers and multithreading. The power of C and C++ is the pointer. A programmer with a basic grounding in calculus will recognize that a pointer behaves just like a mapping; in ASM, a pointer is the address of a piece of memory.
It’s very useful for a scattered program. I handle sensor parameters in one file and SPI memory in another, and I don’t care how they integrate into the main file: I just call the function pointers declared in a header from main and use them normally. It speeds up some parts, and above all the operations are independent, which means I can reuse the code on other hardware and build a UI on top of it later.
It helps remove a lot of abstraction. For example, I deleted the HAL generated by STM32Cube and wrote against the CMSIS framework, ASM, and an OS, then used AI to convert the protocol from the HAL to match my code. It reduced the abstraction significantly.
My project already uses every pin of the MCU, from sensors to motor control to Wi-Fi at the same time, and it still responds well in complex operation. The key is the schedule, provided it operates well.
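If I understand the approach, the pattern is roughly this: declare the function pointers (or a struct of them) in a header, let each hardware file install its own implementations, and main only ever calls through the pointers. The names below are just illustrative.

/* sensor_if.h -- illustrative interface header */
#include <stdint.h>

typedef struct {
    int     (*init)(void);
    int32_t (*read)(void);
} sensor_ops_t;

extern sensor_ops_t sensor;   /* whichever hardware file is linked provides this */

/* sensor_bme280.c -- one possible implementation; a different board
 * could supply another file with the same interface */
static int     bme280_init(void) { /* configure the part over SPI/I2C */ return 0; }
static int32_t bme280_read(void) { /* fetch a raw reading */ return 0; }

sensor_ops_t sensor = { .init = bme280_init, .read = bme280_read };

/* main.c -- calls through the pointers, independent of the hardware */
int main(void) {
    sensor.init();
    int32_t value = sensor.read();
    (void)value;
    return 0;
}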
Very interesting. Do you do this as a matter of habit or do your projects actually require the abstraction cost to be lowered?
It’s really nice that even the most basic 32 bit microcontrollers usually have DMA now. It was not very common in 8 bit microcontrollers.
PIC16 and Atmel XMEGA MCUs both have DMA.
You can do magic with DMA and timers, the speedup on display refreshes especially is almost always worth it.
DMA is used in modern computers; it’s just that instead of using a separate DMA controller, most devices support bus mastering. Bus mastering means the device itself contains a built-in DMA engine and reads and writes system memory directly.
“He’s the most feared bitslinger in Dodge City- Bus Masterson!”
Move with Offset [MVO] was one of the worst IBM 360 instructions, imo.
AI Overview.
In IBM System/360 assembly language, the “move with offset” instruction, often abbreviated as MVO, is used to move a character string from one memory location to another, allowing for a specified offset within the source and destination fields.
Intel 8086 instruction MOVS with direction bit was one of the most powerful x86 instructions because it set up a DMA?
No it doesn’t? Your post doesn’t make much sense.
REP MOVSQ is a powerful instruction today but has nothing to do with DMA; it’s a processor-based block move with a peak throughput of 32 bytes per clock (for Intel), which is nothing to sneeze at. IOW, at best it can use the full L1D cache bandwidth.
i don’t understand referencing DMA as ‘magic’ of last resort. whenever i sit down to embedded work, the very first thing i do is decide how to use all of the fancy i/o peripherals in the chip. i am always keeping an eye to how often they’ll generate interrupts or how many cycles they’ll use up polling them in the main loop. it’s not the last resort, it’s literally the first thing i look at. i never find myself in a corner, because my first step was to evaluate this exact problem.
if there’s not a suitable peripheral and i have to bitbang it, i am always asking, am i going to be able to do this in the main loop, or do i have enough counter-timer units to spare for this task? will the interrupts stack on top of each other and introduce latency? which leads to the genuine question: is this the right chip for this task?
i mean i imagine a lot of us have at least fantasized about this problem…should we hack together our own USB interface, or buy a more expensive mcu that has a good USB peripheral built in? and doesn’t every USB peripheral use DMA? and isn’t it 2025, where the USB peripheral is omnipresent in cheap chips?