Direct memory access (DMA) systems in computers are more powerful than you might think, and [Bruce Land] and [Joseph Primmer] have done some clever hacking to take full advantage of this on the PIC32 microcontrollers. This is a cool proof-of-concept hack — you can do general computing in the DMA subsystem without using the CPU at all if you don’t mind taking your time — but they also include two useful examples: a direct digital synthesis machine and a random number generator. Both of these run using exactly 0% CPU time.
How do they do it? DMA is a mechanism for shuttling data around in memory or between hardware peripherals without involving the CPU. Say you want to take a large block of memory containing music, and spit it out slowly to an I2S audio converter. A DMA subsystem could be configured to take an interrupt from the sound chip, pass it a chunk of data, increment the data pointer, and wait for the next interrupt.
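To make the streaming case concrete, here is a minimal sketch in Python of what such a channel does on each interrupt. It is a toy model, not PIC32 code: the `StreamingDma` class, its `on_interrupt` method, and the `sink` callable standing in for the peripheral's data register are all hypothetical names made up for illustration.

```python
class StreamingDma:
    """Toy model of a DMA channel feeding a peripheral (hypothetical, not a real API)."""

    def __init__(self, buffer, chunk_size, sink):
        self.buffer = buffer          # block of memory holding the audio data
        self.chunk_size = chunk_size  # bytes moved per interrupt
        self.sink = sink              # stand-in for the peripheral's data register
        self.pointer = 0              # current read position in the buffer

    def on_interrupt(self):
        """Move one chunk to the peripheral; return False once the buffer is exhausted."""
        if self.pointer >= len(self.buffer):
            return False
        chunk = self.buffer[self.pointer:self.pointer + self.chunk_size]
        self.sink(chunk)              # pass the peripheral a chunk of data
        self.pointer += len(chunk)    # increment the data pointer
        return True                   # then wait for the next interrupt


# Usage: ten bytes of "music", four bytes per interrupt.
audio = bytes(range(10))
received = []
dma = StreamingDma(audio, chunk_size=4, sink=received.append)
while dma.on_interrupt():
    pass
```

In real hardware the loop at the bottom does not exist: the peripheral's interrupt line triggers each transfer, and the CPU is free to do something else in the meantime.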
The gimmick, which goes back at least to [Rushanan] and [Checkoway]’s “Run DMA” paper, is that you can modify the memory source and destination addresses of one DMA service from another DMA service, and that some registers automatically perform mathematical operations on whatever data is put into them. Combine these together, and you’ve got transport-triggered programming.
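One way to picture the address-rewriting half of the trick is a table lookup built from nothing but copies. The sketch below is a toy Python model under loud assumptions: the memory map (`INPUT`, `OUTPUT`, `CH1_SRC_LOW`, `TABLE_BASE`) is invented for illustration and does not correspond to real PIC32 registers, and this is not claimed to be the exact machine [Land] and [Primmer] built.

```python
# Hypothetical memory map, invented for this sketch.
MEM_SIZE = 0x300
INPUT, OUTPUT = 0x000, 0x001
CH1_SRC_LOW = 0x010   # low byte of channel 1's source address, memory-mapped
TABLE_BASE = 0x200    # 256-byte aligned, so the low address byte is the table index


def transfer(mem, src, dst):
    """The only runtime operation a DMA channel has: copy one cell."""
    mem[dst] = mem[src]


def run_increment(value):
    """Compute (value + 1) mod 256 using only two copies at run time."""
    mem = [0] * MEM_SIZE
    # Setup, done once ahead of time: fill a 256-entry table with f(i) = i + 1.
    for i in range(256):
        mem[TABLE_BASE + i] = (i + 1) & 0xFF
    mem[INPUT] = value

    # Channel 0: copy the input byte into channel 1's source-address register.
    transfer(mem, INPUT, CH1_SRC_LOW)
    # Channel 1: its source address now points at TABLE_BASE + value.
    transfer(mem, TABLE_BASE | mem[CH1_SRC_LOW], OUTPUT)
    return mem[OUTPUT]


print(run_increment(41))
```

The `TABLE_BASE | ...` expression is not arithmetic done by the program; it models the hardware forming an address from a fixed high byte and a low byte that another channel overwrote. Swap the table contents and the same two copies compute any 8-bit function you like.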
(An awesome side-note: our own [Al Williams] developed a one-instruction transport-triggered CPU way back in the day: the One Instruction Wonder.)
What is this good for? Writing simple helper applications that run independently of the CPU on a PIC32 microcontroller. [Land] and [Primmer]’s direct-digital synthesis example is a great one. But there are a lot of cases where you simply want to take in some new data and pre-process it a little bit before it enters the main program flow. While creating weird machines in the DMA engine might be a slower way to get it done, it keeps the CPU free for doing other stuff. We’re sure you’ll come up with something.
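If direct digital synthesis is new to you, the core idea is a phase accumulator that steps through a sine table. The sketch below shows that idea in ordinary Python running on a CPU; it is not the DMA-only implementation from the project, and the 32-bit accumulator, the 256-entry 8-bit table, and the function names are assumptions chosen for illustration.

```python
import math

TABLE_BITS = 8  # assumed 256-entry table of unsigned 8-bit samples
SINE_TABLE = [
    round(127.5 + 127.5 * math.sin(2 * math.pi * i / (1 << TABLE_BITS)))
    for i in range(1 << TABLE_BITS)
]


def tuning_word(f_out, f_sample, acc_bits=32):
    """Phase increment per sample: f_out / f_sample as a fraction of a full turn."""
    return round(f_out / f_sample * (1 << acc_bits))


def dds_samples(f_out, f_sample, count, acc_bits=32):
    """Generate `count` samples of a sine wave at f_out, sampled at f_sample."""
    phase = 0
    step = tuning_word(f_out, f_sample, acc_bits)
    mask = (1 << acc_bits) - 1
    out = []
    for _ in range(count):
        # The top bits of the accumulator select the table entry.
        out.append(SINE_TABLE[phase >> (acc_bits - TABLE_BITS)])
        # Add the tuning word and let the accumulator wrap around.
        phase = (phase + step) & mask
    return out


# Usage: a 1 kHz tone at a 4 kHz sample rate advances a quarter turn per sample.
print(dds_samples(1000, 4000, 4))
```

Each step is one add and one table lookup, which is why the technique is a plausible fit for a machine that can only move data through registers that do a little math on the side.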
“The architecture of the Amiga Copper has all the basic features of a transport triggered architecture.”
Just think if the Amiga had been more popular.
I did a one-instruction TTL CPU – it was a lot of fun figuring out the hardware. And it was surprisingly fun to program.
https://hackaday.io/project/21298-one-instruction-ttl-computer
Sadly, not all DMA engines are equal, and tricks like this are limited to microcontrollers with more flexible triggering and more flexible DMA engines. DMA can easily save a double-digit number of cycles for an IRQ handler that has to manage a buffer to or from a device.
I have used DMA triggered by pin change to collect PS/2 data bits from I/O instead of using interrupts. Too bad that bit-banding on Cortex is only available to the core, not the DMA engine, so it couldn’t do the serial-to-parallel conversion too.
I have also used DMA to recombine two streams of mono audio samples by interleaving them. I used two chained DMA passes.
They are very fun to play with.
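For anyone wondering what the two-pass interleave looks like, here is a rough Python model. It assumes a DMA engine that can read sequentially and write with a stride of two samples; the comment does not say which hardware or which addressing mode was used, so treat `strided_copy` and `interleave` as hypothetical stand-ins.

```python
def strided_copy(src, dst, dst_start, dst_stride):
    """One DMA-like pass: sequential reads, writes spaced dst_stride apart."""
    for i, sample in enumerate(src):
        dst[dst_start + i * dst_stride] = sample


def interleave(left, right):
    """Merge two mono streams into L, R, L, R, ... using two chained passes."""
    if len(left) != len(right):
        raise ValueError("both streams must have the same number of samples")
    out = [0] * (2 * len(left))
    strided_copy(left, out, dst_start=0, dst_stride=2)   # pass 1 fills even slots
    strided_copy(right, out, dst_start=1, dst_stride=2)  # pass 2 fills odd slots
    return out


print(interleave([1, 2, 3], [10, 20, 30]))
```

Chaining means the end of the first pass triggers the second, so the CPU only sees the finished stereo buffer.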
This would be nice, but
“The PIC32 has 4 DMA controllers which can stream an aggregate of about 3.2 megabytes/sec without affecting CPU performance”
the 3.2 MB/s is way too slow.
I disagree, considering you’re performing the operations effectively for free while your actual application is running on the CPU. You’re not going to be streaming 1080p video, but you could read a few sensors and store the results until the CPU is available.
That reminds me. Anyone try with the DMA on their video card?
Wow! I was not aware that graphics cards have DMA engines, but it does make a lot of sense. I believe that proper utilisation of the GPU DMA could potentially be a huge performance boost in non-standard uses of GPUs (crypto work, AI stuff etc.).
The DMA engines on Cortex-based MCUs are at least an order of magnitude faster than this.
The STM32F303 has a 5 MSPS 12-bit ADC; this 3 MB/s is not even half of the bandwidth that ADC requires.
You can drive huge LED matrix panels or even LCDs with STM32F4 DMA over GPIO. (Most of the STM32 DMA controllers can be triggered from a timer.)
Depends on the intended usage, which shaped the requirements spec. If it is only meant to handle 25 Mbps (which suits most SPI chips and even the fastest I2C speed), then it is fine.
It’s a 40 MHz micro... and it takes you zero CPU cycles to do 25.6 Mbit/s with DMA. That’s actually fairly impressive. Presumably the faster ones are likewise faster; there are 80 and 200 MHz variants of this micro.
The 200 MHz model can probably handle full 100 Mbit Ethernet entirely in the DMA engine, and the 250 or so MHz version probably has enough performance left over to do some processing on packets, all while the CPU sits idle.
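The numbers being thrown around in this thread are easy to sanity-check. The sketch below just redoes the arithmetic in Python. The linear scaling with clock speed is the commenter's guess, not a measured figure, and the ADC figure assumes tightly packed 12-bit samples; storing each sample in a 16-bit word would need more.

```python
def mbytes_to_mbits(mbytes_per_s):
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_s * 8


QUOTED_MBYTES = 3.2  # aggregate DMA throughput quoted for the 40 MHz part

# 3.2 MB/s expressed in Mbit/s (about 25.6).
base_mbits = mbytes_to_mbits(QUOTED_MBYTES)

# Assumption: throughput scales linearly with clock, 40 MHz -> 200 MHz.
scaled_200mhz_mbits = base_mbits * 200 / 40

# A 5 MSPS, 12-bit ADC with packed samples, in MB/s (about 7.5).
adc_mbytes = 5e6 * 12 / 8 / 1e6

print(f"{base_mbits:.1f} Mbit/s at 40 MHz")
print(f"{scaled_200mhz_mbits:.1f} Mbit/s at 200 MHz if scaling is linear")
print(f"{adc_mbytes:.1f} MB/s needed by the ADC")
```

Under those assumptions the 200 MHz estimate lands above 100 Mbit/s, and 3.2 MB/s is indeed less than half of what the ADC would need.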
Yeah, 3.2MB/s is definitely in error. I’ve set up a PIC32MZ DMA controller streaming 50MB/s in the form of a 16-bit-per-pixel VGA out.