The Raspberry Pi is a powerful computer in a compact form factor, making it highly useful for all manner of projects. However, it lacks some of the IO capabilities you might find on a common microcontroller. This is most apparent when it comes to running addressable LED strings. Normally, this is done using the Pi’s PWM or audio output, and is limited to just a couple of short strings. However, [Jeremy P Bentham] has found a way to leverage the Pi’s hardware to overcome these limitations.
The trick is using the Raspberry Pi’s little-documented Secondary Memory Interface. The SMI hardware allows the Pi to shift out data to 8 or 16 I/O pins in parallel using direct memory access (DMA), with fast and accurate timing. This makes it perfect for generating signals such as those used by WS2812B LEDs, also known as NeoPixels.
With [Jeremy]’s code and the right supporting hardware, it’s possible to run up to 16 LED strips of arbitrary length from the Raspberry Pi. [Jeremy] does a great job outlining how it all works, covering everything from the data format used by WS2812B LEDs to the way cache needs to be handled to avoid garbled data. The hack works on all Pis, from the humble Pi Zero to the powerful Pi 4. Thanks to using DMA, the technique doesn’t overload the CPU, so performance should be good across the board.
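To make the parallel-output idea concrete, here is a minimal sketch of how per-strip colours could be interleaved into 16-bit words for a 16-pin parallel port. It is not [Jeremy]’s code. It assumes each LED bit is sent as three equal time slots (high, data, low), and the name `encode_frame` is made up for illustration. The only WS2812B facts relied on are that each LED takes 24 bits, in green-red-blue order, most significant bit first.

```python
from typing import List, Sequence, Tuple

NUM_CHANNELS = 16    # one bit of each output word per strip (16-pin mode)
RGB = Tuple[int, int, int]


def encode_frame(strips: Sequence[Sequence[RGB]]) -> List[int]:
    """Interleave per-strip colours into parallel output words.

    Assumption: every LED bit becomes three words (slots):
    all active pins high, then the data bit per pin, then all pins low.
    A 1 bit is therefore high for two slots, a 0 bit for one.
    """
    if len(strips) > NUM_CHANNELS:
        raise ValueError("at most 16 strips can be driven in parallel")
    length = max((len(s) for s in strips), default=0)
    active = (1 << len(strips)) - 1   # bitmask of pins that have a strip
    words: List[int] = []
    for led in range(length):
        # Pack each strip's colour as a 24-bit GRB value; pad short strips with black.
        grb = []
        for strip in strips:
            r, g, b = strip[led] if led < len(strip) else (0, 0, 0)
            grb.append((g << 16) | (r << 8) | b)
        # Walk the 24 bits MSB first, collecting one bit per strip into a word.
        for bit in range(23, -1, -1):
            data = 0
            for channel, value in enumerate(grb):
                if (value >> bit) & 1:
                    data |= 1 << channel
            words += [active, data, 0]
    return words
```

For example, `encode_frame([[(255, 0, 0)], [(0, 255, 0)]])` yields 72 words for one LED on each of two strips. In real use, a buffer like this would be what the DMA engine streams out to the pins, so the CPU only has to fill it once per frame.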
Of course, there are other ways to drive a ton of LEDs; we’ve seen 20,000 running on an ESP32, for example.
[Thanks to Petiepooo for the tip!]
18 thoughts on “Running Way More LED Strips On A Raspberry Pi With DMA”
In case someone wants to run 27 parallel strips, I have a kernel module for Raspberry Pi Zero to do that. Although this is a much cleaner, less violent solution.
Freezing the kernel to twiddle GPIO pins works, but as you mention in your repo, chains would need to be short, and other tasks would suffer during the freeze.
The hope with this is that a mid-level Raspberry Pi, like a 3B, can work all the pieces at once: run FPP to generate E1.31 while also driving up to 16 channels of 4 universes (680 RGB LEDs). Add a protoboard with a couple of SN74HCT245N buffers to protect GPIO and boost to 5V, and a power distribution network for powering the LEDs, and you’ve got all the makings of a show. There is no standalone controller for +100k pixels that could beat it on price.
Mind you, there’s a little bit of work to be done yet. Someone needs to distill his research work and add the E1.31 receiver portion…
Has anyone ported this kernel module to the RPi 2 and 3?
Does the existence of this DMA mode imply that hooking up an actual hard drive (not via USB) would be possible?
Seems someone was already working on that: https://github.com/fenlogic/IDE_trial
Gosh, someone managed to reverse engineer some obscure interface of this piece of fruit.
Beaglebone was doing this in 2013:
Beaglebone Black is getting a bit old and is slow by today’s standards, but with its PRUs it can still pack a punch in areas where it matters. Programming the PRUs was complicated in the beginning, but a few years ago (I think in 2017) GCC was ported to them, which should make development cycles easier.
Upon reading the above again, it seems the PRUs weren’t even used; instead, the Beaglebones got some help from some Teensys.
I was going to mention the history of various Teensy boards doing a similar trick, but the BB seems to do it even better.
The VAX and the PDP-11 had DRV11 high-speed parallel interfaces back in the 1970s.
Could recommend some non-obscure board as a replacement for RPi with at least 4GBs RAM and roughly the same power in the same format?
Regarding the power and RAM, I only know about x86_64 boards, and they won’t fit into the small RC helicopters and planes I’m interested in putting such a board into :(
Typo. Message should have started with “Could you …”.
The Odroid boards are better built and well supported. The C4 looks to be what you want. If you want a more rugged machine with even more CPU, the N2 is a good choice.
Last time I had an Odroid board, it didn’t have support in the upstream kernel. I was locked to an archaic version :(
Which board was that?
Is there a specific reason you want an SBC on the aircraft? You could have a microcontroller on the craft communicating with a ground station that has an SBC. You can still stream video and information back to do any kind of object detection, and just have a small, power-efficient microcontroller on board.
Or if you need an SBC, I’m pretty sure Banana Pi or another such company makes something comparable to a Pi 4 but in a Pi Zero form factor, so that might be worth looking at.
So I skimmed all four of [Jeremy P Bentham’s] posts on this RPi SMI/DMA topic. Nowhere, it seems, does he say how fast he can actually output bits of data, perhaps the most important piece of information on the subject. However, I did see on a GitHub page (link above in [limroh’s] post) that someone was partially successful in making an IDE interface using an RPi’s SMI port that worked at 44 Mbytes/sec.
The subject interests me because often you will see SoCs touting system clock speeds in the GHz range, so you would think you could get some pretty fast GPIO, but in the end you discover you can’t. It turns out that sometimes all the fast stuff on the chip is wrapped up by some proprietary bus architecture that acts as a bottleneck. Case in point is the ARM Advanced Microcontroller Bus Architecture (a.k.a. AMBA). You would think you could just go to a datasheet or app note to know all about this – but nooooo, not with a Broadcom product, it’s a “secret”. (Broadcom makes the SoC in the RPi.)
1. ARM Advanced Microcontroller Bus Architecture (AMBA)
He talks about the timing of the WS2812 data stream in the linked post and how he’s able to send one pulse cycle every 1.2µs, either as 0.8µs on and 0.4µs off or vice versa. That’s slightly off the 1.25µs (800kHz) called for in the LED’s spec, as it means he’s running it at 833kHz. It may still be possible to adjust the SMI’s clock to meet the official spec, as I think there are still things to be discovered about the peripheral. However, the pulse lengths would not exactly match the spec, even though it appears to work reliably.
What’s missing is how long the LED chains can be, as there must be limits to the size of the DMA transfers that can be set up, or to the latency that can be tolerated while keeping the DMA engine fed if user space needs to service it more than once per data cycle. Ultimately, that may affect the max frame rate or chain length on less powerful Pis like the Zero, but I have high hopes for the 3 or 4.
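As a rough back-of-the-envelope sketch of those numbers, the snippet below works out the bit rate, the frame time, and the size of the output buffer for a given chain length. The 1.2µs bit period is the figure quoted above. The three 16-bit words per LED bit and the 50µs latch gap are assumptions, not measurements, and some LED revisions are said to need a longer gap.

```python
BIT_PERIOD_US = 1.2    # per-bit time quoted for the SMI approach
BITS_PER_LED = 24      # 8 bits each for green, red and blue
WORDS_PER_BIT = 3      # assumed: three 0.4 us output slots per LED bit
BYTES_PER_WORD = 2     # 16 output pins, one 16-bit word per slot
RESET_US = 50.0        # assumed latch gap between frames


def bit_rate_khz() -> float:
    """Bit rate implied by the per-bit period."""
    return 1000.0 / BIT_PERIOD_US


def frame_time_ms(leds_per_chain: int) -> float:
    """Time to clock out one frame; all chains are sent in parallel."""
    return (leds_per_chain * BITS_PER_LED * BIT_PERIOD_US + RESET_US) / 1000.0


def max_fps(leds_per_chain: int) -> float:
    """Upper bound on frame rate, ignoring any software overhead."""
    return 1000.0 / frame_time_ms(leds_per_chain)


def buffer_bytes(leds_per_chain: int) -> int:
    """Size of the buffer the DMA engine would have to stream per frame."""
    return leds_per_chain * BITS_PER_LED * WORDS_PER_BIT * BYTES_PER_WORD


if __name__ == "__main__":
    chain = 680  # the four-universe chain length mentioned earlier in the thread
    print(f"bit rate:   {bit_rate_khz():.1f} kHz")
    print(f"frame time: {frame_time_ms(chain):.3f} ms")
    print(f"max fps:    {max_fps(chain):.1f}")
    print(f"buffer:     {buffer_bytes(chain)} bytes")
```

Under these assumptions, a 680-LED chain needs just under 20 ms per frame and a buffer of a little under 100 kB. So the frame rate is capped at around 50 fps by the LED protocol itself, regardless of which Pi is used.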
One of the big issues with Ambilight Pi projects is the need to use an Arduino. I wonder if this can be combined with Ambilight capability for a simple project.
Have you seen https://www.instructables.com/DIY-Ambilight-With-Raspberry-Pi-and-NO-Arduino-Wor/ ? It should also be possible to use the https://github.com/jgarff/rpi_ws281x library and the Pi’s PWM or PCM peripheral for non-SPI LED strips, provided Hyperion gains support for that.