Running Way More LED Strips On A Raspberry Pi With DMA

The Raspberry Pi is a powerful computer in a compact form factor, making it highly useful for all manner of projects. However, it lacks some of the IO capabilities you might find on a common microcontroller. This is most apparent when it comes to running addressable LED strings. Normally, this is done using the Pi’s PWM or audio output, and is limited to just a couple of short strings. However, [Jeremy P Bentham] has found a way to leverage the Pi’s hardware to overcome these limitations.

The trick is using the Raspberry Pi’s little-documented Secondary Memory Interface. The SMI hardware allows the Pi to shift out data to 8 or 16 I/O pins in parallel using direct memory access (DMA), with fast and accurate timing. This makes it perfect for generating signals such as those used by WS2812B LEDs, also known as NeoPixels.

With [Jeremy]’s code and the right supporting hardware, it’s possible to run up to 16 LED strips of arbitrary length from the Raspberry Pi. [Jeremy] does a great job outlining how it all works, covering everything from the data format used by WS2812B LEDs to the way cache needs to be handled to avoid garbled data. The hack works on all Pis, from the humble Pi Zero to the powerful Pi 4. Thanks to using DMA, the technique doesn’t overload the CPU, so performance should be good across the board.
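To make the idea concrete, here is a minimal sketch of how pixel data for several strips could be interleaved into parallel output words. It is not taken from [Jeremy]’s code: the function names are made up for illustration, and the three-slots-per-bit layout (one slot always high, one carrying the data bit, one always low) is an assumption. The green-red-blue, most-significant-bit-first ordering is what the WS2812B expects.

```python
# Illustrative sketch only: names and the three-slot layout are assumptions,
# not the author's implementation.

SLOTS_PER_BIT = 3   # assumed: high slot, data slot, low slot
BITS_PER_LED = 24   # 8 bits each of green, red and blue


def grb_word(r, g, b):
    """Pack one colour into the 24-bit G-R-B order used by the WS2812B."""
    return (g << 16) | (r << 8) | b


def encode_frame(strips):
    """Interleave up to 16 equal-length strips into parallel output words.

    strips is a list of lists of (r, g, b) tuples. The result is a flat list
    of words, one per time slot; bit n of each word is the level of pin n.
    """
    if not 1 <= len(strips) <= 16:
        raise ValueError("expected between 1 and 16 strips")
    n_leds = len(strips[0])
    if any(len(s) != n_leds for s in strips):
        raise ValueError("all strips must be the same length")

    all_pins = (1 << len(strips)) - 1
    words = []
    for led in range(n_leds):
        colours = [grb_word(*strip[led]) for strip in strips]
        for bit in range(BITS_PER_LED - 1, -1, -1):  # most significant first
            data = 0
            for pin, colour in enumerate(colours):
                if (colour >> bit) & 1:
                    data |= 1 << pin
            # A 1 bit stays high for two slots, a 0 bit for only one.
            words.extend((all_pins, data, 0))
    return words
```

Each LED then costs 72 words regardless of how many strips are attached, which is why driving 16 strips takes no longer than driving one.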

Of course, there are other ways to drive a ton of LEDs; we’ve seen 20,000 running on an ESP32, for example.

[Thanks to Petiepooo for the tip!]

16 thoughts on “Running Way More LED Strips On A Raspberry Pi With DMA”

    1. Freezing the kernel to twiddle GPIO pins works, but as you mention in your repo, chains would need to be short, and other tasks would suffer during the freeze.
      The hope with this is that a mid-level Raspberry Pi, like a 3B, can work all the pieces at once: run FPP to generate E1.31 while also driving up to 16 channels of 4 universes (680 RGB LEDs) each. Add a protoboard with a couple of SN74HCT245N buffers to protect the GPIO and boost the signals to 5V, and a power distribution network for powering the LEDs, and you’ve got all the makings of a show. There is no standalone controller for +100k pixels that could beat it on price.
      Mind you, there’s a little bit of work to be done yet. Someone needs to distill his research work and add the E1.31 receiver portion…
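For anyone wondering where the 680 comes from: an E1.31 universe carries 512 DMX slots, and an RGB pixel uses three of them, so 170 pixels fit in a universe with two slots left over. A rough sketch of that arithmetic, and of how a receiver might map an incoming slot to an output pin and pixel, is below. The names, the 1-based universe numbering and the consecutive-universes-per-output layout are all assumptions for illustration, not anything from the linked project.

```python
# Illustrative sketch: the layout of universes across outputs is an assumption.

SLOTS_PER_UNIVERSE = 512
SLOTS_PER_PIXEL = 3                                           # R, G, B
PIXELS_PER_UNIVERSE = SLOTS_PER_UNIVERSE // SLOTS_PER_PIXEL   # 170
UNIVERSES_PER_OUTPUT = 4
OUTPUTS = 16

PIXELS_PER_OUTPUT = PIXELS_PER_UNIVERSE * UNIVERSES_PER_OUTPUT  # 680
TOTAL_PIXELS = PIXELS_PER_OUTPUT * OUTPUTS                      # 10880


def locate(universe, slot, first_universe=1):
    """Map a universe and 1-based slot to (output, pixel, colour index).

    Returns None for the two unused slots at the end of each universe.
    Assumes each output owns four consecutive universes.
    """
    index = universe - first_universe
    if not 0 <= index < OUTPUTS * UNIVERSES_PER_OUTPUT:
        raise ValueError("universe out of range")
    if not 1 <= slot <= SLOTS_PER_UNIVERSE:
        raise ValueError("slot out of range")
    if slot > PIXELS_PER_UNIVERSE * SLOTS_PER_PIXEL:
        return None
    output = index // UNIVERSES_PER_OUTPUT
    pixel = ((index % UNIVERSES_PER_OUTPUT) * PIXELS_PER_UNIVERSE
             + (slot - 1) // SLOTS_PER_PIXEL)
    return output, pixel, (slot - 1) % SLOTS_PER_PIXEL
```

With 16 outputs fully populated, that works out to 10,880 pixels spread over 64 universes.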

  1. Gosh, someone managed to reverse engineer some obscure interface of this piece of fruit.

    Beaglebone was doing this in 2013:
    https://hackaday.com/2013/09/13/a-23-feet-tall-pyramid-with-0-31-mile-of-led-strips/

    Beaglebone Black is getting a bit old and is slow by today’s standards, but with its PRUs it can still pack a punch in areas where it matters. Programming the PRUs was a complication in the beginning, but a few years ago (I think in 2017) GCC was ported to them, which should make development cycles easier.

    Upon reading the above again, it seems the PRUs weren’t even used, and the Beaglebones got some help from some Teensys.

    1. Could you recommend a non-obscure board as a replacement for the RPi, with at least 4 GB of RAM and roughly the same power in the same format?
      With that much power and RAM, I only know of x86_64 boards, and they won’t fit into the small RC helicopters and planes I’m interested in putting such a board into :(

  2. So I skimmed all four of [Jeremy P Bentham’s] posts on this RPi SMI/DMA topic. Nowhere, it seems, does he say how fast he can actually output bits of data, perhaps the most important piece of information on the subject. However, I did see on a GitHub page (link above in [limroh’s] post) that someone was partially successful in making an IDE interface using an RPi’s SMI port that worked at 44 Mbytes/sec.

    The subject interests me because you will often see SoCs touting system clock speeds in the GHz range, so you would think you could get some pretty fast GPIO, but in the end you discover you can’t. It turns out that sometimes all the fast stuff on the chip is wrapped up by some proprietary bus architecture that acts as a bottleneck. A case in point is the ARM Advanced Microcontroller Bus Architecture (a.k.a. AMBA).[1] You would think you could just go to a datasheet or app note to know all about this, but nooooo, not with a Broadcom product, it’s a “secret”. (Broadcom makes the SoC in the RPi.)

    1. ARM Advanced Microcontroller Bus Architecture (AMBA)

    https://en.wikipedia.org/wiki/Advanced_Microcontroller_Bus_Architecture

    1. He talks about the timing of the WS2812 data stream in the linked post and how he’s able to send one pulse cycle every 1.2µs, either as 0.8µs on and 0.4µs off or vice versa. That’s slightly off the 1.25µs (800kHz) that is called for in the LED’s spec, as it means he’s running it at 833kHz. It may still be possible to adjust the SMI’s clock to meet the official spec, as I think there are still things to be discovered about the peripheral. However, the pulse lengths would not exactly match the spec, even though it appears to work reliably.
      What’s missing is how long the LED chains can be, as there must be limits to the size of the DMA transfers that can be set up, or the max latency possible to keep the DMA engine fed if user-space needs to service it more than once per data cycle. Ultimately, that may affect the max frame rate or chain length on less powerful Pis like the Zero, but I have high hopes for the 3 or 4.
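The numbers in this comment are easy to play with. Below is a back-of-the-envelope sketch that turns the 1.2µs bit period into a bit rate, a frame time, and a rough DMA buffer size for a given chain length. The three-slots-per-bit and 16-bit-word figures are assumptions about how the data is laid out, and the 300µs latch gap is a placeholder guess, so check the datasheet for your particular LEDs.

```python
# Back-of-the-envelope sketch; the constants marked "assumed" are not from
# the linked posts.

BIT_PERIOD_US = 1.2   # from the comment above: one bit cell every 1.2 us
BITS_PER_LED = 24     # 8 bits each of green, red and blue
SLOTS_PER_BIT = 3     # assumed: three 0.4 us slots per bit
BYTES_PER_SLOT = 2    # assumed: 16-bit-wide transfers
RESET_US = 300.0      # assumed idle gap that latches the data


def bit_rate_khz(bit_period_us=BIT_PERIOD_US):
    """Bit rate in kHz for a given bit period in microseconds."""
    return 1000.0 / bit_period_us


def frame_time_ms(n_leds, reset_us=RESET_US):
    """Time to clock out one frame for a chain of n_leds, plus the latch gap."""
    return (n_leds * BITS_PER_LED * BIT_PERIOD_US + reset_us) / 1000.0


def max_frame_rate(n_leds, reset_us=RESET_US):
    """Upper bound on frames per second, ignoring any software overhead."""
    return 1000.0 / frame_time_ms(n_leds, reset_us)


def buffer_bytes(n_leds):
    """Bytes of slot data needed for one frame (excluding the latch gap)."""
    return n_leds * BITS_PER_LED * SLOTS_PER_BIT * BYTES_PER_SLOT
```

For a 680-LED chain this gives a frame time just under 20ms, so roughly 50 frames per second at best, and a little under 100KB of slot data per frame under the assumptions above.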
