Inefficient NeoPixel Control Solved With Hardware Hackery

Everyone loves NeoPixels. Individually addressable RGB LEDs at a low price. Just attach an Arduino, load the demo code, and enjoy your blinking lights.

But it turns out that demo code isn’t very efficient. [Ben Heck] practically did a spit take when he discovered that the ESP32 sample code for NeoPixels used a uint32 to store each bit of data. Each LED takes 24 bits of color data, so 96 bytes of RAM were required for each LED. With 4 KB of RAM, you can control just 42 LEDs. That’s the same amount of RAM that the Apollo Guidance Computer needed to get to the moon!
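The arithmetic behind those numbers can be sketched in a few lines of Python. This is only a back-of-the-envelope check; the function names are made up for illustration, and the 3-bytes-per-LED comparison assumes a plain packed byte array with no driver overhead.

```python
BITS_PER_LED = 24      # 8 bits each for green, red, and blue
BYTES_PER_UINT32 = 4   # the sample code spends one uint32 per NeoPixel bit


def bytes_per_led(bytes_per_bit=BYTES_PER_UINT32):
    """RAM used by one LED when every data bit occupies bytes_per_bit bytes."""
    return BITS_PER_LED * bytes_per_bit


def max_leds(ram_bytes, led_bytes):
    """Whole LEDs that fit in ram_bytes at led_bytes per LED."""
    return ram_bytes // led_bytes


print(bytes_per_led())                  # 96
print(max_leds(4096, bytes_per_led()))  # 42
print(max_leds(4096, 3))                # 1365, if each LED took only 3 packed bytes
```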

His adventure is based on the thought that you should be able to generate these signals with hardware SPI. First, he takes a look at Adafruit’s DMA-Driven NeoPixel example. While this is far more efficient than the ESP32 demo code, it still requires 3 SPI bits per bit of NeoPixel data. [Ben] eventually provides us with an efficient solution for SPI control using a couple of 7400 series chips:

Schematic of SPI to NeoPixel circuit using 74HC123

[Ben]’s solution uses some external hardware to reduce software requirements. The 74HC123 dual monostable multivibrator generates the two pulse lengths needed for the NeoPixels. The timing for each multivibrator is set by an external resistor and capacitor, which are chosen to meet the NeoPixel timing specifications.

Both multivibrators are triggered by the SPI clock signal, and the SPI data is fed into an AND gate with the long pulse. (In NeoPixel terms, a long pulse is a logical 1.) The short pulse is ORed with the AND gate’s output. When the SPI data is 1, the long pulse is passed through to the NeoPixels. Otherwise, only the short pulse is passed through.
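The gating can be modeled in software to see what ends up on the NeoPixel data line. The sketch below is a simplified simulation, not [Ben]’s design files: the 800 kHz bit rate and the 350 ns and 700 ns pulse widths are assumed nominal values, the one-shots are treated as ideal, and the SPI data is assumed to stay stable for the whole bit cell.

```python
STEP_NS = 50     # simulation resolution (assumption)
BIT_NS = 1250    # one SPI clock period at 800 kHz (assumption)
SHORT_NS = 350   # assumed width of the short one-shot, the "0" high time
LONG_NS = 700    # assumed width of the long one-shot, the "1" high time


def output_waveform(bits):
    """Return the data-line level at each time step for a sequence of SPI bits."""
    samples = []
    for data in bits:
        for t in range(0, BIT_NS, STEP_NS):
            short_pulse = t < SHORT_NS  # first one-shot, fired by the SPI clock edge
            long_pulse = t < LONG_NS    # second one-shot, fired by the same edge
            # AND gate passes the long pulse only for a 1; OR gate adds the short pulse.
            samples.append(int(short_pulse or (long_pulse and bool(data))))
    return samples


def high_time_ns(bits):
    """High time of each bit cell, in nanoseconds."""
    wave = output_waveform(bits)
    per_bit = BIT_NS // STEP_NS
    return [sum(wave[i:i + per_bit]) * STEP_NS for i in range(0, len(wave), per_bit)]


print(high_time_ns([1, 0, 1, 1]))  # [700, 350, 700, 700]
```

In this model the microcontroller only ever shifts out the raw color bytes, so the buffer stays at 3 bytes per LED.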

This solution only requires a 74HC123, an AND gate, and an OR gate. The total cost is well under a dollar. Anyone looking to drive NeoPixels with a resource-constrained microcontroller might want to give this design a try. It also serves as a reminder that some problems are better solved in hardware instead of software.

[Thanks to Baldpower for the tip!]

40 thoughts on “Inefficient NeoPixel Control Solved With Hardware Hackery”

  1. Reminds me of an old server I had that refused to use a display size larger than 640×480 with no display attached (remote desktop) despite all kinds of driver hackery attempts.
    Solution was a vga dummy consisting of three 75 ohm resistors, each one over a color output in the vga plug.
    Then the server gladly accepted whatever resolution you wanted it to display.

  2. Define “better”. While often hardware is indeed better, I do not agree that this hardware solution is better for anything other than a hack for personal use. Having to rely on RC timing for pulse lengths is a horrible idea, plus an added cost of less than a dollar is a large price to pay if you’re mass producing something. Better to add a low cost micro for repeatable results.

    1. I was just going to say this. Ben’s solution will work on a bench but good luck keeping it working long term, especially in places where there is vibration or temperature changes! A much better solution is to use a hw timer + interrupts with a small state machine in the interrupt handler to set it to generate the timing periods (which actually need to be quite accurate!).

      This SPI hack also doesn’t quite solve the originally described problem of using uint32_t for each LED (the LEDs need 24bit of data) in any way – you still need to store the data somewhere, regardless of whether they are being sent out using SPI or something else. His solution still uses 3bytes/pixel, effectively a byte array – the same thing that could be achieved using that timer above, with no additional external hw and with much better long term reliability (no pots that can change value or capacitors that age …)

      The Adafruit’s DMA thing is a gross hack, effectively abusing SPI to replay the pulse patterns instead of sending actual bytes out. It is convenient because the hardware takes care of everything when sending the data out but the data preparation is a pain and it requires a ton of RAM.

  3. I got nervous when I saw the R’s and C’s attached to 7400 TTL so I took a look at the 74LS123 design guide. I am not sure of the timing tolerances but device to device variations looked to be about -5 to +4% so roughly up to a 10% variance from chip to chip. The thermal stability as one would expect also was not really good. I noticed that HC parts were listed in the write up while LS parts were called for in the schematic.

    As a former designer of industrial electronics, this is the type of engineering that makes me very nervous. Someone gets one that works on their test bench and says “good, let’s roll them out.” A lot of them may be fine when tested. A lot of them may fail when tested with no parts out of spec. And a lot of them may fail if they are put into service with any large thermal variations, such as in a car.

    I am not saying this will not work, but this strikes me as the type of design where you will have issues. The cap tolerance, parasitics, and chip to chip variation make me nervous.

      1. There is some tolerance in the required pulse lengths and the clock frequency. The length of the low phase is not important as long as the clock frequency is not excessively high or so low that the latching is triggered. It is only the description in the datasheet that reads as quite complicated.

  4. Meh. Passive components left out of the cost estimate. Pots and an “unusual value” caps. And PCB space.
    I’d bet you could get a $0.50 microcontroller to convert a known-frequency SPI data line (one wire; save a pin on some micros) to NeoPixel datastream. Well, I’d bet that someone like AtomicZombie could do it…

      1. Because the larger processor can do DMA-based or buffered low-overhead SPI, *many* times more efficiently than it could bit-bang the NeoPixel datastream. Especially important for larger number of pixels.

  5. Also handy if you want to do “Ludicrous Speed” tm eg for a helicopter with Neopixels on the blade(s) or frame. With eye tracking so it adjusts speed and preferential illumination direction on the fly. Ha ha ha.

  6. Another “look I’m genius !” video… To show his “cleverness” the guy decided to improve on a library that presents itself as a SPI hack, with adjustable timings. He’s right to show that it has its inefficiencies, but it was probably included in Espressif’s IDE for compatibility with existing Arduino sketches.

    He’s complaining that his poor choice of library uses a lot of memory, with a demagogic “more than to the moon” exclamation that just sounds like bad political advertisement, but is literally quoted by the HaD reporter, visibly happy with this kind of rhetoric… and still, for his little ring of 16 RGB leds, that would mean less than two kilobytes out of the couple hundreds available on his shiny ESP32… like it’s 2019 :)

    Anyway, driving these leds via high speed I2S with DMA on ESP8266 was done 3 years ago and is well known, and memory efficient. But then he wouldn’t have an excuse to pose as an electronics expert, showing his own massive stock of components and his $10,000 oscilloscope.

    1. “To show his “cleverness” the guy decided to improve on a library that presents itself as a SPI hack, with adjustable timings”

      Not quite. The library is using the RMT peripheral in the ESP32. That’s why it needs a uint32_t per bit (note to Ben, it’s *1* uint32_t, not *4* uint32_t’s – those are *bitfields*), because the RMT peripheral has each bit encoded as a uint32_t.

      But really, even that “memory” usage is fake, as it’s inside the RMT. The problem is using the generic HAL driver for the RMT. It’d be pretty easy to actually replace the straight data copy (which sadly is in three places, _rmtSendOnce, and the two ISR functions _rmt_tx_mem_first and _rmt_tx_mem_second) with another version which actually encodes the data from a straight bit representation of the signal to send. Or some hybrid of the two, obviously, where maybe you encode half the buffer at once and just let the ISR copy it or something.

      But, of course, as you mentioned that’s not the right way to do it anyway:

      “Anyway these leds driven via high speed I2S with DMA on ESP8266 was done 3 years ago and is well known, and memory efficient.”

      Should mention that it’s not just limited to ESP8266 devices, the NeoPixelBus ( https://github.com/Makuna/NeoPixelBus ) library has ESP32 support as well (using the Neo800KbpsMethod ).

      1. Sorry, I got these mixed up, thanks for your corrections and interesting clarifications about the RMT (Remote Control) peripheral.

        Obviously, if the I2S method was implemented 3 years ago on a barely documented ESP8266, I implied that doing the same with its better equipped and much better supported successor should probably be a piece of cake.

  7. It can be done even easier: Use 1/2 74HC123 (or a 74HC121) and no logic gates:
    Connect one (variable) resistor from VCC to pin R/C and a second one from SPI data to pin R/C of the same monoflop. That way you change the pulse length according to the data. Probably you have to invert the SPI data and the adjustment of the pulse lengths is not independent any more but you need less parts for the same functionality.

    1. I did exactly that for my own WS2812b strings. One 22V10 clocked from the microcontroller at 5x (can’t remember exactly) the bitrate of the SPI port. The design is a 4 bit shift register that either gets 1 bit set or 4 bits set depending on the state of the MOSI pin, on the falling edge of SCK. The output is taken at the end of the shift register, and that provides the needed timing. No messing with analog values and one shots, timing is as accurate as the crystal supplying the microcontroller clock. The WS2812b controller chip is quite forgiving, and reading between the lines in the spec sheet shows there is a lot of margin on the pulse sizes. I’m pretty sure that inside the controller they use a one shot that is triggered on the rising edge of the pulse, and it’s set to expire nominally at 1/2 the bit cell time. If the input is still ‘1’ when the one shot expires, the circuit calls the input bit a ‘1’, otherwise it calls ‘0’. There’s a second one that is also triggered (and re-triggered) on the rising edges, set for something less than 50us that creates the output register transfer pulse. They give a wide margin in the data sheet to allow for process and temperature variation on the timing of the decision point… almost 1/2 a bit cell time it seems.

  8. All these suggestions and workarounds to Ben Heck’s hackjob are great, but couldn’t one just use an ESP32 and some code to drive it? Probably less temp sensitive, and no RC to adjust. *drops mic* ;)

  9. If you need to drive WS28xx or Neopixel RGB LEDs and the controller in your project is limited, you can always do it with a SPLixel Basic, which is a hardware LED driver that works with almost any development board with a serial port. Plus it uses almost no RAM in your main controller.

  10. It would be possible to combine the and/or logic gate ICs into a single 7400 Quad-NAND. By De Morgan, the boolean expression (One AND SPI_Data) OR Zero can be converted into (One NAND SPI_Data) NAND (NOT Zero), where the inversion of Zero takes a third NAND gate wired as an inverter. Therefore, a single quad-nand chip drops the part count shown above by one.
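A NAND conversion like this is easy to check exhaustively, since there are only eight input combinations. The sketch below is an illustration rather than anything from the original design, and the function names are hypothetical. It compares the AND/OR form against a NAND-only form in which the Zero input is inverted by a NAND gate with its inputs tied together, as De Morgan’s law requires, and counts how often the form without that inverter disagrees.

```python
from itertools import product


def nand(a, b):
    """Two-input NAND gate on 0/1 values."""
    return int(not (a and b))


def and_or(one, data, zero):
    """Reference logic: (One AND SPI_Data) OR Zero."""
    return int((one and data) or zero)


def nand_only(one, data, zero):
    """Three NAND gates: one of them inverts Zero (inputs tied together)."""
    not_zero = nand(zero, zero)
    return nand(nand(one, data), not_zero)


def nand_without_inverter(one, data, zero):
    """Two NAND gates with Zero fed in directly, for comparison."""
    return nand(nand(one, data), zero)


cases = list(product((0, 1), repeat=3))
print(all(and_or(*c) == nand_only(*c) for c in cases))                 # True
print(sum(and_or(*c) != nand_without_inverter(*c) for c in cases))     # 6
```

Three gates still fit in one quad-NAND package, so the single-chip conclusion holds.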
