Rewriting WS2812 Driver Libraries For Optimization


We like [Tim’s] drive for improvement. He wrote a WS2812 driver library that works with AVR and ARM Cortex-M0 microcontrollers, but he wasn’t satisfied with how much of the controller’s resources the library used to simply output the required timing signal for these LED modules. When he set out to build version 2.0, he dug much deeper than just optimizing his own code.

We remember [Tim] from his project reverse engineering a candle flicker LED. This time, he’s done more reverse engineering by comparing the actual timing performance of the WS2812(B) module with its published specs. He learned that although several timing aspects require precision, others can be fudged a little bit. To figure out which ones, [Tim] used an ATtiny85 as a signal generator and monitored the results with a Saleae logic analyzer. Of course, to even talk about these advances you need to know something about the timing scheme, so [Tim] provides a quick run-through of the protocol as part of his write-up.

Click the top link to read his findings and how he used them to write the new library, which is stored in his GitHub repository.

14 thoughts on “Rewriting WS2812 Driver Libraries For Optimization”

  1. Well, using a chip outside of its specs is not a good idea. While it works with his LEDs, nothing guarantees it will work with another batch of LEDs.

    Datasheet is not written for fun!

    1. Well, following the datasheet is usually the first approach and should be sufficient. But if you have ever worked with these LEDs you may have noticed that this is not enough. There are several revisions of datasheets and products with different timings and sometimes contradictory specifications. The datasheets do not really explain which timing values are relevant and which are not, or what has been changed between devices. Right now, Worldsemi has not even posted the datasheets on their English site.

      On top of that it is fairly easy to glitch these LEDs if the current supply is insufficient.

  2. I didn’t look closely at the timings, though it’s good to know that the chips will actually re-time their daisy-chain.

    Problem is, why are people still doing busy-loops for this? The timing windows are wide enough that you can *easily* generate the required waveforms by using a UART, which on most chips opens up the ability to use DMA, which makes actually *sending* the data nearly free. You still have to translate pixel data into bytes for the serial stream, and that costs memory, but if you’re only using a few LEDs that’s not likely to be a problem. If you’re using a whole mess of LEDs you’re going to want a dedicated driver anyway.

    /sigh – maybe I’ll just write up what I’ve already got working and post it today.

    1. You can do DMA assisted WS2812 control on many Cortex controllers. However, often at the expense of increased RAM usage, as you already point out.

      On AVR things are a bit different. There is no DMA. Even if you use SPI or PWM, you basically spend all your CPU time “busy waiting” for the peripheral. In that case nothing is gained, but much is lost because you end up with increased code size and limited portability of the code between different devices.

      The only controller that really allows elegant implementation of the WS2812 protocol is the LPC800 series with SCT/SPI. Basically you can configure it to take the raw data and output the bitstream without any CPU intervention other than reloading the SPI shift registers.

      See thread here:

      1. I use the AVR Xmega chips extensively, and they have all the support necessary for this, while being roughly conceptually compatible with conventional AVR. A key advantage that the Xmega has over the LPC800 is that you can get parts with USB in tiny packages (7×7 mm QFN and 5×5 mm BGA-49).

        1. Agreed, on XMEGA you could use SPI and DMA with additional CPU preprocessing. That would probably free some of your CPU time compared to a CPU only solution. But then you could also get a CM3 with similar specs for the same price…

  3. Agreed, on XMEGA you could use SPI and DMA with additional CPU preprocessing. That would probably free some of your CPU time compared to a CPU only solution. But then you could also get a CM3 with similar specs for the same price…

    1. True, but then you’re off into the fun world of cobbling together a build environment that works. Between getting a toolchain installed, finding the device headers, getting a “startup.S” in place and all the rest, I’ve found ARM MCU development to be a massive PITA. With AVR/Xmega I just say gcc -mmcu=atxmega32a4u and I’m done. *Really* cannot understand why that hasn’t been done yet for ARM devel, and it’s on my (long) list of things to do eventually if somebody else doesn’t in the meantime.

      1. Maybe you should update your Atmel Studio. It handles ARM MCUs as simply as AVR ones.

        LPC-Xpresso is also a simple tool (for NXP of course). Codewarrior (for Freescale) is a bit less user-friendly (in my point of view).

        Also you can use Coocox or demo versions of Keil or IAR to be brand-independent.

        I also started a STM32F103 (and LPC2141 before) from scratch in a previous life…. never again :-D

  4. For those that have *thought* about the Propeller but couldn’t think of a good reason to give it a try, this is that opportunity. I wrote a WS2811/12 driver (assembly) in about 20 minutes, and it worked the first time. I’m not suggesting the Propeller is perfect for everything (no chip is), but in this case it’s really great; I’m able to devote a processor to running the WS2811/12 driver with no worries about how that driver’s behavior affects the mainline code (because it doesn’t). I can even launch multiple copies of the same driver if I need (I did that with my WS2801 driver in the Wes Borland costume project that can be seen on the Hackaday projects site).
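
The SPI approach raised in the thread above (preprocess the pixel data on the CPU, then let SPI and DMA shift it out) can be sketched on a host machine. This is only an illustration of the preprocessing step, not code from [Tim]'s library or from any commenter. It assumes a commonly used scheme in which each WS2812 data bit is expanded to three SPI bits (`110` for a one, `100` for a zero) and the SPI clock runs near 2.4 MHz, so three SPI bits span roughly one 1.25 µs bit period. The function names `encode_byte` and `encode_pixels` are hypothetical.

```python
# Minimal sketch: expand WS2812 pixel data into an SPI bitstream.
# Assumption: SPI clock near 2.4 MHz, three SPI bits per WS2812 data bit.
ONE_PATTERN = 0b110   # long high pulse, short low
ZERO_PATTERN = 0b100  # short high pulse, long low


def encode_byte(value: int) -> bytes:
    """Expand one colour byte (MSB first) into 3 SPI bytes (24 SPI bits)."""
    bits = 0
    for i in range(7, -1, -1):
        pattern = ONE_PATTERN if (value >> i) & 1 else ZERO_PATTERN
        bits = (bits << 3) | pattern
    return bits.to_bytes(3, "big")


def encode_pixels(pixels) -> bytes:
    """Encode (r, g, b) tuples; the WS2812 expects green, red, blue order."""
    out = bytearray()
    for r, g, b in pixels:
        for channel in (g, r, b):
            out += encode_byte(channel)
    return bytes(out)


if __name__ == "__main__":
    stream = encode_pixels([(255, 0, 0), (0, 0, 16)])
    print(len(stream), stream.hex())
```

Each LED grows from 3 bytes of colour data to 9 bytes of SPI data, which is the RAM cost the commenters mention. A real driver would also need to hold the line low between frames for the reset period, which this sketch leaves out.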
