Blazing Fast Raspberry Pi Display Driver Will Melt Your Face then Teach You How

Reader [poipoi] recently wrote in to our tip line to tell us about an “amazingly fast” Raspberry Pi display driver with a README file that “is an actual joy to read”. Of course, we had to see for ourselves. The fbcp-ili9341 repo, by [juj], seems to live up to the hype! The software itself appears impressive, and the README is detailed, well-structured, educational, and dare we say entertaining?

The driver’s main goal is to produce high frame rates — up to around 60 frames per second — over an SPI bus, and it runs on various Raspberry Pi devices including the 2, 3 and Zero W. Any video output that goes to the Pi’s HDMI port will be mirrored to a TFT display over the SPI bus. It works with many of the popular displays currently out there, including those that use the ILI9341, ILI9340, and HX8357D chipsets.

The techniques that let [juj] coax such frame rates out of a not-terribly-fast serial bus are explained in detail in the README’s How it Works section, but much of it boils down to the fact that it’s only sending changed pixels for each frame, instead of the full screen. This cuts out the transmission of about 50% of the pixels in each update when you’re playing a game like Quake, claims the author. There are other interesting performance tweaks as well, so be sure to check out the repo for all the details.
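To make the changed-pixels idea concrete, here is a minimal Python sketch of frame diffing. It is not [juj]'s code: the function names and the flat, row-major framebuffer layout are our own assumptions, chosen only to illustrate the principle of sending changed spans instead of whole frames.

```python
def changed_spans(prev, curr, width, height):
    """Return (row, start_col, end_col) spans where curr differs from prev.

    prev and curr are flat, row-major sequences of pixel values, each of
    length width * height (a hypothetical layout for this sketch).
    end_col is exclusive.
    """
    spans = []
    for y in range(height):
        row = y * width
        x = 0
        while x < width:
            if prev[row + x] != curr[row + x]:
                start = x
                # Extend the span while pixels keep differing.
                while x < width and prev[row + x] != curr[row + x]:
                    x += 1
                spans.append((y, start, x))
            else:
                x += 1
    return spans


def changed_pixel_count(spans):
    """Total number of pixels covered by the spans."""
    return sum(end - start for _, start, end in spans)
```

Only the pixels inside the returned spans would need to cross the SPI bus. A real driver also has to weigh the per-span command overhead against the pixels saved, which this sketch ignores.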

There’s a video comparing the performance of fbcp-ili9341 to mainline SPI drivers after the break.

We’ve covered similarly performance-focused SPI display drivers for the ESP8266, ESP32, and Teensy, if you’re looking to use a more lightweight computing platform.

[Thanks again for the tip poipoi]

19 thoughts on “Blazing Fast Raspberry Pi Display Driver Will Melt Your Face then Teach You How”

    1. Found this in the “how it works”: “Good old interlacing is added into the mix: if the amount of pixels that needs updating is detected to be too much that the SPI bus cannot handle it, the driver adaptively resorts to doing an interlaced update, uploading even and odd scanlines at subsequent frames.”
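The adaptive interlacing described in that quote could look something like the following Python sketch. This is our own simplified illustration, not the driver's logic: `rows_to_upload`, `pixel_budget`, and the whole-row granularity are all assumptions.

```python
def rows_to_upload(changed_rows, width, pixel_budget, frame_index):
    """Pick which changed scanlines to send this frame.

    changed_rows: iterable of row indices that differ from the last frame.
    pixel_budget: hypothetical number of pixels the bus can carry per frame.
    If everything fits, send it all; otherwise fall back to interlacing,
    sending even rows on even frames and odd rows on odd frames.
    """
    rows = sorted(changed_rows)
    if len(rows) * width <= pixel_budget:
        return rows
    parity = frame_index % 2
    return [y for y in rows if y % 2 == parity]
```

With six changed rows of ten pixels each and a budget of 30 pixels, frame 0 would send rows 0, 2, and 4, and frame 1 would send rows 1, 3, and 5.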

          1. block formats are pretty easy to do but i don't think there is any support for that in the display hardware. like being able to use dxt1 at four bits per pixel, a 4x improvement over raw 16 bit color data. for every 4×4 block of pixels, select the lightest and darkest pixels in the block, stick those into a color table, and generate 2 interpolated colors. those four colors become the color table and the pixels are stored as 2-bit indices. decompression is much faster as it just needs to do the interpolation step and from that it can assign color values to the pixels on the screen, so the demands on the display controller would be minimal. it just has to exist first.
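The block scheme described in that comment can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: it works on 8-bit (r, g, b) tuples, picks endpoints by luma, and does not pack bits or quantise the endpoints to RGB565 the way real DXT1 does. All function names are hypothetical.

```python
def luma(c):
    """Integer brightness estimate using Rec. 601 weights."""
    r, g, b = c
    return 299 * r + 587 * g + 114 * b


def palette(c0, c1):
    """Four-colour table: the two endpoints plus two interpolated colours."""
    third = tuple((2 * a + b) // 3 for a, b in zip(c0, c1))
    two_thirds = tuple((a + 2 * b) // 3 for a, b in zip(c0, c1))
    return [c0, c1, third, two_thirds]


def dist2(a, b):
    """Squared distance between two colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_block(pixels):
    """Encode 16 (r, g, b) tuples (one 4x4 block) as two endpoints
    plus sixteen 2-bit palette indices."""
    c0 = max(pixels, key=luma)  # lightest pixel
    c1 = min(pixels, key=luma)  # darkest pixel
    table = palette(c0, c1)
    indices = [min(range(4), key=lambda i: dist2(table[i], p)) for p in pixels]
    return c0, c1, indices


def decode_block(c0, c1, indices):
    """Rebuild the block: interpolate the table, then look up each index."""
    table = palette(c0, c1)
    return [table[i] for i in indices]
```

With 16-bit endpoints, a block costs 2 × 16 + 16 × 2 = 64 bits for 16 pixels, which is where the four bits per pixel comes from. Decoding is just the interpolation plus a table lookup, as the comment says.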

  1. Nice, this has always annoyed me with the portable rpi builds. I wonder how hard it would be to add touchscreen support — hopefully it can mostly be borrowed from another driver.

    1. Not hard. You can find plenty of 4- or 5-wire resistive touchscreens of the right size from the usual Chinese dealers. Snap a USB controller to it, and it's directly recognized and usable by Xorg.

  2. It seems possible to drive an ILI9341 over SPI at 52 fps (full-frame, no partial writes) with a simple overclocked STM32F103 (the great $2 “blue pill”). So it seems there is room for optimization, and an even higher framerate can be attained with the RPi, since it can output SPI at a max speed of 125MHz:

    1. The Pi might do such high clock rates, but the display probably won’t. In my experience they crap out at anything much higher than 40MHz (although that could well be my setup, because I can’t afford a scope that can see anything that high), and the ILI9341 datasheet only specs them at 10MHz. YMMV
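For a sense of scale, the clock rates in this thread translate into full-frame refresh ceilings with some simple arithmetic. The Python sketch below assumes a 320×240 panel at 16 bits per pixel and ignores all command and addressing overhead, so real numbers will be lower. The function name is ours.

```python
def max_full_frame_fps(spi_hz, width=320, height=240, bits_per_pixel=16):
    """Upper bound on full-frame updates per second over SPI.

    Assumes one bit per clock and no command overhead (an idealisation).
    """
    bits_per_frame = width * height * bits_per_pixel
    return spi_hz / bits_per_frame


for mhz in (10, 40, 125):
    print(f"{mhz} MHz -> {max_full_frame_fps(mhz * 1_000_000):.1f} fps")
```

At 10, 40, and 125 MHz this works out to roughly 8.1, 32.6, and 101.7 full frames per second. By the same estimate, 52 full frames per second needs a bus clock of about 64 MHz.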

  3. yes, improvements can be made to how spi display data is sent, in several ways: spi bursting, and sustaining it by doing simple work while spi data is being sent; buffering display data so only what changes is updated (2x-5x performance); and also interleaving, processing only odd or even line data. on larger displays there is enough diffusion that a 1-line gap is hardly visible.

    there is another way that could make this project work even better, and that is data bandwidth compression or regulation. this is easy to implement, and what it does is look for the range of changed data so that only the biggest changes get updated to the display, with a redundant update every 1/8 or 1/16 for all areas to refresh. you set a number +/- and bring it lower or higher to make the distribution area larger or smaller. if too much data is processed, 1-2 frames will be slower, but the rest will speed up as the distribution band becomes narrower, or wider if less data is flowing.

    I do it on Arduino without arm processors and get reasonably high refresh rates. my code however is not compatible with arm format currently, because i have not differentiated the spi timings and some performance is lost if implementing the check spi ready loops. things do become different on 72mhz or faster machines, with dma, and super fast spi speeds.
