TFT LCDs Hit Warp Speed With Teensy 3.1

[Paul Stoffregen], known as the father of the Teensy, has leveraged the Teensy 3.1’s hardware to obtain some serious speed gains with SPI-driven TFT LCDs. Low-cost serial TFT LCDs have become commonplace these days. Many of us have used Adafruit’s TFT LCD library to drive these displays on an Arduino. The Adafruit library gives us a simple API to work with these LCDs, and saves us from having to learn the intricacies of various driver chips.

[Paul] has turbocharged the library by using hardware available on the Teensy 3.1’s 32-bit Freescale Kinetis K20 microcontroller. The first bump is raw speed. The Arduino’s ATmega328 can drive the SPI bus at 8MHz, while the Teensy’s Kinetis can ramp things up to 24MHz.
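To put those two clock rates in perspective, here is a rough back-of-the-envelope sketch. It assumes a 320x240 panel at 16 bits per pixel (typical for the ILI9341 controller; the resolution is an assumption, not a figure from the article) and a bus that never idles between bytes, so the results are lower bounds rather than measured numbers. The function names are made up for illustration.

```cpp
#include <cstdint>

// Bytes needed for one full frame at 16 bits per pixel.
constexpr uint32_t frameBytes(uint32_t width, uint32_t height) {
    return width * height * 2u;
}

// Lower bound, in microseconds, on the time to clock out `bytes` at `spiHz`,
// assuming the bus never pauses between bytes.
constexpr uint64_t minTransferMicros(uint32_t bytes, uint32_t spiHz) {
    return (static_cast<uint64_t>(bytes) * 8u * 1000000u) / spiHz;
}

// Assumed 320x240 panel: 153,600 bytes per frame.
static_assert(frameBytes(320, 240) == 153600, "frame size");
// At 8MHz the frame needs at least 153.6 ms; at 24MHz at least 51.2 ms.
static_assert(minTransferMicros(153600, 8000000) == 153600, "8MHz bound");
static_assert(minTransferMicros(153600, 24000000) == 51200, "24MHz bound");
```

Any dead time between bytes comes on top of these figures, which is why the remaining optimizations matter as much as the clock rate.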

Speed isn’t everything though. [Paul] also used the Kinetis’s 4-level SPI FIFO to buffer transfers. By using a “Write first, then block until the FIFO isn’t full” algorithm, [Paul] ensured that new data always gets to the LCD as fast as possible.
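The “write first, then block” idea can be sketched with a toy model. This is a host-side simulation, not the Kinetis register interface and not [Paul]’s code: `SimSpiFifo`, `writeByte`, `flush` and `sendAll` are hypothetical names, and one call to `tick()` stands in for the time the shifter needs to send one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

constexpr std::size_t kFifoDepth = 4;  // 4-level FIFO, as described in the article

// Toy model of an SPI transmit FIFO (hypothetical, for illustration only).
struct SimSpiFifo {
    std::deque<uint8_t> queue;   // entries waiting to be shifted out
    std::vector<uint8_t> wire;   // bytes that have left on the bus, in order

    bool full() const { return queue.size() >= kFifoDepth; }

    // One unit of simulated time: the shifter finishes one byte.
    void tick() {
        if (!queue.empty()) {
            wire.push_back(queue.front());
            queue.pop_front();
        }
    }

    void push(uint8_t b) {
        assert(!full());         // the caller must never overfill the FIFO
        queue.push_back(b);
    }
};

// Write first, then block until the FIFO isn't full.
void writeByte(SimSpiFifo& spi, uint8_t b) {
    spi.push(b);                        // room is guaranteed by the previous call
    while (spi.full()) spi.tick();      // wait only if this write filled the FIFO
}

// Wait for everything still queued to go out.
void flush(SimSpiFifo& spi) {
    while (!spi.queue.empty()) spi.tick();
}

std::vector<uint8_t> sendAll(const std::vector<uint8_t>& data) {
    SimSpiFifo spi;
    for (uint8_t b : data) writeByte(spi, b);
    flush(spi);
    return spi.wire;
}
```

Because the wait happens after the write, every call returns with at least one free slot. The next call can queue its byte immediately, and whatever the CPU computes between calls overlaps with bytes still being shifted out.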

Another huge bump was SPI chip select. The Kinetis can drive up to 5 SPI chip select pins from hardware. The ATmega328 has no hardware chip select support, so chip selects must be implemented with GPIO pins, which takes even more time.

The final result is rather impressive. Click past the break to see the ATmega-based Arduino race against the Kinetis K20-powered Teensy 3.1.

[Paul]’s library is open source and available on GitHub.

Update: In response to some questions in the comments, [Paul] added a second video comparing the Teensy 3.1 to the Arduino Due. The Due was running at a higher SPI clock speed; however, the Teensy, with its hardware advantages and optimized library, still proved to be faster.

39 thoughts on “TFT LCDs Hit Warp Speed With Teensy 3.1”

  1. Using GPIO for the chip select isn’t that much of an overhead on the AVR if you are accessing the GPIO register directly instead of relying on yet another layer of poorly written library code. Worst case is that you are adding a handful of clock cycles to SPI code.

    There is already DMA SPI for the Teensy/MK20. https://github.com/crteensy/DmaSpi
    Ironically because of the brain dead way Freescale set up their register map, one would have to use GPIO to access chip select there.

    1. You only need to assert the active low chip select before sending SPI and deassert it after a block of data. You don’t need to toggle it every single byte of data transfer. The amount of overhead is not that bad when you compare it to the 2 clock cycle minimum bit time (16 clock cycles per byte) plus overhead, and to the overhead of a function call.
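The arithmetic behind that comment can be sketched as follows. The 16 cycles per byte comes from the comment itself; the cost of 2 CPU cycles per chip select edge is an assumption (one direct port write per edge), and loop and function call overhead are ignored. The function names are hypothetical.

```cpp
#include <cstdint>

constexpr uint32_t kCyclesPerByte = 16;   // from the comment: 2 CPU cycles per bit
constexpr uint32_t kCyclesPerCsEdge = 2;  // assumption: one direct port write per edge

// Chip select asserted once before the block and released once after it.
constexpr uint32_t cyclesCsPerBlock(uint32_t nBytes) {
    return 2 * kCyclesPerCsEdge + nBytes * kCyclesPerByte;
}

// Chip select toggled around every single byte.
constexpr uint32_t cyclesCsPerByte(uint32_t nBytes) {
    return nBytes * (2 * kCyclesPerCsEdge + kCyclesPerByte);
}

// For a 64-byte block: 1028 cycles versus 1280 cycles under these assumptions.
static_assert(cyclesCsPerBlock(64) == 1028, "per-block chip select");
static_assert(cyclesCsPerByte(64) == 1280, "per-byte chip select");
```

Under these assumed costs, block-level chip select adds a fixed 4 cycles, while per-byte toggling adds 25% to every byte.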

      1. I can only tell you what the SPI in K22 core does (Teensy has K20). The SPI hardware is programmable for that.
        If CONT bit = 0, the /PCS (chip select pin) is toggled after every word transfer (# of bits settable). If CONT bit = 1, the /PCS remains asserted.
        (User must fill the TX FIFO with the number of entries that will be concatenated together under one PCS assertion for both master and slave before the TX FIFO becomes empty.)
        The brain dead part of the Freescale SPI has to do with the fact that they insist on pairing the data with a command in the upper 16-bit word. You have to interleave each data word with the command describing how it is supposed to be transferred before pushing it into the FIFO. That essentially renders the DMA features less useful (without doing a lot of extra work).
        For the type of applications that I find myself using, I rarely need to toggle the chip select lines except at the block level. If you don’t need that, then you can manually toggle the chip select line at only the beginning and end of a block, but let the DMA engine read/write the data in the lower 16-bit/8-bit. That’s what the DMASpi library does.
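A minimal sketch of the interleaving described above: each 32-bit FIFO word carries command bits in its upper half and the data in its lower half. The bit positions used here (CONT at bit 31, chip select mask starting at bit 16) are taken from my reading of the K20 reference manual and should be treated as assumptions to check against the datasheet. `buildPushrWords` is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed field positions in the 32-bit push register (verify against the datasheet).
constexpr uint32_t kPushrCont = 1u << 31;      // keep chip select asserted after this word
constexpr uint32_t pushrPcs(uint32_t mask) {   // which of the 5 chip select outputs to assert
    return (mask & 0x1Fu) << 16;
}

// Expand raw payload bytes into FIFO words: command in the upper half, data in the lower half.
std::vector<uint32_t> buildPushrWords(const std::vector<uint8_t>& payload, uint32_t pcsMask) {
    std::vector<uint32_t> words;
    words.reserve(payload.size());
    for (std::size_t i = 0; i < payload.size(); ++i) {
        const bool last = (i + 1 == payload.size());
        uint32_t word = pushrPcs(pcsMask) | payload[i];
        if (!last) word |= kPushrCont;         // hold chip select until the final word
        words.push_back(word);
    }
    return words;
}
```

This is the extra work the comment refers to: a plain byte buffer cannot be handed to DMA as-is, because every byte first has to be widened to 32 bits and tagged with its command bits.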

      1. These SPI-interface TFT displays actually use 5 signals: the normal 4 for SPI plus one extra “address” pin, which tells the display which bytes are commands and which are data.

        One of the speedups involves treating both of these as chip select signals, generated by the special chip select hardware. It’s all described in the lengthy blog article I wrote. I know it’s a long read and pretty technical, but I do hope it’ll help raise some awareness of how this newer generation of more sophisticated hardware can be used to massively increase overall speed, far beyond the moderate speed increases possible by only increasing clock speeds.
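One way to picture treating the “address” (data/command) pin as a second chip select: every transfer carries a mask saying which select outputs to pull low. The sketch below is a model only. The bit assignments are hypothetical, and it assumes active-low outputs and a display that reads a low data/command pin as “command”, which is how the ILI9341 behaves.

```cpp
#include <cstdint>

// Hypothetical bit assignments: which select output is wired to which display pin.
constexpr uint8_t kPcsCs = 1u << 0;  // display chip select (active low)
constexpr uint8_t kPcsDc = 1u << 1;  // data/command pin (low = command)

struct PinLevels {
    bool csHigh;
    bool dcHigh;
};

// Active-low select outputs: a set bit in the mask pulls that pin low for the transfer.
constexpr PinLevels levelsFor(uint8_t pcsMask) {
    return PinLevels{(pcsMask & kPcsCs) == 0, (pcsMask & kPcsDc) == 0};
}

constexpr uint8_t kMaskCommand = kPcsCs | kPcsDc;  // CS low, D/C low: a command byte
constexpr uint8_t kMaskData = kPcsCs;              // CS low, D/C high: a data byte
```

With both pins driven by the select hardware, switching between command and data costs nothing extra in software: it is just a different mask attached to the next word.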

    1. “Lacks integrity” is rather strong. Paul isn’t slamming the Arduino here. 8 MHz is pretty darn fast, even with the dead periods shown on the scope. You can do a heck of a lot of stuff with a 328 based Arduino over SPI. He’s showing how much faster you can go with a Teensy 3.1 in this particular case.

    2. Hi, Paul here. It was never my intention to mislead you or anyone else. I’m a bit disturbed you feel this way, but I do see your point. Teensy 3.1 is a 32 bit ARM chip and Arduino Uno is an 8 bit AVR. Due has a 32 bit ARM chip approximately equal in specs to the chip on Teensy 3.1.

      So just now, I’ve quickly made another video, using Arduino Due and Teensy 3.1. Due is running Adafruit’s not-very-optimized library, which uses Due’s not-very-optimized SPI library. But to put Due on equal footing in terms of clock speeds, I edited Adafruit’s library to configure Due with 28 MHz clock speed, a bit faster than the 24 MHz SPI clock Teensy 3.1 uses.

      https://www.youtube.com/watch?v=rL_2_D3cgFg

      I hope this second video helps restore your confidence in my credibility. ;-)

      I also hope you’ll take a few minutes to read the detailed blog article I wrote, because the real point of this work is how to optimize software to more fully leverage fast hardware. Simply running on faster hardware, but using the same simple code that doesn’t take advantage of FIFOs, automatic chip select and other features, has only a modest speed increase.

      1. Thank you. (I actually just bought a Teensy 3.1, so that may have ratcheted up the wording of my response more than it should have; I apologize for that.) Your new video really helps highlight the difference between SPI hardware acceleration and code acceleration. And FWIW it’s refreshing to know there are people not relying on Moore’s law for fast performance.

        1. Neither Due nor any AVR chips have a SPI FIFO, so the most important optimization is simply impossible on Arduino Uno & Due.

          But some of these techniques could be ported to Due, and some of the minor optimizations could be done on any board, but perhaps at a cost of code space (which is in short supply on the ‘328). All the code is open source and easily available on GitHub. When/if you or anyone else ever attempts this, I hope you’ll shoot a quick video demo to show the speedup, and of course I hope you’ll compare side-by-side to my highly optimized ILI9341_t3 library.

      2. Thanks for dropping in Paul. This video really brings home the fact that it’s not raw clock speed that is making the difference here. It’s the FIFO, and the optimizations to the library. I’ve added it to the post as an update.

  2. If driving a TFT with an SPI interface is “warp speed”

    I suppose Andy Brown’s work using the parallel interface and an FPGA is “beyond warp speed”

    Or maybe hackaday has run out of adjectives

  3. Well, I guess I now have another compelling reason to buy a Teensy. One more dev board to add to the pile. I’ve been reading Paul’s blog for a while, and it’s really impressive work.

  4. Fantastic :) I had no idea that spi could be that fast. How does it compare to arduino 8-bit transfers? :D
    Surely the adafruit shield has touch and the pjrc doesn’t. But is it because of the lack of touch controller? If so, the seeed TFT shield shows that it can be done with the micro’s analog inputs measuring x1,x2,y1,y2.
    Also, would it be able to chip-select several different things, like sd card, audio shield and cc3000 wifi? :D

    1. Wow, that’s an impressive list of questions for such a short message. Here’s quick answers:

      Some of Adafruit’s TFTs have touch, some don’t. They sell many different models.

      This test used a (very cheap) non-Adafruit TFT, without touch. It happens to have the same controller chip, so it works with Adafruit’s original library and my optimized version, but it’s not an Adafruit product.

      No, touch is separate.

      Yes, the SPI bus can be shared with other devices. Sometimes it works well, other times there are hidden gotchas.

      Recently I’ve worked on improvements to the Arduino SPI library (to become part of Arduino 1.5.8), which HaD covered a few weeks ago, regarding improved sharing between certain types of SPI devices that today cause conflicts. You can find details on my DorkbotPDX blog, if you missed the article.

      These HaD comments aren’t a great place to talk about how to design projects. If you’re using Teensy, forum.pjrc.com would be best. If using Adafruit products, forums.adafruit.com is great, and if using official Arduino, of course post on Arduino forum.

    2. Touch doesn’t play into this demo at all and makes no difference in this comparison.
      Chip selects are for the exact purpose you mention. SD card, audio and wifi at the same time are indeed possible. The devices would all share their MOSI, MISO and SCK signals but have separate chip selects. Depending on which chip select is strobed, a different device is selected; the others will simply ignore the data.
      This means the maximum number of devices you can have on an SPI bus is limited only by how many chip select signals you can drive. The Teensy actually has 5 hardware chip selects, so when calling the SPI module to do things you can specify a device and it will automatically assert the chip select line for you. The AVR on an Arduino doesn’t have hardware chip selects; you have to manually drive an IO pin to simulate the chip select.

      You’ll notice the SD card library when you create your SD card object requires you to define a pin to act as the chip select. The ethernet library does the same, the cc3000 library does the same, I don’t have an audio shield but I would assume it does the same. Just use a different pin for each and you can use all 3 devices.
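The idealized picture in this comment can be sketched as a small host-side model: every device sees the clock and data lines, but only the one whose chip select is asserted latches the byte. `SpiDevice`, `SharedBus` and `runDemo` are hypothetical names, and the model deliberately ignores the electrical and software problems that real shared buses run into.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct SpiDevice {
    std::string name;
    bool selected = false;           // true while its chip select is asserted
    std::vector<uint8_t> received;   // bytes this device actually latched

    explicit SpiDevice(std::string n) : name(std::move(n)) {}
};

struct SharedBus {
    std::vector<SpiDevice> devices;

    // Assert exactly one chip select and release all the others.
    void select(const std::string& name) {
        for (auto& d : devices) d.selected = (d.name == name);
    }

    void deselectAll() {
        for (auto& d : devices) d.selected = false;
    }

    // Every device sees SCK and MOSI; only a selected one keeps the byte.
    void transfer(uint8_t b) {
        for (auto& d : devices) {
            if (d.selected) d.received.push_back(b);
        }
    }
};

SharedBus runDemo() {
    SharedBus bus;
    bus.devices.emplace_back("sd");
    bus.devices.emplace_back("wifi");
    bus.devices.emplace_back("display");

    bus.select("display");
    bus.transfer(0x2C);
    bus.deselectAll();

    bus.select("sd");
    bus.transfer(0x40);
    bus.transfer(0x00);
    bus.deselectAll();

    bus.transfer(0xFF);  // nothing selected: every device ignores this byte
    return bus;
}
```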

      1. I wish SPI sharing could be this simple. I’ve been working on improvements so it someday can be.

        CC3000 is troublesome, partly because it uses SPI_MODE1, partly because it uses the SPI port from within an interrupt. Adafruit’s CC3000 library has code to back up the AVR’s SPI registers, change them to MODE1, and then restore them when it’s done, so the conflicting clock polarity isn’t (usually) an issue on AVR. But on Due, Teensy 3.1 and all other non-AVR chips, their specific SPI registers aren’t also in the code, so you can pretty easily end up with the SPI port left in the wrong mode.

        Interrupts are also a huge problem. Using the CC3000 in simple blocking ways, where you fully complete all communication before you try to write to the display or read the touch screen or access the SD card tends to work. But if you use another device while the CC3000 generates an interrupt at just the wrong moment, it can run its SPI code while another device has chip select asserted, causing all sorts of terribly wrong results.

        The touch controller on Adafruit’s displays comes in a couple different types, which need different SPI data modes, and some require very slow clock speeds. Again, they have AVR-only register save/restore, so usually you don’t get wrong settings into other libraries, but there’s no hardware specific code in those libs for non-AVR chips.

        My recent work on SPI transactions, which will be in Teensyduino 1.20 (already in the latest release candidate and on github) and is planned for Arduino 1.5.8 (already in their github source and nightly builds) aims to solve both the settings and interrupt problems, in a hardware independent way. Adafruit has already merged my patches to their libs, at least for these most common ones, so they use the new SPI transaction stuff when compiled on those new versions.
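As a rough illustration of the transaction style, the sketch below wraps each device access in `SPI.beginTransaction()` and `SPI.endTransaction()`, which are part of the Arduino SPI library. The pin numbers, the second device’s clock rate and the helper function names are hypothetical. The `#else` branch contains stand-ins that only record calls, so the call order can be checked on a PC; they are not the Arduino implementation.

```cpp
#ifdef ARDUINO
#include <Arduino.h>
#include <SPI.h>
#else
// Host-side stand-ins (hypothetical): they only record what was requested.
#include <cassert>
#include <cstdint>
#include <vector>
enum { LOW = 0, HIGH = 1, MSBFIRST = 1, SPI_MODE0 = 0, SPI_MODE1 = 1 };
struct SPISettings {
    uint32_t clock;
    uint8_t bitOrder;
    uint8_t dataMode;
    SPISettings(uint32_t c, uint8_t o, uint8_t m) : clock(c), bitOrder(o), dataMode(m) {}
};
struct FakeSpi {
    bool inTransaction = false;
    uint8_t activeMode = 0xFF;
    std::vector<uint8_t> modeLog;  // data mode in effect for each byte sent
    void beginTransaction(const SPISettings& s) {
        assert(!inTransaction);    // a nested transaction would mean two users collided
        inTransaction = true;
        activeMode = s.dataMode;
    }
    uint8_t transfer(uint8_t) { modeLog.push_back(activeMode); return 0; }
    void endTransaction() { inTransaction = false; }
} SPI;
void digitalWrite(uint8_t, uint8_t) {}
#endif

const uint8_t kDisplayCs = 10;  // hypothetical pin numbers
const uint8_t kWifiCs = 9;

// Each device states its own settings every time it takes the bus,
// so it never inherits a mode left behind by another library.
void sendToDisplay(uint8_t value) {
    SPI.beginTransaction(SPISettings(24000000, MSBFIRST, SPI_MODE0));
    digitalWrite(kDisplayCs, LOW);
    SPI.transfer(value);
    digitalWrite(kDisplayCs, HIGH);
    SPI.endTransaction();
}

void sendToWifi(uint8_t value) {
    // MODE1 as mentioned for the CC3000; the clock rate is a placeholder.
    SPI.beginTransaction(SPISettings(4000000, MSBFIRST, SPI_MODE1));
    digitalWrite(kWifiCs, LOW);
    SPI.transfer(value);
    digitalWrite(kWifiCs, HIGH);
    SPI.endTransaction();
}
```

For a device that is serviced from an interrupt handler, the same library also provides `SPI.usingInterrupt()`, which lets `beginTransaction()` hold off that interrupt while another device owns the bus.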

        I wish that were the end of all SPI sharing troubles, but it isn’t.

        Early versions of Adafruit’s CC3000 breakout, and the one Sparkfun still sells, and probably others, lack a tri-state buffer on their MISO signal. If you try to use one of those boards together with any other SPI device, the CC3000 will continue driving its MISO signal, even after you de-assert its chip select.

        Likewise, the W5100 etherchip lacks a tri-state buffer. Modern versions of the Arduino Ethernet Shield have a tri-state buffer chip, but older versions do not.

        Even if all your SPI chips disable their MISO driver, another common problem involves lack of pullup resistors on the chip select signals. Often 2 pieces of hardware that each work in isolation and should be able to share the SPI bus fail to work together, because the second has its chip select line floating low while the first is configured. Usually this can be solved by adding pinMode(cspin, INPUT_PULLUP) at the beginning of your setup() function… if you’re aware of this problem. But if you just buy modules, breakout boards or shields and wire them together hoping they’ll work, and each does when used alone, but they fail when put together, usually there’s little you can do to resolve the problem (if you haven’t read a lengthy post like this one).
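A minimal sketch of that workaround, assuming three devices on hypothetical pins 4, 9 and 10: every chip select is pulled high at the very start of `setup()`, before any library is initialized. The `#else` branch is a host-side stand-in that just records the requested pin modes so the logic can be checked off-target.

```cpp
#ifdef ARDUINO
#include <Arduino.h>
#else
// Host-side stand-ins (hypothetical) that just record the requested pin modes.
#include <cstdint>
#include <map>
enum { INPUT = 0, OUTPUT = 1, INPUT_PULLUP = 2 };
std::map<int, int> pinModes;
void pinMode(uint8_t pin, uint8_t mode) { pinModes[pin] = mode; }
#endif

// Hypothetical wiring: one chip select per device on the shared bus.
const uint8_t kChipSelectPins[] = {4, 9, 10};
const uint8_t kChipSelectCount = sizeof(kChipSelectPins) / sizeof(kChipSelectPins[0]);

void setup() {
    // Pull every chip select high first, so no device is accidentally
    // selected while another one is being initialized.
    for (uint8_t i = 0; i < kChipSelectCount; ++i) {
        pinMode(kChipSelectPins[i], INPUT_PULLUP);
    }
    // Only after this point would each library's begin() be called.
}

void loop() {}
```

The order is the whole point: the pull-ups have to be in place before the first library touches the bus, otherwise a floating chip select can still corrupt that first device’s initialization.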

        My point is many problems do occur with SPI sharing, even when people have properly assigned a chip select pin to each device. My hope is to eventually patch most of the widely used libraries for SPI transactions after 1.5.8 releases, so some of the worst problems go away automatically. For the rest, I hope to raise awareness of these issues, so when people answer questions about SPI sharing, they can advise people about these real-world issues that are just as important as unique chip select pins.

        1. I forgot to mention one other SPI sharing headache… as if all the stuff in that lengthy message wasn’t enough!

          Some SD cards require up to 8 extra clocks on SCK before they actually process the last command or de-assert their MISO signal. Only some cards need this, others do not. Bill’s latest SdFat library does these extra clocks, but the old version inside Arduino’s SD library does not. I’m planning to patch it soon.

          There’s also a known issue with the ADS129X chips used on many of the EEG projects, where they de-assert their interrupt signal in response to SCK, even if their chip select is de-asserted. So far, I’m not aware of anyone publishing a really good interrupt-based library for these chips, but it’s on my radar of SPI sharing troubles, since they’re becoming more popular lately.

  5. This is what I have been using the Teensy 3.1 for as well, except I am still using the ST7735 LCD. It has fewer pixels to drive and no touch, but works well for what I want. When I first started my project, it was on a Uno SMD. Then I moved to a Teensy and then Teensy 2. I would get about 3 to 7 FPS due to the SPI rate. I saw some amazing performance on the NetDuino and was thinking of switching to it.

    Then the Teensy 3 came, and a combination of Paul’s and Peter Loveday’s libraries for SPI made driving an LCD awesome. I now can get a solid flicker-free 40 FPS with 16 active sprites.

    Here is a demo of my project: https://www.youtube.com/watch?v=Gpzi1fXzfyY&list=UUIMDN1Xigie1yKpgx9ksh3Q

    I am now eagerly watching Paul’s work on better SPI sharing.

  6. Did you do a speed test with loading and displaying bitmaps from an SD card mounted on the back of the TFT? I have a sketch for an Arduino Uno and the refresh rate is pretty slow.
