ESP32 Video Input Using I2S

Computer engineering student [sherwin-dc] had a rover project which required streaming video through an ESP32 to be accessed by a web server. He couldn’t find documentation for the standard camera interface of the ESP32, and even if he had, that approach would have used too many I/O pins. Instead, [sherwin-dc] decided to shoehorn the video into an I2S stream. It helped that he had access to an Altera MAX 10 FPGA to process the video signal from the camera. He did succeed, but it took a lot of experimenting to work around the limited resources of the ESP32. Ultimately, [sherwin-dc] settled on QVGA resolution of 320×240 pixels, with 8 bits per pixel. This means each frame uses just 76,800 bytes (about 77 KB) of precious ESP32 RAM.
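To see where the frame size comes from, here is a minimal sketch of the arithmetic. The function name `frame_bytes` is just an illustration, and the 16-bit and VGA cases are included only for comparison; they are not formats the project is stated to have tried.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one uncompressed frame."""
    return width * height * bits_per_pixel // 8

# The format the project settled on: QVGA at 8 bits per pixel.
qvga_8bpp = frame_bytes(320, 240, 8)

# Hypothetical alternatives, shown only for comparison.
qvga_16bpp = frame_bytes(320, 240, 16)
vga_8bpp = frame_bytes(640, 480, 8)

print(f"QVGA  8 bpp: {qvga_8bpp} bytes")
print(f"QVGA 16 bpp: {qvga_16bpp} bytes")
print(f"VGA   8 bpp: {vga_8bpp} bytes")
```

Doubling the pixel depth doubles the buffer, and stepping up to VGA quadruples it, which is why the small format matters on a microcontroller.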

His design uses a 2.5 MHz SCK, which equates to about four frames per second. He notes that with SCK rates in the tens of MHz, the frame rate could in theory be significantly higher. In practice, given the other processing going on in the system, the ESP32 can’t even keep up with four FPS. In the end, he was lucky to get 0.5 FPS throughput, but that was adequate for controlling the rover (see animated GIF below the break). That said, if you had a more powerful processor in your design, this technique might be of interest. [sherwin-dc] notes that the standard camera drivers for the ESP32 use I2S under the hood, so the concept isn’t crazy.
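The relationship between SCK and frame rate can be checked with a quick back-of-the-envelope calculation. This sketch assumes a single data line carrying one pixel bit per SCK cycle with no framing or idle overhead, which is an idealization rather than a description of the actual design. The 20 MHz figure is just an arbitrary example of "tens of MHz".

```python
def max_fps(sck_hz, width=320, height=240, bits_per_pixel=8):
    """Upper bound on frame rate if every SCK cycle carries one pixel bit."""
    bits_per_frame = width * height * bits_per_pixel
    return sck_hz / bits_per_frame

# The clock rate used in the project.
print(f"2.5 MHz SCK: {max_fps(2_500_000):.2f} FPS")

# A hypothetical faster clock, for comparison only.
print(f"20 MHz SCK:  {max_fps(20_000_000):.2f} FPS")
```

This is only the ceiling set by the link. As the article notes, the measured throughput was far lower because the ESP32 has other work to do.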

We’ve covered several projects that generate video over I2S before, including this piece from back in 2019. Have you ever commandeered a protocol for “off-label” use?

[Image: slightly grainy quarter-VGA image of the floor from the point of view of an autonomous rover. There are various balls on the floor as obstacles for the rover to navigate around.]

5 thoughts on “ESP32 Video Input Using I2S”

    1. Hi, Sherwin here. Thank you for your comment. I had considered using SPI as well, and though it would have been the go-to option on the ESP32, using it on the FPGA was not ideal.

      There was a software SPI library available on the FPGA but it would require interfacing with the ‘soft’ CPU on the FPGA, which has very limited processing power and would be busy with other tasks. A separate SPI IP block for the video pipeline was also available, but it streams video data in the Avalon Streaming format (this was an Altera FPGA) and would require further processing to separate video and control data packets. This block also inserts idle video packets when transferring SPI data and does not support backpressure (holding back the video pipeline in the FPGA for a frame to be transferred) which would cause more problems.

      I suppose creating a Verilog module to use with SPI is also possible, though I2S is a simpler protocol to implement. In short, I wanted the video streaming to require the least amount of processing from all 3 CPUs (FPGA, ESP32 and web server).

  1. Stupid question – would it be possible to emulate a SCCB camera with an FPGA such that one could plug it *directly* into an esp32-cam or K210 maixbit via the ribbon connector? I like the idea of taking HDMI input, downscaling it on the FPGA and outputting it for processing, whilst buffering HDMI frames for later output, maybe adding data via UART from the esp32 or K210 – I read maixbit is supposed to do facial recognition in real time for qvga@30fps.

    If this isn’t possible, any ideas or examples of using K210 FPIOA and repurposing the camera interface would be most welcome. I read they have yet to implement master8 spi in maixpy micropython, which sounds close to what I think would be needed to move data over the camera interface.

    Thanks :-)

    1. I believe it’s possible, but doing it from scratch would be extremely hard. The HDMI specification is very complex compared to VGA, and the same goes for SCCB compared to I2S.

      The easiest way to go about this is to use an FPGA with an HDMI input that automatically converts input video signals for use in a video processing pipeline, though this would rely mostly on the proprietary IP blocks the FPGA supports. For example, the Altera FPGA used in this project has IP blocks for downscaling and buffering video frames, and can output them as a clocked signal where pixels are output one at a time.

      I doubt that an FPGA would directly support outputting video using the SCCB protocol, so most likely this would have to be done in Verilog, which could be the hardest part. That being said, I can see some discussions online on how SCCB seems to be very similar to I2C, so it may be possible to use an IP block to output I2C signals (once again, this depends on the FPGA; the one I used couldn’t do this through I2C directly but could if using the soft CPU), and if need be create a small Verilog module to ‘convert’ I2C into SCCB signals. Disclaimer: I haven’t looked at either specification in detail, so I don’t know how feasible this would be. I haven’t used a K210 before either.

      Best of luck with your project and hope to read about it here one day!

      1. Thanks for the insight, much appreciated.

        I am using a mojo board for now, it has the HDMI in/out board and the example code allows for the use of SCCB to hdmi, so I feel that half the work is sort of done, if only I could reverse it…

        I am a total noob here, but I seem to have a knack for picking difficult things to ponder.

        I just like the idea of using the ML aspects of micropython on the K210 to do something interesting with day to day video feeds.
