SIMD-Accelerated Computer Vision On The ESP32-S3

July 1, 2024

One of the fun parts of the ESP32-S3 microcontroller is that it got upgraded to the newer Cadence Xtensa LX7 processor core, which turns out to have a set of SIMD instructions that can significantly speed up a range of tasks. [Shranav Palakurthi] recently used these to speed up the processing of video frames to detect corners using the FAST method. By moving the operations that benefit from SIMD over to an optimized version written in LX7 assembly, the algorithm’s throughput was increased by roughly 120% (about 2.2x), from 5.1 MP/s to 11.2 MP/s, albeit with some caveats.
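For readers unfamiliar with FAST, the core of the method is a segment test: a pixel counts as a corner if enough contiguous pixels on a 16-pixel circle around it are all brighter or all darker than the center by some threshold. The following is a minimal scalar sketch in C, assuming the common FAST-9 variant (9 contiguous pixels). The names `has_run_of_9` and `fast9_is_corner` are made up for illustration, and this is neither the project's code nor the reference implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* (dx, dy) offsets of the 16-pixel circle of radius 3 around the candidate. */
static const int8_t CIRCLE[16][2] = {
    { 0, -3}, { 1, -3}, { 2, -2}, { 3, -1}, { 3,  0}, { 3,  1}, { 2,  2}, { 1,  3},
    { 0,  3}, {-1,  3}, {-2,  2}, {-3,  1}, {-3,  0}, {-3, -1}, {-2, -2}, {-1, -3}
};

/* True if the 16-bit circular mask contains at least 9 consecutive set bits. */
bool has_run_of_9(uint16_t mask)
{
    /* Duplicate the mask so that runs wrapping past bit 15 become linear. */
    uint32_t doubled = mask | ((uint32_t)mask << 16);
    uint32_t run = doubled;
    for (int k = 1; k < 9; k++)
        run &= doubled >> k; /* bit i survives only if bits i..i+k are set */
    return (run & 0xFFFFu) != 0;
}

/* Segment test for pixel (x, y) of an 8-bit grayscale image.
 * The caller must keep (x, y) at least 3 pixels away from every border. */
bool fast9_is_corner(const uint8_t *img, int stride, int x, int y, int threshold)
{
    assert(img != NULL && x >= 3 && y >= 3);
    int center = img[y * stride + x];
    uint16_t brighter = 0, darker = 0;
    for (int i = 0; i < 16; i++) {
        int p = img[(y + CIRCLE[i][1]) * stride + (x + CIRCLE[i][0])];
        if (p > center + threshold) brighter |= (uint16_t)(1u << i);
        if (p < center - threshold) darker   |= (uint16_t)(1u << i);
    }
    return has_run_of_9(brighter) || has_run_of_9(darker);
}
```

The per-pixel comparisons against `center + threshold` and `center - threshold` are the kind of byte-wise work that maps naturally onto 16-lane SIMD.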

The problem with the SIMD instructions in the LX7, other than their being very poorly documented unless you sign an NDA with Cadence, is that many instructions that would be really useful are missing. For [Shranav], the lack of support for direct misaligned reads and for comparing unsigned 8-bit numbers were hurdles, but they could be worked around, with the results available on GitHub.
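One well-known way around a missing unsigned compare is to flip the top bit of every byte and then use the signed compare instead, since that maps 0..255 onto -128..127 without changing the order. The sketch below shows the idea lane by lane in portable C. Whether the project uses exactly this trick is an assumption, and `bias_u8` and `cmp_gt_u8x16` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map 0..255 onto -128..127 while preserving order. In two's complement
 * this is the same bit pattern as v ^ 0x80, so a SIMD version needs only
 * one XOR per vector before the signed compare. */
static inline int8_t bias_u8(uint8_t v)
{
    return (int8_t)((int)v - 128);
}

/* out[i] = 0xFF where a[i] > b[i] as unsigned bytes, else 0x00,
 * computed using only a signed comparison on the biased values. */
void cmp_gt_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < 16; i++)
        out[i] = (bias_u8(a[i]) > bias_u8(b[i])) ? 0xFF : 0x00;
}
```

For example, 200 and 100 reinterpreted as signed bytes would be -56 and 100, giving the wrong answer; after biasing they become 72 and -28, and the signed compare agrees with the unsigned one.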

Much of the groundwork for this SIMD implementation was laid by [Larry Bank], who reverse-engineered the SIMD instructions from available documentation and code samples, finding that the ESP32-S3 lacks quite a few common SIMD instructions, including various shifts and unaligned reads and writes. Still, it’s good enough for quite a few tasks, as long as you can make it work with the available instructions.
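When only aligned 16-byte loads are available, a common workaround for unaligned reads is to load the two aligned blocks that straddle the wanted address and stitch the result together from both. Here is a scalar C sketch of that idea; `load16_unaligned` is a hypothetical helper, and the code makes no claim about which ESP32-S3 instructions would implement each step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy 16 bytes starting at aligned_base + offset into out, touching
 * memory only through two 16-byte-aligned blocks. aligned_base is assumed
 * to be 16-byte aligned, and the buffer must cover every block accessed. */
void load16_unaligned(const uint8_t *aligned_base, size_t offset, uint8_t out[16])
{
    assert(aligned_base != NULL && out != NULL);
    const uint8_t *lo = aligned_base + (offset & ~(size_t)15); /* block holding the first byte */
    const uint8_t *hi = lo + 16;                               /* following block */
    size_t shift = offset & 15;                                /* misalignment in bytes */
    for (size_t i = 0; i < 16; i++) {
        size_t j = i + shift;
        /* The first (16 - shift) bytes come from lo, the rest from hi.
         * With shift == 0 the hi block is never read. */
        out[i] = (j < 16) ? lo[j] : hi[j - 16];
    }
}
```

In a real SIMD routine the per-byte loop would be replaced by whatever shift or byte-select operation the hardware offers, with the second block of one iteration reused as the first block of the next.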

11 thoughts on “SIMD-Accelerated Computer Vision On The ESP32-S3”

Simon Masters says:

July 2, 2024 at 12:46 am

For someone interested in ESP32 SHA256 acceleration this article promises a lot but says very little. The article contains no immediately useful information or clear links to actual implementations.

1. Simon Masters says:
  
  July 2, 2024 at 1:20 am
  
  Yes I have read the manual, but I am not an assembly programmer
  
  1. William Payne says:
    
    July 2, 2024 at 3:16 pm
    
    Assembler is a platform-specific obsolete software technology.
    
    1 Write a gcc C program which calls your machine language subprogram.
    2 Use the gcc C compiler as much as possible to try to do what your machine-specific code needs to do.
    3 Look at the gcc C code disassembly.
    4 Modify 3 as needed to access your platform-specific machine code.
    5 Do this in a char array.
    6 Write the machine code char array to a file.
    7 In your gcc C main program read the machine code into a char array.
    8 Read gcc C labels as values.
    9 Then do an indirect jump into your machine code.
    10 You must pass argument references from your C program to your machine code. Example: a = b+c. The b and c arguments must be passed to your machine code.
    11 Your machine code must extract the argument values and place them into platform registers.
    12 Issue your platform instruction.
    13 Place the instruction return onto the argument stack … modifying TOS, of course.
    14 Your gcc C program must place the C return address [supplied by the label value]
    15 Do an indirect jump to the return stack value … decrement the return stack TOS pointer.
    16 And hope you see the correct answer in your gcc C calling program, of course.
    
Sprite_tm says:

July 2, 2024 at 1:46 am

Note that the SIMD in the ESP32-S3 is not a Cadence thing but an Espressif thing: that is why the entire instruction set is documented in the ESP32-S3 TRM. You could make the point that it could benefit from some examples on how to use it; as the author rightfully mentioned, the current thing is to look at esp-dsp.

1. alialiali says:
  
  July 2, 2024 at 9:53 am
  
  Increased by 120%, or maybe by x2.2. I think the multiplier is the preferred language.
  
sweethack says:

July 2, 2024 at 2:09 am

Looking at the [code](https://github.com/shraiwi/simd-fast-esp32s3/blob/232008ee45abe622d1f9a61943f2cf3270b33c41/lib/simd_fast/simd_fast.c#L308) it seems the author doesn't know about && operator in C/C++ nor about cyclomatic complexity. Doesn't really impress me the C code is running so slow. Or maybe it's because the SIMD functions are all called "simd_fast_something". I guess the compiler does a better job when the functions' name contains "fast". It must think that if the user called the function "fast" it should be faster than the rest of the code.

1. jpiat says:
  
  July 2, 2024 at 6:59 am
  
  FAST is an acronym, and the code you see is the original C code of the FAST detector, not code the author wrote himself.
  
  1. C. Scott Ananian says:
    
    July 2, 2024 at 12:40 pm
    
    Also almost certainly machine-generated code, presumably from some specialized code generator for that particular algorithm.
    
    1. sweethack says:
      
      July 3, 2024 at 3:44 am
      
      In that case, the generator is bullshit. I can understand that making a truth table for numerous boolean conditions is painful for a human, but it’s dumb simple for an algorithm. The whole function above can be converted from probably 1700 LOC to only 30 LOC by any human programmer (and probably fewer via condition optimizations). Geez, maybe even compiling that stuff with a C compiler at -O3 and decompiling the result back to C would give better and human-readable results!
      
    2. Travis says:
      
      December 8, 2024 at 8:33 am
      
      All true facts. That said, the code does look absolutely hilarious, lol. What a crazy mess of conditional nesting! It’ll have to have a ton still, but yeah could be greatly reduced…if it actually mattered. When code is template generated, manually modifying it isn’t usually part of the workflow.
      
      1. Travis says:
        
        December 8, 2024 at 8:34 am
        
        Holy crap I got recommended an old article, my bad!
        