Running 57 Threads At Once On The Arduino Uno

When one thinks of the Arduino Uno, one thinks of a capable 8-bit microcontroller platform that nonetheless doesn’t set the world alight with its performance. Unlike more modern parts like the ESP32, it has just a single core and no real multitasking abilities. But what if one wanted to run many threads on an Uno all at once? [Adam] whipped up some code to do just that.

Threads are useful when you have multiple jobs that need to be done at the same time without interfering with each other. The magic of [Adam]’s ThreadHandler library is that it’s designed to run many threads and do so in real time, with priority management as well. On the Arduino Uno, certainly no speed demon, it can run up to 57 threads concurrently at 6 ms intervals with a minimum timing error of 556 µs and a maximum of 952 µs. With a more reasonable number of 7 threads, the minimum error drops to just 120 µs. Each thread comes with an estimated overhead of 1.3% CPU load and 26 bytes of RAM usage.
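To put those per-thread figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the quoted overheads scale linearly with the thread count, which the source does not confirm, and it uses the 2048 bytes of SRAM on the Uno's ATmega328P. The function name is purely illustrative.

```python
# Rough budget from the figures quoted in the article.
# Assumption: overhead scales linearly with the number of threads.
THREADS = 57
CPU_LOAD_PER_THREAD = 0.013   # 1.3% CPU per thread (quoted estimate)
RAM_PER_THREAD = 26           # bytes of RAM per thread (quoted estimate)
UNO_SRAM = 2048               # bytes of SRAM on the ATmega328P


def overhead(threads):
    """Return (cpu_fraction, ram_bytes) consumed by thread overhead alone."""
    return threads * CPU_LOAD_PER_THREAD, threads * RAM_PER_THREAD


cpu, ram = overhead(THREADS)
print(f"{THREADS} threads: {cpu:.1%} CPU, {ram} of {UNO_SRAM} bytes of SRAM")
```

Under that linear assumption, most of the CPU time and roughly three quarters of the SRAM go to overhead at 57 threads, which leaves little room for the threads to do real work.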

While we struggle to think of what we could do with more than a handful of threads on an Arduino Uno, we’re sure you might have some ideas – sound off in the comments. ThreadHandler is available for your perusal here, and runs on SAMD21 boards as well as any AVR-based boards that are compatible with TimerOne. We’ve seen other work in the same space before, such as ChibiOS for the Arduino platform. Video after the break.

44 thoughts on “Running 57 Threads At Once On The Arduino Uno”

    1. Finally moving away from AVR architecture. It’s about time given how the price has stayed absurd and the speed and memory of basically every processor has also stayed low.

      Would be amusing to see how this type of use extends to what seem like upcoming (eventually) RP2742 and up style Raspberry Pi microcontroller devices which already literally have two processors for their microcontrollers and actual memory (up to 16 MB already!) and decent CPU speeds and actually not too bad 40 nanometer CPU manufacturing as well.

      Multiple PIO microcontrollers with decent memory available? Yes, please! I wonder how many total threads you can get when your microcontrollers run at 800 MHz or more per processor with potentially 4 or even more processors per chip and everything being actually reasonably priced, robust, supports anything from (some but quite robust) low level assembly, direct C/C++ as well as more higher level MicroPython and CircuitPython (basically Python but for microcontrollers) as well?

      https://www.cnx-software.com/wp-content/uploads/2021/01/Raspberry-Pi-MCU-Nomenclature.jpg

  1. I despise and thumb my nose at the AVR architecture every chance I get, but ….

    It is all about what the threads are supposed to do. I am not sure what the magic “57” is about, but the story with any RTOS with threads is that the threads are all normally blocked waiting for their purpose in life to happen. So it is more a matter of the size of the table to handle them than the compute horsepower of the host if they are all (as they should be) sitting there blocked waiting for their cue to come on stage.

      1. Not OP, but I have a few beefs with the CPU core.
        – I like a unified memory model, e.g. Arm, STM8. I/O is memory mapped, so any memory can be used for code or data without requiring special instructions for data in code space or for I/O, i.e. you can download code and run it directly in RAM. You don’t need a stupid macro in the compiler to access data in code space.
        – Silly things like “fuses” to set the clock source that can’t be controlled in firmware at run time. STM8 has fuses, but they can be reprogrammed from firmware.
        – Atmel didn’t have a hardware debugger at first. They still haven’t published the protocol for chips that have it.

        AVR peripherals haven’t kept up
        – Even Chinese clones have 12-bit ADCs these days. :P

        Old chip process. So you pay a lot more for memory and peripherals than Arm chips.

        1. This! Each and every point you’ve mentioned except for the peripherals is a reason why the AVR architecture is so, well, hardly usable. Taking into account that nowadays it is also pricier by any reasonable metric compared to ARM (I’m giving ARM just as an example of an architecture I’m familiar with), there’s no reason whatsoever to use it in new designs, even for one-time hobby projects.

          1. My experience with STM Discovery board: 80 hours (no joke – several hours every day for 2 weeks) of setting up drivers, and in the end I still don’t know why it was not working. I had the IDE working under Windows (that was the only reason to come back to Windows at that time), no IDE working under Linux.
            My experience with Arduino: Everything running after 5 minutes under Linux (and exactly same IDE on Windows if I need).

            Sure, all this debugging stuff is cool and eventually essential, but if there is no start there will be no need for debugging at all. This is why Raspberry Pi and Arduino (with its “IDE”) are still competing despite many “killers” showing up every season.

          2. @PPJ: I don’t use C/C++ and don’t like automagic and haven’t ever needed to do anything with some drivers whatsoever. (What drivers are you talking about?)

            Having said that, if you want to do some Arduino-like stuff, just install VSCode, install PlatformIO from its Marketplace with one click. Create new project -> Choose your board -> Choose your starting example/framework -> Click PLAY button to run it immediately.
            Now you have very fast, almost notepad-fast fullblown IDE with syntax highlighting, indexing, autoformatting, debugging etc etc…

          3. @PPJ: For about 2x the time, I have learnt Arm from scratch and ported/customized an RTOS that previously hadn’t supported that chip series. That’s all thanks to having a hardware debugger, as it helps a lot to figure out where things have crashed.

            In life, there are travellers and tourists. I prefer exploring new things and not doing exactly the same old sh*t as everyone else.

        2. How about selecting the right part for the job? I don’t know how many times I’ve seen the ‘could have used a 555, for that’. And sometimes, I could actually see it done. There are also a lot of people, who like to push a part, for all it’s worth, and more, just as a challenge. Some people just have a favorite part, they are most familiar, and takes little thought or effort to use. Others, want to sell the latest, and greatest, sort of bragging about their wealth, and intelligence. To be honest, most projects only actually use a small fraction of what a microcontroller is capable of doing. We only use the features we actually need, to get a job done, with plenty left over, unused. AVR has been around for a very long time, many examples of using them for most anything. Pointless to have to start entirely from scratch, every time a new part is released. Older microcontrollers will have a lot of use and value, for a long time.

        3. I’m confused what you mean by no hardware debugger. I was pretty sure I had used gdb with avr chips, and definitely know I’ve used it with atmel studio, both with the 328 and the xmega256a3bu which is a beast of an avr.

        4. *Older* Mega8 and smaller ones have no hardware debugger; it only appears from the mega16/32 onwards. You obviously haven’t been around before Arduino days. There are also those larger ones with JTAG.

          Are there any open source AVR hardware debugger dongles!? Atmel’s hardware debugger is closed source. There is software that talks to the official debugger.

    1. Apparently each thread uses 26 bytes, and there are 2048 bytes of SRAM; 26*57=1482, so that’s probably what limits it. Of course the actual thread code has to fit somewhere too.

      1. The thread code won’t use any RAM itself, since it runs from flash memory. But there’s got to be memory somewhere for the global stack and whatever global variables the task manager might need. Plus there’s the heap for memory allocation.

        However, one would think that each thread would have to have its own stack or you couldn’t have pre-emptive task switching without severely limiting what each thread is able to do, so that brings into question that “26 bytes per thread” number.

        Then there are the 32 8-bit general purpose registers (32 bytes), some if not all of which would need to be saved and reloaded during a thread switch – again, without severely limiting what each thread is able to do. I don’t know what the AVR compilers do specifically, but most compilers will put as many local variables as possible into general-purpose registers in order to improve operation (both execution speed and code size reduction, although in the AVR architecture the former might not count for much) – either that or they go on the stack. The registers could be saved to the thread stack, but to save them all would be a minimum of 32 bytes. Plus the 2-byte stack pointer, plus the 2-byte program counter, and you’re up to 36 bytes minimum (*57 = 2052).

        Something’s got to give here, because that doesn’t fit.

        1. There is no stack for each “thread”, only a single stack. As I understand it, the whole system is effectively the same as nested interrupts, where each “thread” is called from the timer ISR function. Threads must return, otherwise they block lower-priority threads (therefore blocking functions require a “thread” to be made up of multiple functions). Have a look at the readme on the git repo

        2. W.r.t. your memory requirements comment, you only need enough stack to save the registers once for each priority level running concurrently, plus whatever local variables are used in those concurrent “threads”
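The difference between the two memory models discussed in this sub-thread can be put into numbers. The sketch below is only an estimate: it assumes a worst-case saved context of the ATmega328P's 32 eight-bit registers plus a 2-byte return address and the 1-byte status register, and it picks four nested priority levels as a purely hypothetical figure. Real compiler-generated interrupt code usually saves fewer registers.

```python
# Worst-case saved context on an ATmega328P (an upper-bound assumption).
REGISTER_FILE = 32    # 32 eight-bit general-purpose registers
RETURN_ADDRESS = 2    # program counter pushed on interrupt entry
STATUS_REGISTER = 1   # SREG
CONTEXT = REGISTER_FILE + RETURN_ADDRESS + STATUS_REGISTER  # 35 bytes

UNO_SRAM = 2048
THREADS = 57
BOOKKEEPING_PER_THREAD = 26   # the per-thread figure quoted in the article


def per_thread_stacks(threads, locals_bytes=0):
    """Stack RAM if every thread keeps its own saved context."""
    return threads * (CONTEXT + locals_bytes)


def shared_stack(priority_levels, locals_bytes=0):
    """Stack RAM if only one context is saved per nested priority level."""
    return priority_levels * (CONTEXT + locals_bytes)


bookkeeping = THREADS * BOOKKEEPING_PER_THREAD
classic = bookkeeping + per_thread_stacks(THREADS)
nested = bookkeeping + shared_stack(4)   # 4 levels is a hypothetical choice

print(f"own stack per thread: {classic} bytes (SRAM: {UNO_SRAM})")
print(f"single shared stack:  {nested} bytes (SRAM: {UNO_SRAM})")
```

With these assumptions the per-thread-stack model overshoots the Uno's SRAM, while the single-stack model fits, which is consistent with the explanation that stack use grows with the number of nested priority levels rather than with the number of threads.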

    2. The AVR has the really nice feature of being super standard and having clones that are usually good enough. They’re very simple and rugged.

      What I really don’t like is overuse of low end ARM. If you need that level of power, your application could probably benefit from connectivity, so why not go for ESP?

      1. Which AVR clones?

        Many reasons to choose a Cortex-M over the ESP32. Power consumption, huge selection of peripherals, multiple vendors… And having a recent GCC available is another advantage.

        Connectivity doesn’t always mean wifi.

          1. Actually the USB is *not* on the ESP32-S2 module. Might want to look very closely at the tin can portion of the PCB.

            It is on the ESP32-S2 DevKits – a separate chip on a breakout PCB that handles the hardware debugger. It is not controllable from the ESP32-S2 module side and only from an external USB host.

      2. Low-end Arm chips have price, speed, memory and peripheral advantages. What’s not to like?
        Unlike the ESP, there are also decent development environments and decent hardware debugger support in the available IDEs.

  2. This sounds like a great application for the Controllino to use with safety. Constantly checking the safety circuit and when triggered interrupts main loop to actually shut machine down…

    1. Safety critical features seem much better suited to an actual interrupt. Timer-based interrupts for checking state or hardware interrupts directly off a comparator or some such, which could call a safe-shutdown function if necessary. There’s a lot of room for introducing error in a thread-based safety loop, and it would likely take more development and take up more overhead.

    2. Seems to me like a good purpose for these threads could be like I/O sampling loops in industrial control where you want to read an input and make a control output decision. Kind of like how you would use a PLC. First thing that came to my mind. A safety interrupt could be part of that if the timing is reasonable. For example, a temperature limit hit a couple times a second is probably good enough. An operator e-stop circuit probably needs to be a hard interrupt since someone pushed the “oh, no” button.

      1. The number of threads you can manage also has a lot to do with how tight the loop is. In industrial control, say I am reading a temperature and turning a heater on and off. That is very tight deterministic code. If you are sitting around waiting for an asynchronous event, that is a bigger problem since you can’t determine how long that thread will run.

        The architecture needs to have enough memory to store your thread and the thread switching requires time and processing power as well. A lot of devices can be made to handle threaded applications but certain devices are hardware optimized to deal with certain numbers of threads.

    3. That’s not going to fly on an actual safety application. Microchip does have microcontrollers with certified safety functionality, but you are going to have to use their special compilers to program that.

  3. I don’t understand where this attitude comes from: “Unlike more modern parts like the ESP32, it has just a single core and no real multitasking abilities.”

    I remember the days when we did multitasking on Macintosh System 7 without multiple cores, without protected memory between processes, and without preemptive multitasking driven by a timer (you had to call SystemTask() or GetNextEvent() to give other processes a turn.) Why do people think you need multiple cores to do multitasking today? All you need is to understand your stack layout and your processor’s register saving convention when servicing an interrupt. (And *maybe* you want a clock interrupt to do preemption.)

    Implementing multithreading on a 68HC11 microcontroller was a popular instructional exercise 20 years ago when I was in college. (I haven’t done it myself, though.)
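The cooperative style described in this comment, where each task must voluntarily hand control back, can be modelled in a few lines. This is a minimal sketch using Python generators, with `yield` standing in for the "give other processes a turn" call; the `task` and `run_round_robin` names are made up for illustration and do not correspond to any real API.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: do one unit of work, then give up the CPU."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler


def run_round_robin(tasks):
    """Give each task a turn in order until every task has finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
            ready.append(current)  # still alive: back of the queue
        except StopIteration:
            pass                   # task finished: drop it


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
```

No timer and no second core are involved: the interleaving comes entirely from tasks yielding. The cost is that a task which never yields starves everyone else, which is the gap a clock interrupt fills in a preemptive design.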

  4. i’m sure it’s an elegant hack but when you’re writing for a microcontroller i really think all these things should be thought of as what they are, rather than abstracted into something big-computer-sounding. each thread should be an ISR and maybe a bottom half (a portion that operates in the main loop, draining a queue or running a state machine triggered by the ISR). you should think of the intersection between the requirements of your design and the abilities of the processor in question.

    and on the flipside, when you’re writing for a big computer (like raspberry pi or whatever), you shouldn’t be messing around with these timing-sensitive things, they should be offloaded entirely to a peripheral or co-processor.

    if you want to use a general purpose processor as an I/O co-processor, propeller is the architecture that mastered that. tacking multithreading onto an AVR isn’t going to accomplish any of these things very well.

    anyways that’s just my philosophy
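The ISR-plus-bottom-half split mentioned in this comment can be sketched as follows. This is a host-side model in Python, not firmware: the `EventSource` class and its method names are hypothetical, and on a real AVR the queue accesses in the main loop would need to run with interrupts briefly disabled.

```python
from collections import deque


class EventSource:
    """Models one peripheral handled by a short ISR plus a bottom half."""

    def __init__(self, capacity=8):
        # Bounded queue: when full, the oldest event is silently dropped.
        self.queue = deque(maxlen=capacity)
        self.handled_count = 0

    def isr(self, timestamp_ms):
        """Top half: runs in interrupt context, so it only records the event."""
        self.queue.append(timestamp_ms)

    def bottom_half(self):
        """Runs from the main loop: drain the queue and do the slow work."""
        handled = []
        while self.queue:
            handled.append(self.queue.popleft())
            self.handled_count += 1
        return handled


button = EventSource()
button.isr(10)   # simulated interrupts arriving between main-loop passes
button.isr(25)
print(button.bottom_half())   # one pass of the main loop drains both events
```

The interrupt handler stays short and deterministic, while anything slow happens later at main-loop priority.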

    1. It’s not a “hack”, it’s a piece of engineering.

      And proper RTOSes are vital for many embedded applications. A trivial interrupt handler setup is not going to keep up when you need processes running on differing timescales and priorities, with complex locking interactions between them, and needing to hit timing guarantees.

      Sure, you can write a giant state machine by hand and spend all your time debugging subtle edge cases and failures, but who voluntarily does that?

  5. Here’s an application: NTP server

    In my implementation there are two functions. When available, read the time from the GPS module and set the Arduino time with it. When there is data on the ethernet port, check that it is a time request and return the Arduino time. Essentially two threads. Right now I have two calls in “loop”. Yeah, they could be an ISR, but you asked for an application.

    1. I wrote a much less capable version of something similar, that was a tick-based scheduler with priority. I did it for a stepper motor driver that needed very regular ticks for the stepper, but also needed to poll several user pushbuttons and display some stuff to an lcd. It works really well as a single axis driver for my lathe feed screw. It’s a decent use for an atmega.

    1. There’s no effective difference – this implementation just puts the infinite loop outside the thread’s run() function, rather than expecting the thread to do it manually.

    2. They are just a bunch of functions called from a timer interrupt. You cannot have concurrent infinite loops running without breaking the loop into multiple functions (aka “code blocks” in this library), or adding your own state management (to remember where you were up to before returning/yielding), as a higher-priority function must return before a lower-priority function can continue.
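The behaviour described in these replies, where functions are called from a timer interrupt and a higher-priority one must return before a lower-priority one continues, can be modelled on a desktop machine. The sketch below is not ThreadHandler's API: every name in it is hypothetical, and a nested call to `tick()` stands in for a timer interrupt arriving while a slow task is still running.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    priority: int          # larger number = higher priority
    period: int            # in ticks
    func: Callable
    next_due: int = 1


class Scheduler:
    """Run-to-completion priority scheduler driven by a simulated timer."""

    def __init__(self):
        self.tasks = []
        self.now = 0
        self.running_priority = -1   # priority of the task currently running

    def add(self, name, priority, period, func):
        self.tasks.append(Task(name, priority, period, func))
        self.tasks.sort(key=lambda t: -t.priority)   # highest first

    def tick(self):
        """Simulated timer ISR: advance time and run whatever is due."""
        self.now += 1
        for t in self.tasks:
            if t.priority <= self.running_priority:
                break   # may not preempt an equal or higher priority task
            if t.next_due <= self.now:
                t.next_due = self.now + t.period
                saved = self.running_priority
                self.running_priority = t.priority
                t.func(self)             # must return to let others run
                self.running_priority = saved


log = []


def fast(sched):
    log.append(("fast", sched.now))


def slow(sched):
    log.append(("slow start", sched.now))
    sched.tick()   # a timer interrupt arrives while slow is still running
    sched.tick()
    log.append(("slow end", sched.now))


sched = Scheduler()
sched.add("fast", priority=2, period=1, func=fast)
sched.add("slow", priority=1, period=4, func=slow)
sched.tick()
print(log)
```

In this model the fast task keeps its schedule by nesting on top of the slow one, sharing a single call stack, but the slow task cannot make progress until each nested call returns. That is why an infinite loop inside a task would block everything at or below its priority.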

  6. It’s a case of the right tool for the job.

    In 1996, when the AVR first appeared, it was one of the first microcontrollers to use flash memory. Before that, program memory had been UV-erasable, mask-programmed or one-time programmable (OTP).

    You cannot exactly compare an 8-bit microcontroller intended for consumer appliances with a 32-bit ARM. The ARM really only reached the mass market after it had appeared in Nokia cell phones and PDAs.

    I agree with @PPJ – the initial learning curve with the ARM was steep. Back in 2013, when I took on my first STM32 Cortex M4 project, you had to hunt around for no-cost tools (CooCox etc) and then it would take you a week to get your first LED to blink. After that, it got much easier.

    As someone who is a hobbyist programmer, the Arduino ecosystem has given me access to a vast range of new microcontrollers that I would never have considered. My current project is with a Teensy 4.x – using it to simulate novel instruction sets at 600MHz.
