Running 57 Threads At Once On The Arduino Uno

March 17, 2021

When one thinks of the Arduino Uno, one thinks of a capable 8-bit microcontroller platform that nonetheless doesn’t set the world alight with its performance. Unlike more modern parts like the ESP32, it has just a single core and no real multitasking abilities. But what if one wanted to run many threads on an Uno all at once? [Adam] whipped up some code to do just that.

Threads are useful when you have multiple jobs that need to be done at the same time without interfering with each other. The magic of [Adam]’s ThreadHandler library is that it’s designed to run many threads and do so in real time, with priority management as well. On the Arduino Uno, certainly no speed demon, it can run up to 57 threads concurrently at 6 ms intervals with a minimum timing error of 556 µs and a maximum of 952 µs. With a more reasonable number of 7 threads, the minimum error drops to just 120 µs. Each thread comes with an estimated overhead of 1.3% CPU load and 26 bytes of RAM usage.
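As a rough sanity check on those figures, the sketch below multiplies the quoted per-thread costs out to 57 threads. It assumes the overhead scales linearly with thread count, which the article does not state, and the 2048-byte SRAM size is the Uno's ATmega328P figure. All names are made up for illustration.

```cpp
#include <cstdint>

// Per-thread figures quoted in the article.
constexpr int kRamPerThreadBytes = 26;
constexpr int kCpuPerThreadTenthsPct = 13;  // 1.3 % stored as tenths of a percent

// SRAM on the Uno's ATmega328P.
constexpr int kSramBytes = 2048;

// Assumption: both costs grow linearly with the number of threads.
constexpr int ramUsedBytes(int threads) { return threads * kRamPerThreadBytes; }
constexpr int cpuLoadTenthsPct(int threads) { return threads * kCpuPerThreadTenthsPct; }

// 57 threads: 1482 bytes of bookkeeping, leaving 566 bytes for everything else.
static_assert(ramUsedBytes(57) == 1482, "RAM used by 57 threads");
static_assert(kSramBytes - ramUsedBytes(57) == 566, "SRAM left over");

// 57 threads: 74.1 % of the CPU spent on scheduling overhead alone.
static_assert(cpuLoadTenthsPct(57) == 741, "CPU overhead of 57 threads");
```

Under that linear assumption, roughly three quarters of the CPU and RAM go to thread bookkeeping at 57 threads, which may be why 7 threads is described as the more reasonable number.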

While we struggle to think of what we could do with more than a handful of threads on an Arduino Uno, we’re sure you might have some ideas – sound off in the comments. ThreadHandler is available for your perusal here, and runs on SAMD21 boards as well as any AVR-based boards that are compatible with TimerOne. We’ve seen other work in the same space before, such as ChibiOS for the Arduino platform. Video after the break.

44 thoughts on “Running 57 Threads At Once On The Arduino Uno”

Bob says:

March 17, 2021 at 11:37 am

Rodney Brooks’ subsumption programming for robots.

1. Multiple Thread Microcontroller Nomenclature says:
  
  March 17, 2021 at 12:50 pm
  
  Finally moving away from AVR architecture. It’s about time given how the price has stayed absurd and the speed and memory of basically every processor has also stayed low.
  
  Would be amusing to see how this type of use extends to what seem like upcoming (eventually) RP2742 and up style Raspberry Pi microcontroller devices which already literally have two processors for their microcontrollers and actual memory (up to 16 MB already!) and decent CPU speeds and actually not too bad 40 nanometer CPU manufacturing as well.
  
  Multiple PIO microcontrollers with decent memory available? Yes, please! I wonder how many total threads you can get when your microcontrollers run at 800 MHz or more per processor with potentially 4 or even more processors per chip and everything being actually reasonably priced, robust, supports anything from (some but quite robust) low level assembly, direct C/C++ as well as more higher level MicroPython and CircuitPython (basically Python but for microcontrollers) as well?
  
  https://www.cnx-software.com/wp-content/uploads/2021/01/Raspberry-Pi-MCU-Nomenclature.jpg
  
Severe Tire Damage says:

March 17, 2021 at 11:57 am

I despise and thumb my nose at the AVR architecture every chance I get, but …

It is all about what the threads are supposed to do. I am not sure what the magic “57” is about, but the story with any RTOS with threads is that the threads are all normally blocked waiting for their purpose in life to happen. So it is more a matter of the size of the table to handle them than the compute horsepower of the host if they are all (as they should be) sitting there blocked waiting for their cue to come on stage.

1. Ezra Thomas says:
  
  March 17, 2021 at 12:17 pm
  
  Other than being an older architecture, what don’t you like about it? I’m designing a custom CPU on an FPGA and so your insight on this would be particularly interesting to me.
  
  1. tekkieneet says:
    
    March 17, 2021 at 3:18 pm
    
    Not OP, but I have a few beefs with the CPU core.
    – I like a unified memory model, e.g. Arm or STM8, where I/O is memory mapped. Any memory can be used for code or data without requiring special instructions for data in code space or for I/O, i.e. you can download code and run it directly from RAM. You don’t need a stupid compiler macro to access data in code space.
    – Silly things like “fuses” to set the clock source, which can’t be controlled from firmware at run time. STM8 has fuses, but they can be reprogrammed from firmware.
    – Atmel didn’t have a hardware debugger at first. They still haven’t published the protocol for chips that have it.
    
    AVR peripherals haven’t kept up
    – Even Chinese clones have 12-bit ADCs these days. :P
    
    Old chip process. So you pay a lot more for memory and peripherals than Arm chips.
    
    1. Grawp says:
      
      March 17, 2021 at 7:09 pm
      
      This! Each and every point you’ve mentioned except for the peripherals is a reason why the AVR architecture is so, well, hardly usable. Taking into account that nowadays it is also pricier by any reasonable metric compared to ARM (I’m giving ARM just as an example of an architecture I’m familiar with), there’s no reason whatsoever to use it in new designs, even for one-time hobby projects.
      
      1. PPJ says:
        
        March 18, 2021 at 1:12 am
        
        My experience with the STM Discovery board: 80 hours (no joke – several hours every day for 2 weeks) of setting up drivers, and in the end I still don’t know why it was not working. I had the IDE working under Windows (that was the only reason to come back to Windows at that time), but no IDE working under Linux.
        My experience with Arduino: everything running after 5 minutes under Linux (and exactly the same IDE on Windows if I need it).
        
        Sure, all this debugging stuff is cool and eventually essential, but if there is no start there will be no need for debugging at all. This is why Raspberry Pi and Arduino (with its “IDE”) are still competing despite many “killers” showing up every season.
        
      2. Grawp says:
        
        March 18, 2021 at 2:15 am
        
        @PPJ: I don’t use C/C++ and don’t like automagic and haven’t ever needed to do anything with some drivers whatsoever. (What drivers are you talking about?)
        
        Having said that, if you want to do some Arduino-like stuff, just install VSCode, install PlatformIO from its Marketplace with one click. Create new project -> Choose your board -> Choose your starting example/framework -> Click PLAY button to run it immediately.
        Now you have very fast, almost notepad-fast fullblown IDE with syntax highlighting, indexing, autoformatting, debugging etc etc…
        
      3. tekkieneet says:
        
        March 18, 2021 at 6:36 am
        
        @PPJ: In about 2x the time, I learnt Arm from scratch and ported/customized an RTOS that previously hadn’t supported that chip series. That’s all thanks to having a hardware debugger, as it helps a lot to figure out where things have crashed.
        
        In life, there are travellers and tourists. I prefer exploring new things and not doing exactly the same old sh*t as everyone else.
        
    2. Harvey says:
      
      March 17, 2021 at 9:43 pm
      
      How about selecting the right part for the job? I don’t know how many times I’ve seen the ‘could have used a 555 for that’ comment. And sometimes I could actually see it done. There are also a lot of people who like to push a part for all it’s worth, and more, just as a challenge. Some people just have a favorite part that they are most familiar with and that takes little thought or effort to use. Others want to sell the latest and greatest, sort of bragging about their wealth and intelligence. To be honest, most projects only actually use a small fraction of what a microcontroller is capable of doing. We only use the features we actually need to get a job done, with plenty left over, unused. AVR has been around for a very long time, with many examples of using it for almost anything. It is pointless to have to start entirely from scratch every time a new part is released. Older microcontrollers will have a lot of use and value for a long time.
      
    3. rpavlik says:
      
      March 18, 2021 at 5:21 am
      
      I’m confused what you mean by no hardware debugger. I was pretty sure I had used gdb with avr chips, and definitely know I’ve used it with atmel studio, both with the 328 and the xmega256a3bu which is a beast of an avr.
      
    4. tekkieneet says:
      
      March 18, 2021 at 6:27 am
      
      *Older* Mega8 and smaller parts have no hardware debugger; they only have it from the mega16/32 on. You obviously haven’t been around since before the Arduino days. There are also those larger ones with JTAG.
      
      Are there any open source AVR hardware debugger dongles!? Atmel’s hardware debugger is closed source. There is software that talks to the official debugger.
      
      1. fhunter says:
        
        March 20, 2021 at 2:09 am
        
        Olimex has/had an AVR debugger – open source, the protocol was cracked/reverse engineered.
        
2. przemek.klosowski says:
  
  March 17, 2021 at 1:07 pm
  
  Apparently each thread uses 26 bytes, and there are 2048 bytes of SRAM; 26*57=1482, so that’s probably what limits it. Of course the actual thread code has to fit somewhere too.
  
  1. Paul LeBlanc says:
    
    March 17, 2021 at 2:25 pm
    
    The thread code won’t use any RAM itself, since it runs from flash memory. But there’s got to be memory somewhere for the global stack and whatever global variables the task manager might need. Plus there’s the heap for memory allocation.
    
    However, one would think that each thread would have to have its own stack or you couldn’t have pre-emptive task switching without severely limiting what each thread is able to do, so that brings into question that “26 bytes per thread” number.
    
    Then there are the 32 8-bit general purpose registers (32 bytes), some if not all of which would need to be saved and reloaded during a thread switch – again, without severely limiting what each thread is able to do. I don’t know what the AVR compilers do specifically, but most compilers will put as many local variables as possible into general-purpose registers in order to improve operation (both execution speed and code size reduction, although in the AVR architecture the former might not count for much) – either that or they go on the stack. The registers could be saved to the thread stack, but to save them all would be a minimum of 32 bytes. Plus the stack pointer, plus the program counter, and you’re up to 36 bytes minimum (*57 = 2052).
    
    Something’s got to give here, because that doesn’t fit.
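A quick check of the preemptive case, as a sketch only: on the Uno's ATmega328P the 32 general-purpose registers are 8 bits wide, so a full saved context is smaller than a 16-bit register file would suggest, yet 57 of them still overflow the 2048 bytes of SRAM before any local variables are counted. The constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Assumed ATmega328P figures: 32 one-byte registers, a 2-byte stack pointer
// and a 2-byte program counter (return address). Saving SREG would add one byte.
constexpr int kRegisterBytes = 32;
constexpr int kStackPointerBytes = 2;
constexpr int kProgramCounterBytes = 2;
constexpr int kSramBytes = 2048;

// Bytes needed to save one complete thread context.
constexpr int contextBytes() {
    return kRegisterBytes + kStackPointerBytes + kProgramCounterBytes;
}

// Largest number of saved contexts that fit, ignoring locals and globals.
constexpr int maxContexts(int sramBytes) { return sramBytes / contextBytes(); }

static_assert(contextBytes() == 36, "one saved context");
static_assert(contextBytes() * 57 == 2052, "57 contexts exceed 2048 bytes");
static_assert(maxContexts(kSramBytes) == 56, "at most 56 contexts fit");
```

So even with the smaller register file, a per-thread saved context cannot be how 57 threads fit in 26 bytes each.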
    
    1. bobby says:
      
      March 17, 2021 at 6:56 pm
      
      There is no stack for each “thread”, only a single stack. As I understand it, the whole system is effectively the same as nested interrupts, where each “thread” is called from the timer ISR function. Threads must return, otherwise they block lower-priority threads (therefore blocking functions require a “thread” to be made up of multiple functions). Have a look at the readme on the git repo
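The single-stack idea can be sketched as a run-to-completion scheduler. This is not ThreadHandler's code or API: `Task`, `addTask` and `onTick` are hypothetical names, and `onTick` stands in for the body of a timer ISR. The sketch only orders due tasks by priority within a tick; it does not model a higher-priority task interrupting a lower-priority one that is already running, which the nested-interrupt approach described above would allow.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical task record; not ThreadHandler's actual data layout.
struct Task {
    void (*run)();         // must return; never loops forever
    uint8_t priority;      // larger value runs first
    uint32_t periodTicks;  // how often the task becomes due
    uint32_t nextDue;      // tick at which it is next due
};

constexpr std::size_t kMaxTasks = 8;
static std::array<Task, kMaxTasks> tasks{};
static std::size_t taskCount = 0;

bool addTask(void (*run)(), uint8_t priority, uint32_t periodTicks) {
    if (taskCount == kMaxTasks || periodTicks == 0) return false;
    tasks[taskCount++] = Task{run, priority, periodTicks, periodTicks};
    return true;
}

// Stand-in for the timer ISR body: run every due task, highest priority first.
// Every task shares the caller's stack because each one runs to completion.
void onTick(uint32_t now) {
    for (;;) {
        Task* best = nullptr;
        for (std::size_t i = 0; i < taskCount; ++i) {
            Task& t = tasks[i];
            if (t.nextDue <= now && (!best || t.priority > best->priority)) best = &t;
        }
        if (!best) return;                  // nothing left that is due
        best->nextDue += best->periodTicks; // schedule the next activation
        best->run();
    }
}

// Two demo tasks that record the order in which they ran.
static char trace[32];
static std::size_t traceLen = 0;
void record(char c) {
    if (traceLen + 1 < sizeof(trace)) { trace[traceLen++] = c; trace[traceLen] = '\0'; }
}
void fastJob() { record('F'); }
void slowJob() { record('S'); }
```

Registering `slowJob` at priority 1 every 3 ticks and `fastJob` at priority 2 every tick, then calling `onTick` for ticks 1 to 3, runs `fastJob` three times and `slowJob` once, with `fastJob` going first on the tick where both are due.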
      
    2. bobby says:
      
      March 17, 2021 at 7:01 pm
      
      W.r.t. your memory requirements comment, you only need enough stack to save the registers once for each priority level running concurrently, plus whatever local variables are used in those concurrent “threads”
      
      1. dimag0g says:
        
        April 5, 2021 at 1:18 am
        
        Not in the case of real threads which can call yield()
        
      2. bobby says:
        
        April 5, 2021 at 2:57 pm
        
        Which these are not.
        
3. Daniel Dunn says:
  
  March 17, 2021 at 3:59 pm
  
  The AVR has the really nice feature of being super standard and having clones that are usually good enough. They’re very simple and rugged.
  
  What I really don’t like is overuse of low end ARM. If you need that level of power, your application could probably benefit from connectivity, so why not go for ESP?
  
  1. Somun says:
    
    March 17, 2021 at 4:49 pm
    
    Which AVR clones?
    
    Many reasons to choose a Cortex-M over the ESP32: power consumption, huge selection of peripherals, multiple vendors… And having a recent GCC available is another advantage.
    
    Connectivity doesn’t always mean WiFi.
    
  2. paulvdh says:
    
    March 17, 2021 at 5:24 pm
    
    Huh, does ESP have built-in USB?
    That’s what I want for my connectivity.
    
    1. rpavlik says:
      
      March 18, 2021 at 5:22 am
      
      Yeah, the ESP32-S2 (out now) has USB and WiFi but no Bluetooth (and its ULP coprocessor is RISC-V), and the -S3, out soon, has USB, Bluetooth, and WiFi.
      
      1. tekkieneet says:
        
        March 18, 2021 at 6:15 am
        
        Actually the USB is *not* on the ESP32-S2 module. Might want to look very closely at the tin can portion of the PCB.
        
        It is on the ESP32-S2 DevKits – a separate chip on a breakout PCB that handles the hardware debugger. It is not controllable from the ESP32-S2 module side and only from an external USB host.
        
  3. tekkieneet says:
    
    March 18, 2021 at 6:18 am
    
    Low end Arm chips have price, speed, memory and peripheral advantages. What’s not to like?
    Unlike the ESP, there is also a decent development environment and decent hardware debugger support in the available IDEs.
    
Jay says:

March 17, 2021 at 12:05 pm

This sounds like a great application for the Controllino to use with safety. Constantly checking the safety circuit and when triggered interrupts main loop to actually shut machine down…

1. fiddlingjunky says:
  
  March 17, 2021 at 1:42 pm
  
  Safety critical features seem much better suited to an actual interrupt. Timer-based interrupts for checking state or hardware interrupts directly off a comparator or some such, which could call a safe-shutdown function if necessary. There’s a lot of room for introducing error in a thread-based safety loop, and it would likely take more development and take up more overhead.
  
2. Steven Naslund says:
  
  March 17, 2021 at 2:00 pm
  
  Seems to me like a good purpose for these threads could be like I/O sampling loops in industrial control where you want to read an input and make a control output decision. Kind of like how you would use a PLC. First thing that came to my mind. A safety interrupt could be part of that if the timing is reasonable. For example, a temperature limit hit a couple times a second is probably good enough. An operator e-stop circuit probably needs to be a hard interrupt since someone pushed the “oh, no” button.
  
  1. Steven Naslund says:
    
    March 17, 2021 at 2:05 pm
    
    The number of threads you can manage also has a lot to do with how tight the loops are. In industrial control, say I am reading a temperature and turning a heater on and off. That is very tight deterministic code. If you are sitting around waiting for an asynchronous event, that is a bigger problem since you can’t determine how long that thread will run.
    
    The architecture needs to have enough memory to store your thread and the thread switching requires time and processing power as well. A lot of devices can be made to handle threaded applications but certain devices are hardware optimized to deal with certain numbers of threads.
    
3. Jii says:
  
  March 17, 2021 at 2:14 pm
  
  That’s not going to fly in an actual safety application. Microchip does have microcontrollers with certified safety functionality, but you are going to have to use their special compilers to program them.
  
Rob says:

March 17, 2021 at 1:33 pm

Heinz (baked beans) had 57 varieties.

1. Paul LeBlanc says:
  
  March 17, 2021 at 2:32 pm
  
  Not all of those varieties were baked beans. And they actually had 60 different products when they came up with the slogan.
  
Ken Bloom says:

March 17, 2021 at 2:15 pm

I don’t understand where this attitude comes from: “Unlike more modern parts like the ESP32, it has just a single core and no real multitasking abilities.”

I remember the days when we did multitasking on Macintosh System 7 without multiple cores, without protected memory between processes, and without preemptive multitasking driven by a timer (you had to call SystemTask() or GetNextEvent() to give other processes a turn.) Why do people think you need multiple cores to do multitasking today? All you need is to understand your stack layout and your processor’s register saving convention when servicing an interrupt. (And *maybe* you want a clock interrupt to do preemption.)

Implementing multithreading on a 68HC11 microcontroller was a popular instructional exercise 20 years ago when I was in college. (I haven’t done it myself, though.)

1. RÖB says:
  
  March 17, 2021 at 6:59 pm
  
  Writing a time division CPU share (multi-task) core is still a very good educational exercise. Especially so in ASM.
  
Greg A says:

March 17, 2021 at 3:24 pm

i’m sure it’s an elegant hack but when you’re writing for a microcontroller i really think all these things should be thought of as what they are, rather than abstracted into something big-computer-sounding. each thread should be an ISR and maybe a bottom half (a portion that operates in the main loop, draining a queue or state machine triggered by the ISR). you should think of the intersection between the requirements of your design and the abilities of the processor in question.
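The ISR-plus-bottom-half split described above is commonly built on a small ring buffer. The following is a minimal sketch with hypothetical names (`pushFromIsr`, `popInLoop`), assuming exactly one producer (the ISR) and one consumer (the main loop). The indices are single bytes, which an 8-bit AVR reads and writes in one instruction, so this sketch does not disable interrupts around them.

```cpp
#include <cstdint>

// Hypothetical event queue: the ISR pushes, the main loop pops.
constexpr uint8_t kQueueSize = 16;  // power of two so the index mask works
static volatile uint8_t head = 0;   // written only by the ISR
static volatile uint8_t tail = 0;   // written only by the main loop
static volatile uint8_t buffer[kQueueSize];

// Top half: call from the ISR. Returns false if the queue is full,
// in which case the event is dropped.
bool pushFromIsr(uint8_t event) {
    const uint8_t next = static_cast<uint8_t>((head + 1) & (kQueueSize - 1));
    if (next == tail) return false;  // one slot stays empty to mark "full"
    buffer[head] = event;
    head = next;                     // publish only after the data is stored
    return true;
}

// Bottom half: call from loop(). Returns false if nothing is queued.
bool popInLoop(uint8_t& event) {
    if (tail == head) return false;
    event = buffer[tail];
    tail = static_cast<uint8_t>((tail + 1) & (kQueueSize - 1));
    return true;
}
```

Because one slot is kept empty to tell a full queue from an empty one, a queue of size 16 holds at most 15 pending events.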

and on the flipside, when you’re writing for a big computer (like raspberry pi or whatever), you shouldn’t be messing around with these timing-sensitive things, they should be offloaded entirely to a peripheral or co-processor.

if you want to use a general purpose processor as an I/O co-processor, propeller is the architecture that mastered that. tacking multithreading onto an AVR isn’t going to accomplish any of these things very well.

anyways that’s just my philosophy

1. pelrun says:
  
  March 17, 2021 at 9:10 pm
  
  It’s not a “hack”, it’s a piece of engineering.
  
  And proper RTOSes are vital for many embedded applications. A trivial interrupt handler setup is not going to keep up when you need processes running on differing timescales and priorities, with complex locking interactions between them, and needing to hit timing guarantees.
  
  Sure, you can write a giant state machine by hand and spend all your time debugging subtle edge cases and failures, but who voluntarily does that?
  
OldSurferDude says:

March 17, 2021 at 4:10 pm

Here’s an application: NTP server

In my implementation there are two functions. When available, read the time from the GPS module and set the Arduino time with it. When there is data on the ethernet port, check that it is a time request and return the Arduino time. Essentially two threads. Right now I have two calls in “loop”. Yeah, they could be an ISR, but you asked for an application.

1. smellsofbikes says:
  
  March 17, 2021 at 7:14 pm
  
  I wrote a much less capable version of something similar, that was a tick-based scheduler with priority. I did it for a stepper motor driver that needed very regular ticks for the stepper, but also needed to poll several user pushbuttons and display some stuff to an lcd. It works really well as a single axis driver for my lathe feed screw. It’s a decent use for an atmega.
  
Harvie.CZ says:

March 17, 2021 at 4:52 pm

Are these actual threads or just a bunch of timers with some callback array? The difference is that threads can run several concurrent infinite loops without blocking the whole core, while in a timer ISR you cannot do that.

1. pelrun says:
  
  March 17, 2021 at 9:02 pm
  
  There’s no effective difference – this implementation just puts the infinite loop outside the thread’s run() function, rather than expecting the thread to do it manually.
  
  1. a says:
    
    March 21, 2021 at 8:22 am
    
    Please fix the link to the library, should be https://bitbucket.org/adamb3_14/threadhandler/src/master/ instead of a link to YouTube.
    
2. Simon says:
  
  March 17, 2021 at 9:15 pm
  
  They are just a bunch of functions called from a timer interrupt. You cannot have concurrent infinite loops running without breaking the loop into multiple functions (aka “code blocks” in this library), or adding your own state management (to remember where you were up to before returning/yielding), as a higher-priority function must return before a lower-priority function can continue.
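The "own state management" option mentioned above can look like the sketch below: a job that would be one blocking loop in a real thread is split into steps, and a small struct remembers where it was up to between calls. `SensorJob`, `Step` and `runOnce` are hypothetical names, and the three-tick wait is an arbitrary stand-in for something like an ADC conversion.

```cpp
#include <cstdint>

// Hypothetical three-step job; each step is one short, returning call.
enum class Step : uint8_t { StartConversion, WaitForResult, Report };

struct SensorJob {
    Step step = Step::StartConversion;  // where to resume on the next call
    uint8_t waitTicks = 0;              // ticks left before the result is ready
    uint16_t reports = 0;               // completed cycles, for demonstration
};

// Called once per scheduler tick: do a little work, save state, return.
void runOnce(SensorJob& job) {
    switch (job.step) {
    case Step::StartConversion:
        job.waitTicks = 3;              // pretend the conversion takes 3 ticks
        job.step = Step::WaitForResult;
        break;
    case Step::WaitForResult:
        if (--job.waitTicks == 0) job.step = Step::Report;
        break;
    case Step::Report:
        ++job.reports;
        job.step = Step::StartConversion;
        break;
    }
}
```

One full cycle takes five calls (one to start, three to wait, one to report), and no call ever blocks, so lower-priority work is never held up for longer than a single step.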
  
John Stag says:

March 18, 2021 at 12:36 am

I’m guessing the “Magic 57” is because that’s how much RAM there is.

Or maybe he likes baked beans.

Ken Boak says:

March 19, 2021 at 8:24 am

It’s a case of the right tool for the job.

In 1996 when the AVR first appeared, it was one of the first microcontrollers to use flash memory. Before that it had been UV erasable, mask programmable or OTP one time programmable.

You cannot exactly compare an 8-bit microcontroller intended for consumer appliances with a 32-bit ARM. The ARM really only reached the mass market after it had appeared in Nokia cell phones and PDAs.

I agree with @PPJ – the initial learning curve with the ARM was steep. Back in 2013, when I took on my first STM32 Cortex M4 project, you had to hunt around for no-cost tools (CooCox etc) and then it would take you a week to get your first LED to blink. After that, it got much easier.

As someone who is a hobbyist programmer, the Arduino ecosystem has given me access to a vast range of new microcontrollers that I would never have considered. My current project is with a Teensy 4.x – using it to simulate novel instruction sets at 600MHz.
