We keep wondering where the Arduino world is headed as the hardware gets more and more powerful. If the IDE doesn’t keep up, what’s the point? Now we have at least one answer to that problem. Energia is the Arduino-like framework for Texas Instruments boards. They just came out with a multitasking system built into Energia, targeted at the ARM Cortex-M4F based MSP432 LaunchPad which we covered a few weeks back.
The announcement post gives a couple of examples of uses for multitasking. The simplest is blinking LEDs at different rates. If you wanted to do this closer to the metal you’re talking about multiple timers, or multiple compares on a single timer, or perhaps an interrupt-driven system tick with a high enough resolution for a wide range of your blinking needs. But these are not always easy to set up unless you are intimately comfortable with this particular architecture. The Energia multitasking will handle this for you. It’s built upon the TI Real-Time Operating System (TI-RTOS) but wrapped in the familiar IDE.
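For a sense of what the closer-to-the-metal route involves, here is a minimal sketch of the single system tick option in plain C. The names (blinker_t, on_tick) and the idea of a 1 ms tick are assumptions made for illustration, not anything taken from Energia or TI-RTOS, and the LED state is just a variable standing in for a GPIO write.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per LED: toggle every `period` ticks (hypothetical layout). */
typedef struct {
    uint32_t period;   /* ticks between toggles, must be > 0 */
    uint32_t elapsed;  /* ticks counted since the last toggle */
    int      state;    /* 0 = off, 1 = on; stands in for a GPIO pin */
} blinker_t;

/* Call once per system tick, e.g. from a periodic timer interrupt. */
void on_tick(blinker_t *leds, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        leds[i].elapsed++;
        if (leds[i].elapsed >= leds[i].period) {
            leds[i].elapsed = 0;
            leds[i].state = !leds[i].state;
        }
    }
}
```

Adding another blink rate is one more table entry, but the timer setup, the interrupt vector and the pin writes are still yours to get right, which is the part the multitasking layer is meant to hide.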
The UI divorces you from thinking about the hardware at all. You simply launch a new tab and start coding as if you’re using a completely separate piece of hardware. The announcement post linked above mentions that these Sketches are running “in parallel”. Well… we know it’s not a multi-core system like the Propeller but we’ll let it slide. It is certainly easier than building your own scheduler for this type of hardware.
[Thanks MycoWx]
It’s great that with the Energia IDE I can so easily create a multi-threaded application, but what bothers me is the complete lack (according to the linked description) of mechanisms for inter-process communication. Using global variables for communication and synchronization between threads is a great recipe for creating applications with serious errors caused by race conditions that show up only once in a blue moon. An absolute nightmare to debug.
According to the description on TI’s web page, TI-RTOS has built-in IPC mechanisms, so I hope this problem will be fixed in future releases of Energia.
How can you have a race condition when there is only one CPU?
Take this code as an example:
if(!busy) busy=1;
With preemptive multitasking the task can be preempted by another task running identical code, just after testing the flag but before setting it. Both tasks will see “not busy” and both will enter the critical section. Such operations must be performed atomically.
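One way to make the test and the set a single step is the C11 atomic_flag type. The sketch below assumes a toolchain that ships <stdatomic.h>; the function names try_enter and leave are made up for this example and are not part of Energia or TI-RTOS.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag busy = ATOMIC_FLAG_INIT;

/* Returns true if the caller got the flag. The old value is read and the
   flag is set in one indivisible operation, so two tasks running this
   code cannot both see "not busy". */
bool try_enter(void)
{
    return !atomic_flag_test_and_set(&busy);
}

/* Release the flag once the critical section is done. */
void leave(void)
{
    atomic_flag_clear(&busy);
}
```

A task that gets false back has to decide what to do next (retry later, yield, or skip the work); the flag only guarantees that at most one caller is inside at a time.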
That’s why I prefer a non-preemptive OS for most of my embedded work. It works just as well in nearly all cases, but there’s a lot less worry about critical sections. I also keep the number of threads to a minimum.
Strangely enough, using a non-preemptive OS is rare. The standard idea is that an RTOS should be preemptive.
A non-preemptive OS has its uses, but you remain at the mercy of tasks to return control to the scheduler. In such a setup it may be difficult to achieve a bounded response time, which is one of the basic requirements for a system to be real-time.
I do all the hard real time stuff in interrupts. Especially with modern CPUs where you have nested and prioritized interrupts, you can do an awful lot in that model. And tasks with exceptionally long CPU requirements are rare in my projects. Maybe if you have a graphical user interface or something like that, it can be an issue.
Ah, so the TLDR; is … If a race condition doesn’t exist in the architecture then you have to program it in.
inc busy; if (busy ==1)
Am I correct in guessing that you mean that the race condition there is due to improperly written code?
Assuming this code is run only on interrupts:
inc busy;
if( busy == 1) {
do stuff; // several critical instructions that should not be preempted
busy = 0;
}
The same problem occurs. Take the GP’s case of another task preempting the first after the first instruction. You’d see:
// start: busy = 0;
Task 1: inc busy; // busy now = 1
Task 2: inc busy; // busy now = 2
Task 2: if( busy == 1 ) // FALSE, busy = 2
Task 1: if( busy == 1 ) // FALSE, busy still = 2
Both tasks stop here, neither of them ran the critical section, and now the busy variable will never get reset (until it overflows). You can’t add in a
else if( busy == 2 ) {
busy = 0;
}
because if the task gets called twice quickly the busy variable will first get reset then the second call of the task would start, potentially before the first task has finished the critical section.
ex:
Task 1: inc busy; // busy now = 1
Task 1: if( busy == 1 ) // TRUE
Task 2: inc busy; // busy now = 2
Task 2: if( busy == 1 ) // FALSE, busy = 2
Task 2: if( busy == 2 ) // TRUE
Task 2: busy = 0;
Task 1: ~~ do some code that can’t be preempted ~~
Task 3: inc busy; // busy = 1
Task 3: if( busy == 1 ) // TRUE
Task 3: ~~ do some code that can’t be preempted ~~ // Running at the SAME TIME as the Task 1 code
Task 1: busy = 0; // Finished, but realize Task 3 is still running and can now be preempted
Task 3: busy = 0; // Finished. Everything is a hot mess.
TL;DR: It’s not improperly written code. Race conditions can and will occur with naive code (not using IPC), even on single-core architectures.
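The first interleaving above can be replayed deterministically, with no real threads, to confirm where it ends up. This is only a hand-ordered simulation of the steps listed in the comment; the sim_t type and the step functions are hypothetical names.

```c
/* State shared by the two simulated tasks. */
typedef struct {
    int busy;    /* the shared counter from the example */
    int ran[2];  /* set to 1 if that task executed the critical section */
} sim_t;

/* "inc busy;" */
void step_inc(sim_t *s)
{
    s->busy++;
}

/* "if (busy == 1) { do stuff; busy = 0; }" */
void step_check(sim_t *s, int task)
{
    if (s->busy == 1) {
        s->ran[task] = 1;
        s->busy = 0;
    }
}

/* Replays: T1 inc, T2 inc, T2 check, T1 check. */
sim_t replay_bad_interleaving(void)
{
    sim_t s = { 0, { 0, 0 } };
    step_inc(&s);       /* Task 1: busy becomes 1 */
    step_inc(&s);       /* Task 2: busy becomes 2 */
    step_check(&s, 1);  /* Task 2: busy is 2, test fails */
    step_check(&s, 0);  /* Task 1: busy is 2, test fails */
    return s;
}
```

After the replay neither task has run and busy is stuck at 2, which matches the walk-through.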
Without an IPC mechanism this is not much more useful than protothreads…
still, when you see the mess with Arduino lately and the emergence of so many great alternatives I can’t see why people still use Arduinos…
Apart from a business dispute I see nothing resembling a mess. I just downloaded and installed the 1.6.3 IDE and can find no fault with it. Even the menus display properly, so what mess do you refer to?
I was referring to Arduino LLC vs. Arduino SRL, reversioning, “unofficial boards” and such, not to the technical parts.
I’m guessing some sort of simple software hand shaking “clear to send” would somewhat decrease the speed of reading variables.
The implementation is currently incomplete.
Currently there is actually an event class that runs through built-in TI-RTOS mechanisms, and more IPC stuff is supposed to be exposed in future updates. There are a few issues with MSP432 support on Energia right now actually, most due to be sorted in Energia 16.
My only issue with the RTOS setup is that the MSP432 is the only platform Energia supports this on and it’s not optional for the 432. In future I’d prefer for it to support all the capable platforms Energia currently supports (although it looks like the cc3200 may be getting it) but it should be optional. With RTOS, a simple blink sketch compiles to 45kb. As you add more threads the increase in sketch size is perfectly reasonable, it’s just that initial 45kb chunk used by the OS that I’d like the option of bypassing.
It’s not listed on the website, but the plan is to provide a simple abstraction for the TI-RTOS native IPC so they’re more intuitive than global variables (which must be declared in the 1st tab btw, not sure why).
first tab globals may have been deliberate. Not sure.
You have the exact same problem if you use ISRs on timers.
If you are working on serious projects and need multi-tasking the easy way you should consider xmos.com products.
This young company is booming for good reasons.
There is no need to switch microcontrollers only because the current version of Energia does not support IPC. There is TI-RTOS, and FreeRTOS has also been ported. XMOS processors are very interesting, but are not a replacement for the MSP432.
Sure, an RTOS is a challenge to set up but that is because it’s a full operating system. Setting up multi-tasking / multi-threading or task scheduling is no big deal. The hardest part is memory allocation but in this case there is simply a memory area dedicated to each task – problem solved.
Some counters, vectors and register storage in a table is all that is needed. It could even be done in a higher level language like C but performance would be an issue.
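To make the table idea concrete, here is a minimal sketch of the simplest variant: a cooperative, run-to-completion round-robin over a table of function pointers. It deliberately leaves out the register save/restore a preemptive context switch would need, so it is a simpler technique than the one the comment describes. All names (task_t, run_round, count_up) are hypothetical.

```c
#include <stddef.h>

typedef void (*task_fn)(void *ctx);

/* One table row per task. */
typedef struct {
    task_fn  fn;    /* the "vector": entry point of the task */
    void    *ctx;   /* per-task state, standing in for its memory area */
    unsigned runs;  /* a counter: how many times the task has run */
} task_t;

/* Run every task once, in table order. Each task must return promptly,
   because nothing here can take the CPU away from it. */
void run_round(task_t *tasks, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        tasks[i].fn(tasks[i].ctx);
        tasks[i].runs++;
    }
}

/* Example task: increment the int that ctx points to. */
void count_up(void *ctx)
{
    int *value = ctx;
    (*value)++;
}
```

Calling run_round in an endless loop gives each task a turn; a task that loops forever starves all the others, which is the trade-off discussed earlier in the thread.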
IMHO the main problem with this is you can’t disable the RTOS (AFAICT). This makes regular “blink” something like 45KB. :-) An option to use a normal MCU “single tasking” setup would be appreciated.
However, if you want to use the RTOS as far as inter-process primitives go, while I agree they should probably provide standard ones, it is not that hard to make your own (based around disable task switch interrupt / do something “atomic” / re-enable interrupt).
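The disable / do something / re-enable pattern can be sketched as follows. The interrupt mask is modelled by a plain variable so the example runs anywhere; on a Cortex-M part the two helpers would presumably wrap the CMSIS intrinsics (__get_PRIMASK, __disable_irq, __set_PRIMASK), but that mapping is an assumption and nothing here is Energia or TI-RTOS API. The helper names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the CPU interrupt mask: 1 means interrupts are masked. */
int irq_masked = 0;

/* Mask interrupts and return the previous state so calls can nest. */
int irq_save_and_disable(void)
{
    int was_masked = irq_masked;
    irq_masked = 1;
    return was_masked;
}

/* Put the mask back the way the caller found it. */
void irq_restore(int was_masked)
{
    irq_masked = was_masked;
}

volatile uint32_t shared_total = 0;

/* Read-modify-write on shared data, done with task switching held off. */
void add_to_total(uint32_t n)
{
    int was_masked = irq_save_and_disable();
    shared_total += n;
    irq_restore(was_masked);
}
```

Restoring the saved state instead of unconditionally re-enabling keeps the helper safe to call from code that already has interrupts masked.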
I would assume that since this uses TI-RTOS, all of the TI-RTOS “real” multitasking functions are available, just not documented separately for Energia, in the same way that a lot of newlib isn’t documented for Energia/ARM.
But the “interesting” part of this is the attempt to provide “simplified multitasking” to the average (relatively ignorant) Arduino/Energia user. For years, they have been asking “can I do several things at once? Maybe if you implement one of those RTOSes?”, and the answer has been “it’s not that simple, and it’s probably a lot more complicated than “Blink without Delay” – just use that!” Well, if Energia has succeeded in making it “that simple”, then that IS a big deal.
Also interesting is that Energia is using Make to build 432 sketches. I guess that was a near-necessity with the large TI-RTOS libraries. But still…
(And I agree with Xark that it’s annoying that there is no non-RTOS build capability for 432.)
It’s never going to be simple to use multitasking for the ignorant user. It’s very easy to create race conditions without realizing it, and produce subtle bugs.
Ditching C would help, shared memory should be the exception and not the rule.
That said, if multiple decades of never ending buffer overflow bugs with billions of dollars worth of damages can’t dislodge C from desktop/server programming I don’t give any alternative much of a chance in the embedded realm either. Zinc looks interesting though (MCU port of Rust).
Zinc [1] is definitely interesting. About a year ago I jumped on the bandwagon with Go for light concurrent network stuff, but have since become disenchanted with it and started looking at Rust [2]. Being able to use it for embedded stuff would be awesome!
In the past mbeddr [3] looked promising from the DSL/meta-programming perspective, but I never examined it critically. Forth [4] and Lua [5] still look interesting (the latter for mid-range or better MCUs). Also Erlang, OCaml, or Haskell with some EDSL (like Atom [6]) depending on type of real-time we require… I’m starting to drool.
[1]: http://zinc.rs/
[2]: http://www.rust-lang.org/
[3]: http://mbeddr.com/index.html
[4]: http://www.forth.com/forth/index.html
[5]: http://www.eluaproject.net/
[6]: https://hackage.haskell.org/package/atom
I wish the likes of Atmel or TI would finally wake up and embrace multi-core microcontrollers, so the costs come down. Parallax and XMOS have no stranglehold on multi-core.
For small microcontrollers, multi core doesn’t make much sense. It’s easier to double the clock and get the same performance.
For bigger controllers, there are multi core devices, like OMAP 4 from TI.
It’s not so much about performance as it is ease of implementation (or maybe robustness of implementation) – I love having 8 cores I can send off in different directions and not have to worry about blocking or interference. For projects that involve a margin of safety it’s nice to know that I have a dedicated piece of hardware rather than having to deal with the FUD of wondering whether I got all the interrupt and timer code right.
I’m still disappointed that Parallax never managed to come out with their 32-core processor.
But in return you get to worry if your communication code is correct.
With simple synchronous inter-controller or intra-core architectures, your inter-processor routines are tricky to get right at first, but then they just fall into the background as cookie-cutter ‘drivers’. But in critically-timed routines (e.g., serial bit-banging), you may have to pay close attention to bus timing if more than one processor is involved (e.g. full-duplex bit-banged flow control). But in the large majority of applications that do require inter-processor communications, it is fire-and-forget simple once the driver routine is pasted in. Also keep in mind that overall, outside of initialization and infrequent (if any) passing of results, intra-processor communications is pretty infrequent in the big picture over time.
In an architecture that employs a (e.g.) high-speed asynchronous bus, things may get a bit easier, but you do have to live with some level of communications timing non-determinism.
Trying to juggle the likes of a hierarchy of ISRs in a complex single-processor multi-tasking application is akin to rewriting a scheduler all over every time. A nightmare not only to develop, but especially to maintain over time!
Doing this on a well designed multi-processor/core platform is a breeze in comparison. Once you have been there, going back to single core is painful.
A. Doubling the clock to achieve needed multitasking performance doesn’t scale. In reality when you need N similar things going on in parallel you end up maxing the clock then each of the N similar things get 1/N of the resources. Plus if even one of the N things runs all the time, you have to run the controller all the time. This is wasteful energy-wise.
With N separate inter-connected controllers or cores each task can run at the full clock rate. And if modern layout design is used, each controller could be designed to run at its own clock speed and each controller could individually sleep. All this reduces power consumption.
But the real advantage is ease of development and maintenance of applications. In my experience it is much easier to handle complex multitasking embedded microcontroller applications on a multi-controller/multi-core platform interconnected by a common asynchronous bus and/or commutating bus.
B. The “bigger controllers” you refer to are far more complex than the multi-core microcontrollers I think we are talking about here. These large controllers are designed from the start to run an OS that has a scheduler, and often memory-management-unit (MMU) hardware is included on-die as well. Also these large controllers (SoCs actually) employ on-die buses (e.g. AMBA) that slow down GPIO speed horribly. If you need fast GPIO, you are right back to hanging a microcontroller on the SoC! Other than that, you can start hacking some type of dedicated I/O bus (e.g. USB, LCD/display, memory, etc.) and expect to get into direct DMA as well – messy.
Usually, you don’t want to multiply all resources by N. Maybe each controller has 10 timers, 4 UARTs, and 8 PWM channels, and an Ethernet interface. Do you really need N*10 timers, N*4 UARTs, N*8 PWM channels, and N Ethernet interfaces ? Probably not.
So, just copy/paste of the entire design is very wasteful. It would basically make it N times more expensive, and create a packaging problem with all those pins.
Many manufacturers are sticking to a single core, but increasing the performance by using higher clocks, better flash interface, multiple memories, buses and DMA controllers. This takes fewer resources, and allows you to solve the same problems. Maybe you have to think a bit harder about the design, but that’s worth it to save the money.
There is the NXP LPC4000 family with dual-core Cortex-M4/M0 parts.
Yes, the LPC parts are a step in the right direction, but with just two asymmetric processors and rather poorly implemented inter-processor support, they barely scratch the surface of what is possible. Good call-out though.
Asymmetric processors make a lot of sense. You can use the M0 for simple stuff that needs exact timing, and use the M4 for less strict timing, but higher performance.
I’m a big fan of TI and their ARM microcontroller lines. They offer excellent support via Code Composer Studio, Energia and low cost LaunchPad dev boards with debuggers. I’m sad however to see them not be part of the mbed project.
Mbed is not just an alternative to Arduino. Sure, it is a high-level API, but it is more flexible than Arduino’s and is, in my opinion, better suited for commercial/research use. Mbed was specifically designed from inception for ARM microcontrollers. It also supports multiple boards from different chip manufacturers: Freescale, NXP, ST, Renesas and more! In addition it is being supported by ARM and is getting some major IoT and OS improvements in the next couple of months.
The way I see it is that the Arduino API is great for hobbyists, artists and getting young kids excited about programming. The mbed API is more mature and seems like a better fit in higher learning and commercial applications.
Having said all that, I must say that the TI TIVA C & MSP432 Launchpads, when paired with Energia, offer the best performance per dollar in the world of Arduino compatible boards; especially when one considers that these boards also have built-in debuggers.
Yes, but take a close look at the MSP432 launchpad. There is a big line right in the middle of it, and probably 40 components to get from the USB connector to JTAG programmability. Why? All other microcontroller manufacturers have gone with and brought out native USB controllers. TI is the only one peddling a part that is so old it is still using JTAG as a programming interface.
I believe that the TIVA C micros all have at least USB device functionality. The MSP432 doesn’t, but there’s another chip on the MSP432 launchpad that probably can. If one can use mbed with an LPC1114 (with help of an external USB chip) then mbed on TI’s TIVA C and MSP432 is possible. Worst case scenario, introduce a newer revision of the board and modify the onboard debugger/programmer chip accordingly.
I believe that it’s a conscious choice that TI is making.
“40 components” – yeah, the LaunchPad products would be a lot more “elegant” if that USB to JTAG/Serial debugger circuit had been implemented ONCE and stayed the same, while the target went through the product line. Instead, I don’t think that there are two LaunchPads that use the same CHIP for that circuitry, much less the same “extra” stuff.
In the case of the 432, they’ve added some additional stuff (“EnergyTrace+”) so that the host can measure power consumption, since the board is supposed to be a showcase for low power ARM technology. I guess that’s neat; the idea seems to be pretty “stylish” these days (some EFM boards, some STM32L boards, etc.)
Recently I went reading about how the Apollo team built their computer system (the Apollo Guidance Computer or AGC); it was a primitive computer: a 16 bit instruction set, 12 base instructions, 1 MHz and roughly 16Kb memory. I’m not sure about a Memory Protection Unit.
But, software side, it was a very very advanced OS for the time: a rudimentary preemptive scheduler, and they even built a rudimentary hypervisor for running some sort of virtual machines … Fifty years ago (an eternity), they designed a software system where failure was not an option. All those security measures against bugs in application software (and erroneous data from peripherals) really saved the Apollo mission.
Apparently, from what I’m reading here, we have today processors multiple orders of magnitude more powerful than the AGC but software side we still live in pre-historic times. Hey, wake up, it’s 2015 today … My opinion is an MMU, a preemptive OS, perhaps a microkernel should be mandatory in the embedded space. Except if you only blink an LED.
No need to make that mandatory. If it’s appropriate for the problem you’re trying to solve, the tools are available.
Using an OS when you don’t need one only makes things unnecessarily complicated and introduces extra opportunities to make mistakes.
Take blinking an LED as an example: more than once I’ve seen examples where you loop n thousand times to emulate a timer; yeah, it is easy. With an RTOS like the one provided by the Energia team, you use abstractions like sleep(n milliseconds); it’s even easier. The difference? The CPU is only running 10 microseconds every second with the RTOS; in the first example you are at 100% CPU. When tens of TI engineers have tuned their CPU to be as energy efficient as possible, what you do when looping is a complete waste. Why reinvent the wheel every time and for each project? What’s the matter if you load 45K for even the most simple project? 1) your project is highly energy efficient 2) if I ask to blink one more LED at a different rate, it doesn’t transform into a nightmare, it is still very energy efficient and surely less error prone by a huge margin.
When your solution is debugged and only if you want to sell 10000 pieces, then you can begin to optimize the cost.
“Premature optimization is the root of all evil” - Donald Knuth
Without an RTOS you can also do sleep(n milliseconds). Or you can blink the LED in an interrupt handler, and stay in sleep mode. Or you can use a non-preemptive scheduler, and do exactly the same as on the preemptive one. Or you can run the CPU at 32 kHz, and save power that way.
And I’m not advocating premature optimization. On the contrary, I’m all for choosing the simplest solution that works. What if you want to move your Energia-based code to an STM32L ?
I’m dubious about attempts to abstract/simplify multithreading. You can get into a ton of trouble with it if you’re not aware of what you’re doing, and that includes what the abstraction layer is doing behind your back. Here’s an example, and although it’s specific to Windows and high-level languages (HLLs), the lessons I learned carry over to any programming environment.
In old VB6 you could build multi-threaded apps. It wasn’t officially supported or easy, but it could be done, as long as you understood how things worked at a low level. A typical way of sending information to another thread was to have a subclassed WinForms control in the target thread that listened for a specific message. The source thread would SendMessage (synchronous) or PostMessage (asynchronous) that message to it, along with the data (or reference to the data), and the target thread would pick up that message when idle and processing its message loop.
Next there was .NET, in which you were provided with the Control.Invoke (synchronous) and Control.BeginInvoke (asynchronous) methods. The documentation only describes how to use these, not how they work under the hood. But seeing both sync/async methods – and attached to a control at that – I assumed that like many other things, .NET was simply providing a wrapper for the older technique; and converted my VB6 code library and apps based on that assumption.
At which point my apps started sporadically crashing, typically under heavy load. Weeks of attempts at debugging proved fruitless. When the standalone app crashed, it provided no debugging or callstack info, even though it was supposed to. And when I ran the app from the IDE, it wouldn’t crash. I finally had a breakthrough when I littered my code with checks to detect reentrancy where there should be none. Turns out that when another thread is Invoked, the target code may run immediately, or at some arbitrary time, rather than only when that thread is in its idle message loop. Essentially it sometimes acts like an interrupt, preempting code already executing, leading to all sorts of concurrency issues. But only, or at least much more frequently, when the code isn’t running from the IDE! What…the…****?
All this happened years ago, and to this day, I still haven’t found a low-level description of how Invoke schedules or performs execution in the target thread. Maybe they don’t want you to know, after all, .NET is an abstraction layer. Maybe Microsoft wants to be free to silently change the underlying mechanism at some point. That it acts differently depending on whether it’s running standalone or from the IDE may indicate there are already two mechanisms in play. But whatever the case is, if I can’t understand and predict its behavior, it’s useless to me.
I ripped out every Invoke, and put the old VB6 technique back in. I suppose I could have gotten my code working with the .NET Invoke method instead, but it’s really better this way. With my homebrew “invokes” occurring only during idle message loops, I can easily predict where invalid concurrent access may occur, and only a few SyncLocks are required to guard against it. With .NET Invoke I can assume nothing, and would have to guard as if interrupts were possible; requiring so many additional locks that extremely careful planning would be required to prevent deadlocks. Not something I care to do in a HLL on huge projects with ridiculous CPU power at my disposal. But still necessary in C on MCUs for small-to-medium projects and limited computational power.
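The "deliver only at the idle point" idea is not tied to Windows. A minimal sketch of it in C is a queue that senders post into and that the receiving task drains only at one known place in its loop. Everything here is hypothetical (msg_queue_t, post_message, drain_messages, add_to_sum), and the sketch is single-threaded: a real version would still need the head and tail updates protected against whatever can preempt them.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 8u  /* illustrative capacity; a power of two */

typedef struct {
    int      items[QUEUE_SLOTS];
    unsigned head;  /* count of messages taken so far */
    unsigned tail;  /* count of messages posted so far */
} msg_queue_t;

/* Sender side: returns false if the queue is full. */
bool post_message(msg_queue_t *q, int msg)
{
    if (q->tail - q->head == QUEUE_SLOTS) {
        return false;
    }
    q->items[q->tail % QUEUE_SLOTS] = msg;
    q->tail++;
    return true;
}

/* Receiver side: call only from the idle point of the receiving loop.
   Returns the number of messages handled. */
size_t drain_messages(msg_queue_t *q, void (*handle)(int))
{
    size_t handled = 0;
    while (q->head != q->tail) {
        handle(q->items[q->head % QUEUE_SLOTS]);
        q->head++;
        handled++;
    }
    return handled;
}

/* Example handler: accumulate message values. */
int handled_sum = 0;

void add_to_sum(int msg)
{
    handled_sum += msg;
}
```

Because handlers only ever run inside drain_messages, the receiver knows exactly where its state can change, which is the predictability the comment is after.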
So when I read things like “Energia multitasking will handle this for you”, or that your tasks (I refuse to call them sketches) simply run “in parallel”, it just makes me cringe. Providing shortcuts so that you don’t have to deal with the low-level stuff on a regular basis is acceptable – but promoting ignorance is not. You always need to at least be aware of what’s going on under the hood, because eventually any shortcut will fail you. And I don’t feel it’s asking too much to expect people to learn that. All the low-level stuff is a lot like chess. It doesn’t take long to learn the rules. It may take you a lifetime to learn how to play well, but at least you can then play, or watch a game being played and understand what’s going on – including where a critical mistake might have been made.
TI-RTOS is a cooperative scheduler, meaning that it is not time-sliced. Tasks that run at the same priority will not involuntarily pre-empt each other. Unless a task blocks or yields, it will run 100% of the time.
In the current implementation, each sketch (ie task) is run at the same priority as all other sketches. Consequently global variable sharing is relatively safe.
However, as is already well described in countless Arduino forums, threads invoked by attachInterrupt() will pre-empt the sketch(es) running in the background. The usual care must be taken in those cases.
Upon returning from loop(), each task will yield, thus giving other sketches time to run.
Also, if a sketch calls a blocking API such as delay(), Serial.print(), Wire.endTransmission(), etc, the CPU is handed over to the next sketch that yielded its time.
Hmm. That’s not what the TI page says, and not what is implied by “RTOS” (IMO.) http://www.ti.com/tool/ti-rtos :
“TI-RTOS Kernel (formerly known as SYS/BIOS) provides deterministic preemptive multithreading and synchronization services, memory management, and interrupt handling. TI-RTOS Kernel is highly scalable down to a few KBs of memory.”
I’m pleased to notice that it also claims to do a bunch of power management stuff. (Otherwise putting it on the 432 Launchpad would have been a strange choice, I guess.)
I wasn’t being very precise in my description. TI-RTOS is indeed a preemptive kernel. However, the energia configuration currently defines all Sketch tasks to run at the same priority level, thus they do not preempt each other, which makes global variable sharing safe between Sketch threads. Global variable sharing between an interrupt thread and a Sketch thread is still vulnerable to the non-atomic read-modify-write problem.
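For the interrupt-versus-sketch case, one option is to make the shared variable an atomic type so the read-modify-write cannot be split. The sketch below uses C11 <stdatomic.h> and assumes the toolchain provides it for the target; on_interrupt and take_events are hypothetical names, and the handler is an ordinary function here rather than one actually registered with attachInterrupt().

```c
#include <stdatomic.h>

/* Shared between the interrupt handler and the sketch. */
atomic_uint pending_events;

/* Interrupt side: record one more event as a single atomic step. */
void on_interrupt(void)
{
    atomic_fetch_add(&pending_events, 1u);
}

/* Sketch side: fetch the count and reset it to zero in one step, so an
   interrupt arriving in between is neither lost nor counted twice. */
unsigned take_events(void)
{
    return atomic_exchange(&pending_events, 0u);
}
```

A plain `pending_events++` followed later by `pending_events = 0` would leave a window in which an interrupt's increment could be overwritten.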
Don’t equal-priority tasks round-robin after the timeslice expires? Or is it a special feature of TI-RTOS that it behaves cooperatively under those circumstances? (I’m debating how much time I want to spend learning about a vendor-proprietary OS. When I did a project that made Linux pthreads behave cooperatively, it was a real pain to defeat the inherent preemption.)
TI-RTOS is NOT a time-slicing kernel. Tasks of equal priority must intentionally yield or block for other tasks to get CPU time.
Can anyone help me interface an SD card with the MSP432 using Energia?