ANTIRTOS: No RTOS Needed

Embedded programming is a tricky task that looks straightforward to the uninitiated, but those with a few decades of experience know differently. Getting your code to work predictably, or even to fit into the target device, can be challenging. Past a certain level of complexity, breaking code down into multiple tasks becomes necessary, and then most of us will reach for a real-time operating system (RTOS), and the real fun begins. [Aleksei Tertychnyi] clearly understands such issues but instead came up with an alternative they call ANTIRTOS.

The idea behind the project is not to use an RTOS at all, but to manage tasks deterministically by utilizing multiple queues of function pointers. The result is an ultra-lightweight task management library targeting embedded platforms, whether Arduino-based or otherwise. It’s pure C++, so the exact platform generally doesn’t matter. The emphasis is on rapid interrupt response, which is, we know, critical to a good embedded design. Implemented as a single header file less than 350 lines long, it is not hard to understand (provided you know C++ templates!) and easy to extend as new needs arise. A small code base also makes debugging easier. A key point of the project is its handling of delay routines: instead of a plain delay(), you write a custom version that drains your queue of short tasks while it waits, so no time is wasted. Of course, you still have to plan how the tasks are grouped and scheduled and work out all the data flow issues, but that’s all stuff you’d be doing anyway.
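To give a flavor of the approach, here’s a minimal sketch of the core idea: a fixed-size queue of function pointers, plus a delay that drains the queue while it waits. To be clear, this is our own illustration rather than ANTIRTOS’s actual API; the names are made up, and we assume an Arduino-style millis() tick source.

```cpp
#include <cstdint>

// A tiny fixed-size queue of function pointers (hypothetical sketch in
// the spirit of ANTIRTOS; the real library's classes and names differ).
template <uint8_t N>
class TaskQueue {
    void (*tasks[N])();                  // slots for pending tasks
    volatile uint8_t head = 0, tail = 0; // single producer, single consumer
public:
    bool push(void (*f)()) {             // e.g. called from an ISR
        uint8_t next = (head + 1) % N;
        if (next == tail) return false;  // queue full, task dropped
        tasks[head] = f;
        head = next;
        return true;
    }
    void pull() {                        // run one pending task, if any
        if (tail == head) return;
        void (*f)() = tasks[tail];
        tail = (tail + 1) % N;
        f();
    }
};

TaskQueue<8> shortTasks;

// Instead of a plain delay(), spend the wait draining the queue.
void usefulDelay(uint32_t ms) {
    uint32_t start = millis();           // assumes an Arduino-style tick
    while (millis() - start < ms) {
        shortTasks.pull();
    }
}
```

An interrupt handler can push() work into the queue and return immediately, while the main loop, or a wait like the one above, pull()s tasks whenever there’s time to spare.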

The GitHub project page has some clear examples and is the place to grab that header file to try it yourself. When you really need an RTOS, you have a lot of choices, mostly costing money, but here’s our guide to two popular open source projects: FreeRTOS and ChibiOS. Sometimes, an RTOS isn’t enough, so we design our own full OS from scratch — sort of.

C++ Design Patterns For Low-Latency Applications

Performance optimization may seem to have lost its relevance in an era of ever-increasing hardware performance, but there are still many good reasons to spend time optimizing code. In a recent preprint article, [Paul Bilokon] and [Burak Gunduz] of Imperial College London focus specifically on low-latency patterns relevant to applications such as high-frequency trading (HFT). In HFT, small margins are compensated for by churning through absolutely massive volumes of trades, all of which relies on extremely low latency to gain every advantage. Although FPGA-based solutions are very common in HFT due to their low latency and high parallelism, C++ is the main language used beyond FPGAs.

Although many of the optimizations listed in the paper are quite obvious, such as prewarming the CPU caches, using constexpr, loop unrolling, and inlining, other patterns are less obvious, such as hot path versus cold path. This overlaps with the branch reduction pattern; both involve separating commonly and rarely executed code (like error handling and logging), which improves use of the CPU’s caches and prevents branch mispredictions, as the benchmarks (using Google Benchmark) clearly demonstrate. All of the design patterns can also be found in the accompanying GitHub repository.
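As a rough illustration of the hot path/cold path split (our own sketch, not code from the paper): the rarely taken error-handling branch is moved out of line so the hot loop stays small, and the common case is hinted to the compiler. Note that [[gnu::cold]] and [[gnu::noinline]] are GCC/Clang-specific, and [[likely]] requires C++20.

```cpp
#include <cstdio>

// Cold path: rarely executed, kept out of line so it doesn't pollute
// the instruction cache along the hot path.
[[gnu::cold]] [[gnu::noinline]]
void handleError(int code) {
    std::fprintf(stderr, "error: %d\n", code);
}

// Hot path: the common case, with [[likely]] steering code layout and
// branch prediction toward the fall-through.
inline int processOrder(int qty) {
    if (qty > 0) [[likely]] {
        return qty * 2;          // the cheap, common work
    }
    handleError(qty);            // the rare case jumps out of line
    return 0;
}
```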

Other interesting tidbits are the impact of signed versus unsigned comparisons, of mixing floating-point datatypes, and of course lock-free programming using a ring buffer design. About the only things missing from the list are aligned versus unaligned memory accesses and zero-copy optimizations, but those should be easy to implement and test alongside the other optimizations in the paper.
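For a taste of the ring buffer approach, here’s a minimal single-producer/single-consumer lock-free queue; this is our own sketch of the general technique, not the code from the paper’s repository.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal SPSC lock-free ring buffer: the producer only writes head,
// the consumer only writes tail, so no locks are needed. Capacity must
// be a power of two so indices can be masked instead of taken modulo.
template <typename T, std::size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
    std::array<T, N> buf{};
    std::atomic<std::size_t> head{0};  // written by producer only
    std::atomic<std::size_t> tail{0};  // written by consumer only
public:
    bool push(const T& v) {            // returns false when full
        auto h = head.load(std::memory_order_relaxed);
        if (h - tail.load(std::memory_order_acquire) == N) return false;
        buf[h & (N - 1)] = v;
        head.store(h + 1, std::memory_order_release);
        return true;
    }
    std::optional<T> pop() {           // empty optional when no data
        auto t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire)) return std::nullopt;
        T v = buf[t & (N - 1)];
        tail.store(t + 1, std::memory_order_release);
        return v;
    }
};
```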

C Compiler Exists Entirely In Vim

8cc.vim is a C compiler that exists as pure Vimscript. Is it small? It sure is! How about fast? Absolutely not! Efficient? Also no. But does it work and is it neat? You betcha!

Ever typed :wq to write the buffer and exit in Vim? When you do, you’re using Vimscript. Whenever you enter command mode with : in Vim, you are in fact talking to a live Vimscript interpreter. That’s the space in which this project exists and does its magic. Given enough time, anyway.

Vimscript itself was created by [Bram Moolenaar] in 1991. The idea was to execute batches of vim commands programmatically. It’s been used for a variety of purposes since then.

8cc is a lightweight C compiler that has been supplanted by chibicc, but that doesn’t matter much because as author [rhysd] admits, this is really just a fun concept project more than anything. It may take twenty minutes or more to compile “hello world”, but doing it entirely from within Vim is a trip.

The Performance Impact Of C++’s `final` Keyword

In the world of software development, the term ‘optimization’ is generally a reason for experienced developers to start feeling decidedly nervous, especially when a feature is marketed as an ‘easy and free optimization’. The final keyword introduced in C++11 is one such feature. It promises a way to speed up object-oriented code by omitting vtable call indirection for a class or member function marked as (unsurprisingly) final, meaning that it cannot be inherited from or overridden. Inspired by this promise, [Benjamin Summerton] figured he’d run a range of benchmarks on his ray tracing project to see what performance uplift he’d get.
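The promise is easiest to see in a contrived example (ours, not [Benjamin]’s benchmark code): once the compiler knows an object’s static type is a final class, it is free to turn virtual calls through that type into direct calls.

```cpp
struct Shape {
    virtual double area() const = 0;
    virtual ~Shape() = default;
};

struct Circle final : Shape {    // nothing can derive from Circle
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.14159265358979 * r * r; }
};

double tenAreas(const Circle& c) {
    double sum = 0.0;
    for (int i = 0; i < 10; ++i)
        sum += c.area();         // static type is final: the compiler may
                                 // devirtualize, skipping the vtable lookup
    return sum;
}
```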

To be as thorough as possible, the tests were run on three different systems, including 64-bit Intel and AMD machines as well as Apple Silicon (M1). For the compilers, various versions of GCC (12.x, 13.x), Clang (15, 17), and MSVC (17) were employed, with rather interesting results for the final versus no-final tests. Clang was probably the biggest surprise: with the keyword added, the performance of Clang-generated code absolutely tanked. MSVC was a mixed bag, as were the GCC versions, other than GCC 13.2 on AMD Ryzen, which saw a bump of a few percent.

Ultimately, it seems that, as usual, there’s no free lunch, and adding final to your code falls distinctly under ‘only use it if you know what you’re doing’. As things stand, the resulting behavior seems wildly inconsistent.

Processes, Threads, And… Fibers?

You’ve probably heard of multithreaded programs, where a single process can have multiple threads of execution. But there is yet another layer for creating multitasking programs, known as the fiber. [A Graphics Guy] lays it all out in a lengthy but well-done post. There are examples for both x64 and arm64, although the post mainly focuses on x64 for Windows. However, the ideas apply anywhere.

In the old days, there was one CPU, and when your program ran on it, it was in control. But that’s wasteful, so software quickly moved to schemes where many programs could share the CPU. Then, as that got overloaded, computers got more CPUs. Most operating systems have the idea of a process: a program that thinks it is in complete control but is really sharing the CPU with other processes. The problem arises when you want multiple “little” programs that cooperate. Processes are not really supposed to know about one another, and if they do, there’s usually some heavyweight communication mechanism that lets them talk.
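Fibers move that cooperation inside a single thread: a fiber runs only when something explicitly switches to it, keeping its own stack between switches. Here’s a minimal Win32 sketch (our own, not code from the post) using the standard fiber calls:

```cpp
#include <windows.h>
#include <cstdio>

// Two cooperating "tasks" in one thread: the worker fiber yields back
// to the main fiber and later resumes exactly where it left off.
LPVOID mainFiber = nullptr;

VOID CALLBACK workerProc(LPVOID) {
    std::puts("worker: part one");
    SwitchToFiber(mainFiber);    // yield, keeping our stack intact
    std::puts("worker: part two");
    SwitchToFiber(mainFiber);    // never return from a fiber function
}

int main() {
    mainFiber = ConvertThreadToFiber(nullptr); // thread becomes a fiber
    LPVOID worker = CreateFiber(0, workerProc, nullptr); // default stack
    SwitchToFiber(worker);       // runs until the worker yields
    std::puts("main: between worker parts");
    SwitchToFiber(worker);       // resume the worker where it paused
    DeleteFiber(worker);
    return 0;
}
```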

QSPICE Picks Up Where LTSpice Left Us

[Mike Engelhardt] is a name that should be very familiar to the hardcore electronics nerd. [Mike] is the developer responsible for LTspice, quite likely the most widely used SPICE-compatible simulator in the free software domain. When you move away from digital electronics and the comfort of software with its helpful IDEs and toolchains, and dip a wary toe into the murky grey waters of analog or power electronics, LTspice is your best friend. And, like all best friends, it’s a bit quirky, but it always has your back. Sadly, LTspice development seems to have stalled some years ago, but luckily for us, [Mike] has been busy on its successor, QSPICE, under the watchful eye of Qorvo.

It does look to be in its early stages, but from a usability point of view, it’s much improved over LTspice. Performance is excellent (based on this scribe’s limited testing while mobile). Gone (thankfully!) is the uncommon verb-noun usage paradigm, replaced with the more usual cut-and-paste flow. Visually, it still looks a bit like LTspice in places, but nicer, with a clear, uncluttered design that gets straight to the point. Internally, the simulation engine has improved in speed and accuracy, and adds native support for modern semiconductor types, such as wide-bandgap materials like SiC. Notably, the updated software places a particular emphasis on power integrity and noise analysis, sticky problems that have a big impact on modern high-power systems.

In Praise Of RPN (with Python Or C)

HP calculators, slide rules, and Forth all have something in common: reverse Polish notation, or RPN. Admittedly, slide rules don’t really use RPN, but you work problems on them the same way you do on an RPN calculator. For whatever reason, RPN never really succeeded in the general marketplace, and you might wonder why it was ever a thing. The biggest reason is that RPN is very easy to implement compared to working through proper algebraic, or infix, notation. In addition, in the early years of computers and calculators, you didn’t have much to work with, and people were used to slide rules, so having something that didn’t take a lot of code and matched how users already worked was a win-win.

What is RPN?

If you haven’t encountered RPN before, it is an easy way to express math without ambiguity. For example, what’s 5 + 3 * 6? It’s 23, not 48. By the order of operations, you know you have to multiply before you add, even though the multiplication is written second. You have to read through the whole equation before you can start doing math, and if you want to force the other result, you need parentheses.

With RPN, there is no ambiguity hanging on secret rules or parentheses, nor any need to remember things unnecessarily. With infix, to calculate our example you have to read all the way through once to figure out that you must multiply first, and then remember that the addition of 5 is still pending. With RPN, you go left to right, and every time you see an operator, you act on it and move on. You would write our example as 3 6 * 5 +.
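That left-to-right, act-on-each-operator rule is also why RPN is so cheap to implement: the whole evaluator is just a stack. Here’s a minimal sketch in C++ (our own illustration, not code from the article):

```cpp
#include <iostream>
#include <sstream>
#include <stack>
#include <string>

// Evaluate a space-separated RPN expression: numbers get pushed; each
// operator pops two operands and pushes the result back.
double evalRpn(const std::string& expr) {
    std::istringstream in(expr);
    std::stack<double> st;
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            double b = st.top(); st.pop();   // right operand popped first
            double a = st.top(); st.pop();
            if      (tok == "+") st.push(a + b);
            else if (tok == "-") st.push(a - b);
            else if (tok == "*") st.push(a * b);
            else                 st.push(a / b);
        } else {
            st.push(std::stod(tok));         // a number: just push it
        }
    }
    return st.top();                         // the final result
}

int main() {
    std::cout << evalRpn("3 6 * 5 +") << '\n';  // prints 23
}
```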

While HP calculators were the most common place to encounter RPN, they weren’t the only one. Friden calculators had it, too. Some early computers and calculators supported it but didn’t call it by name. Some Soviet-era calculators used it as well, including the famous Elektronika B3-34, which was featured in a science fiction story in a 1985 Soviet magazine aimed at young people; the story set problems that had to be worked out on the calculator.
