The Performance Impact Of C++’s `final` Keyword For Optimization

In the world of software development the term ‘optimization’ is generally reason for experienced developers to start feeling decidedly nervous, especially when a feature is marked as an ‘easy and free optimization’. The final keyword introduced in C++11 is one of such features. It promises a way to speed up object-oriented code by omitting the vtable call indirection by marking a class or member function as – unsurprisingly – final, meaning that it cannot be inherited from or overridden. Inspired by this promise, [Benjamin Summerton] figured that he’d run a range of benchmarks to see what performance uplift he’d get on his ray tracing project.

To be as thorough as possible, the tests were run on three different systems, including 64-bit Intel and AMD systems, as well as on Apple Silicon (M1). For the compilers various versions of GCC (12.x, 13.x), as well as Clang  (15, 17) and MSVC (17) were employed, with rather interesting results for final versus no final tests. Clang was probably the biggest surprise, as with the keyword added, performance with Clang-generated code absolutely tanked. MSVC was a mixed bag, as were the GCC versions other than GCC 13.2 on AMD Ryzen, which saw a bump of a few percent faster.

Ultimately, it seems that there’s no free lunch as usual, and adding final to your code falls distinctly under ‘only use it if you know what you’re doing’. As things stand, the resulting behavior seems wildly inconsistent.

Processes, Threads, And… Fibers?

You’ve probably heard of multithreaded programs where a single process can have multiple threads of execution. But here is yet another layer of creating multitasking programs known as a fiber. [A Graphics Guy] lays it out in a lengthy but well-done post. There are examples for both x64 and arm64, although the post mainly focuses on x64 for Windows. However, the ideas will apply anywhere.

In the old days, there was a CPU and when your program ran on it, it was in control. But that’s wasteful, so software quickly moved to where many programs could share the CPU simultaneously. Then, as that got overloaded, computers got more CPUs. Most operating systems have the idea of a process, which is a program that thinks it is in complete control, but it is really sharing the CPU with other processes. The problem arises when you want to have multiple “little” programs that cooperate. Processes are not really supposed to know about one another and, if they do, there’s usually some heavy-weight communication mechanism allowing them to talk.

Continue reading “Processes, Threads, And… Fibers?”

QSPICE Picks Up Where LTSpice Left Us

[Mike Engelhardt] is a name that should be very familiar to the hardcore electronics nerd. [Mike] is the developer responsible for LTSpice, which is quite likely the most widely used spice-compatible simulator in the free software domain. When you move away from digital electronics and the comfort of software with its helpful IDEs and toolchains, and dip a wary toe into the murky grey waters of analog or power electronics, LTSpice is your best friend. And, like all best friends, it’s a bit quirky, but it always has your back. Sadly, LTSpice development seems to have stalled some years ago, but luckily for us [Mike] has been busy on the successor, QSpice, under the watchful eye of Qorvo.

It does look in its early stages, but from a useability point of view, it’s much improved over LTSpice. Performance is excellent (based on this scribe’s limited testing while mobile.) Gone (thankfully!) is the uncommon verb-noun usage paradigm — replaced with a more usual cut-n-paste flow. Visually it still kind of looks like LTspice in places, but nicer with a clear and uncluttered design that gets straight to the point. Internally, the simulation engine has improved in speed and accuracy, as well as adding native support for modern semiconductor types, such as wide bandgap materials like SiC. Noted is that this updated software has a particular emphasis on power integrity and noise analysis, which are sticky problems that have a big impact on modern high-power systems.

Continue reading “QSPICE Picks Up Where LTSpice Left Us”

In Praise Of RPN (with Python Or C)

HP calculators, slide rules, and Forth all have something in common: reverse polish notation or RPN. Admittedly, slide rules don’t really have RPN, but you work problems on them the same way you do with an RPN calculator. For whatever reason, RPN didn’t really succeed in the general marketplace, and you might wonder why it was ever a thing. The biggest reason is that RPN is very easy to implement compared to working through proper algebraic, or infix, notation. In addition, in the early years of computers and calculators, you didn’t have much to work with, and people were used to using slide rules, so having something that didn’t take a lot of code that matched how users worked anyway was a win-win.

What is RPN?

If you haven’t encountered RPN before, it is an easy way to express math without ambiguity. For example, what’s 5 + 3 * 6?  It’s 23 and not 48. By order of operations you know that you have to multiply before you add, even if you wrote down the multiplication second. You have to read through the whole equation before you can get started with math, and if you want to force the other result, you’ll need parentheses.

With RPN, there is no ambiguity depending on secret rules or parentheses, nor is there any reason to remember things unnecessarily. For instance, to calculate our example you have to read all the way through once to figure out that you have to multiply first, then you need to remember that is pending and add the 5. With RPN, you go left to right, and every time you see an operator, you act on it and move on. With RPN, you would write 3 6 * 5 +.

While HP calculators were the most common place to encounter RPN, it wasn’t the only place. Friden calculators had it, too. Some early computers and calculators supported it but didn’t name it. Some Soviet-era calculators used it, too, including the famous Elektronika B3-34, which was featured in a science fiction story in a Soviet magazine aimed at young people in 1985. The story set problems that had to be worked on the calculator.

Continue reading “In Praise Of RPN (with Python Or C)”

C++17’s Useful Features For Embedded Systems

Although the world of embedded software development languages seem to span somewhere between ASM and C89 all the way to MicroPython, there is a lot to be said for a happy medium between ease of development and features that makes the software more robust without adding overhead or bloat to the final firmware image.

This is where C++ has objectively many advantages over even C99, and as [Çağlayan Dökme] argues in a recent blog post C++17 adds many developer critter comforts to C++98 and the more recent C++11 C++14 standards.

First stepping back a generation (technically two, with C++20 also being a thing already), the addition of binary literals (e.g. 0b1010'1100) in C++14 and the expanded use of constexpr is addressed, with the latter foreshadowing C++17’s increased focus on compile time optimizations. A new attribute in C++17 that is part of this is [[nodiscard]], which when added before to the return type of a function or method requires the return value to be used in some manner, much like with functions in Ada (contrasted with procedures).

As [Çağlayan] notes, the biggest strength of compile-time checks is that it can save a lot of deploy-test-fix round-trips, with the total number of issues caught after deployment that could have been caught during compilation ideally being zero. Here C++17 streamlines the static_assert() mechanism and simplifies using if constexpr to instantiate code depending on compile-time conditions. Beyond compile-time optimizations there are a few other niceties, such as C++17 guaranteeing copy elision (return value optimization) when an object is returned directly, which is a welcome feature in hard real-time environments.

With today even MCUs having enough grunt to run multi-threaded applications and potentially firmware compiled from a many-thousand LoC codebase, picking a programming language that assists the developer with such an arduous task is very important, with Ada being the primary choice for high-reliability embedded platforms, but C++ along with C enjoying the most widespread (free) compiler support. Even if C++ isn’t supported on every single MCU out there (8051-based and most PIC MCUs mostly), whenever it is an option, it’s a pretty solid choice, especially with knowledge of these new language features.

Encoding NTSC With Your Hands Tied

Generally, when trying to implement some protocol, you are constrained by your hardware and time. But for someone like [EMMIR], that’s not enough. For example, NTSC-CRT is a video signal encoding/decoding simulator with no hardware acceleration, floating point math, or third-party libraries. Just basic C.

While NTSC has officially gone dark in America, people still make their own ATTiny-powered transmitters. NTSC is a bit of a strange standard and is sometimes referred to as never-twice-the-same color, but it does produce a distinct look.

That look is what [EMMIR] was going for. It encodes a message in a ppm format into NTSC and then back in ppm with some configurable noise. It can do this in real-time as an effect in [EMMIR’s] engine or on a rendered image via a CLI. It looks incredible, and there’s something very satisfying. There’s a video after the break showing off the effect. The code is pretty short and easy to read.

Continue reading “Encoding NTSC With Your Hands Tied”

Here’s A Plain C/C++ Implementation Of AI Speech Recognition, So Get Hackin’

[Georgi Gerganov] recently shared a great resource for running high-quality AI-driven speech recognition in a plain C/C++ implementation on a variety of platforms. The automatic speech recognition (ASR) model is fully implemented using only two source files and requires no dependencies. As a result, the high-quality speech recognition doesn’t involve calling remote APIs, and can run locally on different devices in a fairly straightforward manner. The image above shows it running locally on an iPhone 13, but it can do more than that.

Implementing a robust speech transcription that runs locally on a variety of devices is much easier with [Georgi]’s port of OpenAI’s Whisper.
[Georgi]’s work is a port of OpenAI’s Whisper model, a remarkably-robust piece of software that does a truly impressive job of turning human speech into text. Whisper is easy to set up and play with, but this port makes it easier to get the system working in other ways. Having such a lightweight implementation of the model means it can be more easily integrated over a variety of different platforms and projects.

The usual way that OpenAI’s Whisper works is to feed it an audio file, and it spits out a transcription. But [Georgi] shows off something else that might start giving hackers ideas: a simple real-time audio input example.

By using a tool to stream audio and feed it to the system every half-second, one can obtain pretty good (sort of) real-time results! This of course isn’t an ideal method, but the robustness and accuracy of Whisper is such that the results look pretty great nevertheless.

You can watch a quick demo of that in the video just under the page break. If it gives you some ideas, head over to the project’s GitHub repository and get hackin’!

Continue reading “Here’s A Plain C/C++ Implementation Of AI Speech Recognition, So Get Hackin’”