C++ Is 45 Years Old. [Stroustrup] Says You Still Don’t Get It!

We were surprised when we read a post from C++ creator [Bjarne Stroustrup] that reminded us that C++ is 45 years old. His premise is that C++ is robust and flexible and by following some key precepts, you can avoid problems.

We don’t disagree, but C++ is much like its progenitor, C, in that it doesn’t really force you to color inside the lines. We like that, though. But it does mean that people will go off and do things the way they want to do it, for any of a number of good and bad reasons.

Continue reading “C++ Is 45 Years Old. [Stroustrup] Says You Still Don’t Get It!”

UScope: A New Linux Debugger And Not A GDB Shell, Apparently

[Jim Colabro] is a little underwhelmed with the experience of low-level debugging of Linux applications using traditional debuggers such as GDB and LLDB. These programs have been around for a long time, developing alongside Linux and other UNIX-like OSs, and are still solidly in the CLI domain.  Fed up with the lack of data structure support and these tools’ staleness and user experience, [Jim] has created UScope, a new debugger written from scratch with no code from the existing projects.

GBD, in particular, has quite a steep learning curve once you dig into its more advanced features. Many people side-step this learning curve by running GDB within Visual Studio or some other modern IDE, but it is still the same old debugger core at the end of the day. [Jim] gripes that existing debuggers don’t support modern data structures commonly used and have poor customizability. It would be nice, for example, to write a little code, and have the debugger render a data structure graphically to aid visualisation of a problem being investigated. We know that GDB at least can be customised with Python to create application-specific pretty printers, but perhaps [Jim] has bigger plans?

Continue reading “UScope: A New Linux Debugger And Not A GDB Shell, Apparently”

Better C Strings, Simply

If you program in C, strings are just in your imagination. What you really have is a character pointer, and we all agree that a string is every character from that point up until one of the characters is zero. While that’s simple and useful, it is also the source of many errors. For example, writing a 32-byte string to a 16-byte array or failing to terminal a string with a zero byte. [Thasso] has been experimenting with a different way to represent strings that is still fairly simple but helps keep things straight.

Like many other languages, this setup uses counted strings and string buffers. You can read and write to a string buffer, but strings are read-only. In either case, there is a length for the contents and, in the case of the buffer, a length for the entire buffer.

Continue reading “Better C Strings, Simply”

ANTIRTOS: No RTOS Needed

Embedded programming is a tricky task that looks straightforward to the uninitiated, but those with a few decades of experience know differently. Getting what you want to work predictably or even fit into the target can be challenging. When you get to a certain level of complexity, breaking code down into multiple tasks can become necessary, and then most of us will reach for a real-time operating system (RTOS), and the real fun begins. [Aleksei Tertychnyi] clearly understands such issues but instead came up with an alternative they call ANTIRTOS.

The idea behind the project is not to use an RTOS at all but to manage tasks deterministically by utilizing multiple queues of function pointers. The work results in an ultra-lightweight task management library targeting embedded platforms, whether Arduino-based or otherwise. It’s pure C++, so it generally doesn’t matter. The emphasis is on rapid interrupt response, which is, we know, critical to a good embedded design. Implemented as a single header file that is less than 350 lines long, it is not hard to understand (provided you know C++ templates!) and easy to extend to add needed features as they arise. A small code base also makes debugging easier. A vital point of the project is the management of delay routines. Instead of a plain delay(), you write a custom version that executes your short execution task queue, so no time is wasted. Of course, you have to plan how the tasks are grouped and scheduled and all the data flow issues, but that’s all the stuff you’d be doing anyway.

The GitHub project page has some clear examples and is the place to grab that header file to try it yourself. When you really need an RTOS, you have a lot of choices, mostly costing money, but here’s our guide to two popular open source projects: FreeRTOS and ChibiOS. Sometimes, an RTOS isn’t enough, so we design our own full OS from scratch — sort of.

C++ Design Patterns For Low-Latency Applications

With performance optimizations seemingly having lost their relevance in an era of ever-increasing hardware performance, there are still many good reasons to spend some time optimizing code. In a recent preprint article by [Paul Bilokon] and [Burak Gunduz] of the Imperial College London the focus is specifically on low-latency patterns that are relevant for applications such as high-frequency trading (HFT). In HFT the small margins are compensated for by churning through absolutely massive volumes of trades, all of which relies on extremely low latency to gain every advantage. Although FPGA-based solutions are very common in HFT due their low-latency, high-parallelism, C++ is the main language being used beyond FPGAs.

Although many of the optimizations listed in the paper are quite obvious, such as prewarming the CPU caches, using constexpr, loop unrolling and use of inlining, other patterns are less obvious, such as hotpath versus coldpath. This overlaps with the branch reduction pattern, with both patterns involving the separation of commonly and rarely executed code (like error handling and logging), improving use of the CPU’s caches and preventing branch mispredictions, as the benchmarks (using Google Benchmark) clearly demonstrates. All design patterns can also be found in the GitHub repository.

Other interesting tidbits are the impact of signed and unsigned comparisons, mixing floating point datatypes and of course lock-free programming using a ring buffer design. Only missing from this list appears to be aligned vs unaligned memory accesses and zero-copy optimizations, but those should be easy additions to implement and test next to the other optimizations in this paper.

C Compiler Exists Entirely In Vim

8cc.vim is a C compiler that exists as pure Vimscript. Is it small? It sure is! How about fast? Absolutely not! Efficient? Also no. But does it work and is it neat? You betcha!

Ever typed :wq to write the buffer and exit in Vim? When you do that, you’re using Vimscript. Whenever one enters command mode : in Vim, one is in fact using a live Vimscript interpreter. That’s the space in which this project exists and does its magic. Given enough time, anyway.

Vimscript itself was created by [Bram Moolenaar] in 1991. The idea was to execute batches of vim commands programmatically. It’s been used for a variety of purposes since then.

8cc is a lightweight C compiler that has been supplanted by chibicc, but that doesn’t matter much because as author [rhysd] admits, this is really just a fun concept project more than anything. It may take twenty minutes or more to compile “hello world”, but doing it entirely from within Vim is a trip.

The Performance Impact Of C++’s `final` Keyword For Optimization

In the world of software development the term ‘optimization’ is generally reason for experienced developers to start feeling decidedly nervous, especially when a feature is marked as an ‘easy and free optimization’. The final keyword introduced in C++11 is one of such features. It promises a way to speed up object-oriented code by omitting the vtable call indirection by marking a class or member function as – unsurprisingly – final, meaning that it cannot be inherited from or overridden. Inspired by this promise, [Benjamin Summerton] figured that he’d run a range of benchmarks to see what performance uplift he’d get on his ray tracing project.

To be as thorough as possible, the tests were run on three different systems, including 64-bit Intel and AMD systems, as well as on Apple Silicon (M1). For the compilers various versions of GCC (12.x, 13.x), as well as Clang  (15, 17) and MSVC (17) were employed, with rather interesting results for final versus no final tests. Clang was probably the biggest surprise, as with the keyword added, performance with Clang-generated code absolutely tanked. MSVC was a mixed bag, as were the GCC versions other than GCC 13.2 on AMD Ryzen, which saw a bump of a few percent faster.

Ultimately, it seems that there’s no free lunch as usual, and adding final to your code falls distinctly under ‘only use it if you know what you’re doing’. As things stand, the resulting behavior seems wildly inconsistent.