C++ Design Patterns For Low-Latency Applications

Although performance optimizations may seem to have lost their relevance in an era of ever-increasing hardware performance, there are still many good reasons to spend time optimizing code. A recent preprint article by [Paul Bilokon] and [Burak Gunduz] of Imperial College London focuses specifically on low-latency patterns that are relevant for applications such as high-frequency trading (HFT). In HFT, small margins are compensated for by churning through absolutely massive volumes of trades, all of which relies on extremely low latency to gain every possible advantage. Although FPGA-based solutions are very common in HFT due to their low latency and high parallelism, C++ is the main language used beyond FPGAs.

Although many of the optimizations listed in the paper are quite obvious, such as prewarming the CPU caches, using constexpr, unrolling loops, and inlining, other patterns are less obvious, such as hot path versus cold path. This overlaps with the branch reduction pattern, with both patterns involving the separation of commonly and rarely executed code (like error handling and logging), improving use of the CPU’s caches and preventing branch mispredictions, as the benchmarks (using Google Benchmark) clearly demonstrate. All design patterns can also be found in the GitHub repository.
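
The gist of the hot path/cold path split is easy to sketch. The rough illustration below is not the paper’s benchmark code, and the function names and attributes are just one plausible way to express it: the rarely executed error handling is moved into a separate, never-inlined function so it stays out of the instruction cache of the code that runs millions of times per second.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Cold path: GCC/Clang attribute spellings keep this code out of the hot function.
[[gnu::noinline, gnu::cold]]
static void handle_bad_order(std::uint64_t order_id) {
    // Logging, cleanup, throwing: all the slow stuff lives here.
    throw std::runtime_error("rejected order " + std::to_string(order_id));
}

// Hot path: the common case falls straight through with one well-predicted branch.
inline std::uint64_t process_order(std::uint64_t order_id, std::int64_t quantity) {
    if (quantity <= 0) [[unlikely]] {   // C++20 attribute hinting at the rare case
        handle_bad_order(order_id);
    }
    return order_id * 31 + static_cast<std::uint64_t>(quantity); // stand-in for real work
}
```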

Other interesting tidbits are the impact of signed versus unsigned comparisons, of mixing floating-point datatypes, and of course lock-free programming using a ring buffer design. The only things that appear to be missing from this list are aligned versus unaligned memory accesses and zero-copy optimizations, but those should be easy additions to implement and test alongside the other optimizations in this paper.
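
As for the lock-free ring buffer, the single-producer/single-consumer queue built on two atomic indices is the classic shape of that pattern. The sketch below is a generic textbook version under that assumption, not the implementation from the paper’s repository.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer/single-consumer lock-free ring buffer. Capacity must be a
// power of two so that wrapping an index is a cheap bitwise AND.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");
public:
    bool push(const T& item) {                      // called only by the producer thread
        const auto head = head_.load(std::memory_order_relaxed);
        const auto tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full
        buffer_[head & (Capacity - 1)] = item;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() {                        // called only by the consumer thread
        const auto tail = tail_.load(std::memory_order_relaxed);
        const auto head = head_.load(std::memory_order_acquire);
        if (tail == head) return std::nullopt;      // empty
        T item = buffer_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return item;
    }

private:
    std::array<T, Capacity> buffer_{};
    alignas(64) std::atomic<std::size_t> head_{0};  // written only by the producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```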

Docker-Powered Remote Gaming With Games On Whales

Cloud gaming services allow even relatively meager devices like set-top boxes and cheap Chromebooks to play the latest and greatest titles. It’s not perfect of course, as latency is the number one issue since the player’s controller inputs need to be sent out to the server, but if you’ve got a fast enough connection it’s better than nothing. Interested in experimenting with the tech on your own terms? The open source Games on Whales project is here to make that a reality.

As you might have guessed from the name, Games on Whales uses Linux and Docker as core components in its remote gaming system. With the software installed on a headless server, multiple users can create virtual desktop environments on the same machine, with each spawning as a separate process on the host computer. This means that all of the hardware of the host can be shared without needing to do anything complicated like setting up GPU pass-through. The main Docker container can spin up more containers as needed.

Of course there will be limits to what any given hardware configuration can support in terms of the number of concurrent users and the demands of each stream. But for someone who wants to host a server for their friends, or who simply doesn’t want to put a powerful gaming PC in the living room, this is a real game-changer. For those not yet up to speed on Docker, we recently featured a guide on getting started with this powerful tool, since it does take some practice to wrap one’s mind around at first.

Misconceptions About Loops, Or: Static Code Analysis Is Hard

When thinking about loops in programming languages, they often get simplified down to a condition section and a body, but this belies the dizzying complexity that emerges when considering loop edge cases within the context of static analysis. A paper titled Misconceptions about Loops in C by [Martin Brain] and colleagues, as presented at the SOAP 2024 conference, goes through a whole list of false assumptions when it comes to loops, including for languages other than C. Perhaps most interesting is the conclusion that these ‘edge cases’ are in fact a lot more common than generally assumed, courtesy of how creative languages and their users can be when writing their code, with or without dragging in the meta-language of C’s preprocessor.

Assumptions like loop equivalence can fall apart when considering the CFG (control flow graph) interpretation versus a parse-tree one, where the former may, for example, merge loops. There are also doozies like assuming that the loop body will always exist, or that the first instruction(s) in a loop are always the entry point, along with the horrors of estimating loop exits in the context of labels, inlined functions, and more. Some languages have specific loop control flow features that differ from C (e.g. Python’s for/else and Ada’s loop), all of which affect a static analysis.
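
To get a feel for how these assumptions break, consider a contrived but perfectly legal goto that jumps straight into a loop body, as in the sketch below (an illustration in the spirit of the paper, not one of its examples). The loop ends up with two entry edges in the CFG, and its first statement is no longer the only way in.

```cpp
#include <cstdio>

// Legal C and C++: the goto jumps into the middle of the loop body, so the
// loop gains a second entry edge and its first statement is not the sole entry point.
int sum_with_extra_entry(int n, bool skip_first_check) {
    int total = 0;
    int i = 0;
    if (skip_first_check)
        goto body;           // side entrance into the loop
    while (i < n) {
body:
        total += i;
        ++i;
    }
    return total;
}

int main() {
    std::printf("%d\n", sum_with_extra_entry(5, false)); // 10, entering via the loop header
    std::printf("%d\n", sum_with_extra_entry(5, true));  // 10 again, entering via the side entrance
    return 0;
}
```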

Ultimately, writing a good static analysis tool is hard, and there are plenty of cases where it’s likely to trip up and give an invalid result. A language which avoids ambiguity (e.g. Ada) helps immensely here, but for other languages it helps to write your code as straightforwardly as possible to give the static analysis tool a fighting chance, or to just get really good at recognizing confused static analysis tool noises.

(Heading image: Control flow merges can create multiple loop entry edges. (Credit: Martin Brain, et al., SOAP 2024))

Fixed Point Math Exposed

If you are used to writing software for modern machines, you probably don’t think much about computing something like one divided by three. Modern computers handle floating point quite well. However, in constrained systems, there is a trap you should be aware of. While modern compilers are happy to let you use and abuse floating point numbers, the hardware is often woefully slow. It also tends to eat up lots of resources. So what do you do? Well, as [Low Byte Productions] explains, you can opt for fixed-point math.

In theory, the idea is simple. Just put an implied decimal point in your integers. So, for example, if we store the two numbers 123 and 456, we simply remember that we really mean 1.23 and 4.56. Adding then becomes trivial, since 123 + 456 = 579, which is, of course, 5.79.
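
Here is a minimal sketch of two-decimal fixed point that shows the addition above, plus the rescaling that multiplication needs because the scale factors multiply too. This is just an illustration, not code from the video.

```cpp
#include <cstdint>
#include <cstdio>

constexpr std::int32_t SCALE = 100;  // two implied decimal places

int main() {
    std::int32_t a = 123;            // represents 1.23
    std::int32_t b = 456;            // represents 4.56

    // Addition needs no adjustment: 579 represents 5.79.
    std::int32_t sum = a + b;

    // Multiplication picks up an extra factor of SCALE, so divide it back out
    // (widening first to avoid overflow): 123 * 456 = 56088, /100 = 560,
    // i.e. 5.60 (truncated from the exact 5.6088).
    std::int64_t product = (static_cast<std::int64_t>(a) * b) / SCALE;

    std::printf("sum = %d.%02d\n", sum / SCALE, sum % SCALE);
    std::printf("product = %lld.%02lld\n",
                static_cast<long long>(product / SCALE),
                static_cast<long long>(product % SCALE));
    return 0;
}
```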

Continue reading “Fixed Point Math Exposed”

PostmarketOS Now Boots On Over 250 Devices

Every year, as consumers gobble up the latest Android devices, more old, but perfectly serviceable, units end up collecting dust in drawers. Or worse, they end up getting tossed in the trash. One of the most promising tools we have to help keep these older devices useful is postmarketOS, a full-fledged Linux distribution that provides a flexible and up-to-date software environment on devices that might otherwise be stuck with some old and unsupported version of Google’s mobile operating system.

As of the latest update on the postmarketOS blog, the team has announced an exciting milestone: over 250 devices can now boot the stable release of the OS.

Now to be clear, not all devices will be fully functional. In fact, the blog post clarifies that some of them only barely boot. But it’s progress, and now that these semi-supported devices aren’t hidden behind a development version of the OS, it means more folks will be able to put them to use.

For example, if you want to turn your old smartphone into a low-energy headless webserver, it doesn’t really matter if its display, touchscreen, or speakers are supported. You just need it to boot into Linux and fire up an SSH server so you can get in and start working.

But support for new devices is just one of the additions in this new v24.06 release. The blog post also points out several notable software upgrades, including the move to the 6.x branch of KDE Plasma Mobile. This brings with it a long list of improvements and changes, including a rewritten homescreen with enhanced customization options. If you prefer a more minimal GUI, don’t worry. This new release also updates Sxmo, which provides a menu-driven interface for both touch screens and hardware controls.

Among the newly supported devices is a generic x86_64 image that should work on a wide array of PCs. While there’s obviously no shortage of Linux distros you could run on your old computer, being able to install postmarketOS on it is definitely helpful for development purposes. There’s also a new Tegra ARMv7 target which brings a number of new devices into the fold, such as the Google Nexus 7 and the Microsoft Surface RT.

Looking to run postmarketOS on your own hardware? The best way to start is to check the Devices page and see how many of those old gadgets you’ve got collecting dust in a drawer are compatible.

Forsp: A Forth & Lisp Hybrid Lambda Calculus Language

In the world of lambda calculus programming languages there are many ways to express the terms, which is why we ended up with such an amazing range of programming languages, even if most trace their roots back to ALGOL. Of the more unique (and practical) languages, Lisp and Forth probably rank near the top, but what if you were to smudge both together? That’s what [xorvoid] did, and it resulted in the gracefully titled Forsp programming language. Unsurprisingly it got a very warm and enthusiastic reception over at Hacker News.

While keeping many Lisp-isms, the Forth part consists primarily of it being very small and easy to implement, as demonstrated by the C-based reference implementation. It also features a Forth-like value/operand stack and function application. Also interesting is that Forsp uses call-by-push-value (CBPV), which is quite different from call-by-value (CBV) and call-by-name (CBN) and may offer some advantages if you can wrap your mind around the concept.
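
Very roughly, CBPV separates values from suspended computations, letting a single language express both eager and lazy argument passing explicitly. The contrived C++ sketch below shows only the eager-versus-lazy half of that distinction using thunks, and has nothing to do with Forsp’s actual implementation.

```cpp
#include <cstdio>
#include <functional>

static int expensive() {
    std::puts("evaluating argument");
    return 21;
}

// Call-by-value flavour: the argument is evaluated once, before the call.
static int twice_cbv(int x) { return x + x; }

// Call-by-name flavour: the argument arrives as a suspended computation (a
// thunk) and is re-evaluated at every use site.
static int twice_cbn(std::function<int()> x) { return x() + x(); }

int main() {
    std::printf("%d\n", twice_cbv(expensive()));                 // "evaluating argument" prints once
    std::printf("%d\n", twice_cbn([] { return expensive(); }));  // ...and here it prints twice
    return 0;
}
```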

Even if practicality is debatable, Forsp is another delightful addition to the list of interesting lambda calculus demonstrations which show that the field is anything but static or boring.

Programming Ada: Records And Containers For Organized Code

Writing code without having some way to easily organize sets of variables or data would be a real bother. Even if in the end you could totally do all of the shuffling of bits and allocating in memory by yourself, it’s much easier when the programming language abstracts all of that housekeeping away. In Ada you generally use a few standard types, ranging from records (equivalent to structs in C) to a series of containers like vectors and maps. As with any language, there are some subtle details about how all of these work, which is where the usage of these types in the Sarge project will act as an illustrative example.

In this project’s Ada code, a record is used for information about command line arguments (flag names, values, etc.), with these argument records stored in a vector. In addition, a map is created that links the names of these arguments, using a string as the key, to the index of the corresponding record in the vector. Finally, a second vector is used to store any text fragments that follow the list of arguments provided on the command line. This then provides a number of ways to access the record information, either sequentially in the arguments vector, or by argument (flag) name via the map.
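
For those who think more readily in C++, the arrangement described above maps roughly onto the sketch below. The names are purely illustrative, and this is not Sarge’s actual code, which is of course Ada.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// The "record": information about a single command line flag.
struct Argument {
    std::string flag_name;
    std::string value;
    bool parsed = false;
};

// The containers described above: a vector of argument records, a map from
// flag name to the record's index in that vector, and a second vector for
// any trailing non-flag text from the command line.
struct ParserState {
    std::vector<Argument> args;
    std::unordered_map<std::string, std::size_t> name_to_index;
    std::vector<std::string> text_fragments;
};
```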

Continue reading “Programming Ada: Records And Containers For Organized Code”