A 64-bit X86 Bootloader From Scratch

For most people, turning on a computer simply starts the operating system. The reality is much more complex, as [Thasso] discovered. Even modern x86 chips start in 16-bit real mode, and a bit of fancy footwork is required to shift first to protected mode and then to 64-bit long mode. Want to see how? [Thasso] shows us the ropes.

Nowadays, developing such things is handy because you don’t have to use real hardware; an emulator like QEMU will suffice. If you know assembly language, the process is surprisingly simple, although there is a lot of nuance and subtlety. The biggest task is setting up appropriate page tables to control the memory mapping. In real mode, each segment is a fixed 64 K window of memory unless you use some tricks. In protected mode, segments define blocks of memory that can be very small or cover the entire address space. It is possible to set the segments to cover all memory and, sort of, ignore them, but you still have to define them for the switch to protected mode.
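To make the segment setup a little more concrete, here is a minimal sketch of how the 8-byte segment descriptors for a "cover everything" layout can be packed. This is not [Thasso]'s code: a real bootloader would normally write these values directly in assembly, and the names `make_descriptor`, `kFlatCode32`, `kFlatData`, and `kCode64` are made up for illustration. The bit positions follow the standard x86 descriptor layout.

```cpp
#include <cstdint>

// Pack one 8-byte x86 segment descriptor from its fields (hypothetical helper).
// base:   32-bit start address of the segment
// limit:  20-bit size field (counted in 4 KiB units when the G flag is set)
// access: access byte (present bit, privilege level, segment type)
// flags:  4-bit flag nibble (G = granularity, D/B = 32-bit default, L = 64-bit code)
constexpr std::uint64_t make_descriptor(std::uint32_t base, std::uint32_t limit,
                                        std::uint8_t access, std::uint8_t flags) {
    std::uint64_t d = 0;
    d |= static_cast<std::uint64_t>(limit & 0xFFFFu);             // limit bits 0..15
    d |= static_cast<std::uint64_t>(base & 0xFFFFFFu) << 16;      // base bits 0..23
    d |= static_cast<std::uint64_t>(access) << 40;                // access byte
    d |= static_cast<std::uint64_t>((limit >> 16) & 0xFu) << 48;  // limit bits 16..19
    d |= static_cast<std::uint64_t>(flags & 0xFu) << 52;          // flag nibble
    d |= static_cast<std::uint64_t>((base >> 24) & 0xFFu) << 56;  // base bits 24..31
    return d;
}

// Flat segments: base 0, maximum limit, 4 KiB granularity.
constexpr std::uint64_t kFlatCode32 = make_descriptor(0, 0xFFFFF, 0x9A, 0xC);
constexpr std::uint64_t kFlatData   = make_descriptor(0, 0xFFFFF, 0x92, 0xC);
// Same code segment, but with the L flag set instead of D/B for long mode.
constexpr std::uint64_t kCode64     = make_descriptor(0, 0xFFFFF, 0x9A, 0xA);

static_assert(kFlatCode32 == 0x00CF9A000000FFFFULL, "flat 32-bit code segment");
static_assert(kFlatData   == 0x00CF92000000FFFFULL, "flat data segment");
static_assert(kCode64     == 0x00AF9A000000FFFFULL, "64-bit code segment");
```

A descriptor table built from a null entry followed by values like these is what lets the segments be "sort of" ignored: every segment spans the whole address space, so paging does the real work.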

In the bad old days, you had more reason to worry about this if you were writing a DOS extender or using some tricks to get access to more memory. But it is still good to know if you are rolling your own operating system. Why do the processors still boot into real mode? Good question.

C++ Design Patterns For Low-Latency Applications

Although performance optimizations may seem to have lost their relevance in an era of ever-increasing hardware performance, there are still many good reasons to spend some time optimizing code. A recent preprint article by [Paul Bilokon] and [Burak Gunduz] of Imperial College London focuses specifically on low-latency patterns that are relevant for applications such as high-frequency trading (HFT). In HFT, the small margins are compensated for by churning through absolutely massive volumes of trades, all of which relies on extremely low latency to gain every advantage. Although FPGA-based solutions are very common in HFT due to their low latency and high parallelism, C++ is the main language being used beyond FPGAs.

Although many of the optimizations listed in the paper are quite obvious, such as prewarming the CPU caches, using constexpr, loop unrolling and inlining, other patterns are less obvious, such as hotpath versus coldpath. This overlaps with the branch reduction pattern: both involve separating commonly executed code from rarely executed code (like error handling and logging), which improves use of the CPU’s caches and prevents branch mispredictions, as the benchmarks (using Google Benchmark) clearly demonstrate. All design patterns can also be found in the GitHub repository.
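As a rough illustration of the hotpath versus coldpath idea, the sketch below keeps the rarely executed rejection logging in its own out-of-line function so the common case stays small. It is not taken from the paper or its repository; `Order`, `accept_order`, and `record_rejection` are hypothetical names, and the `gnu::cold` and `gnu::noinline` attributes are GCC/Clang hints that other compilers are free to ignore.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical order type, used only for this illustration.
struct Order {
    std::int64_t price;
    std::int64_t quantity;
};

// Cold path: string formatting and logging are rare and comparatively slow,
// so they live in a separate function that is kept out of the hot code.
[[gnu::cold, gnu::noinline]]
void record_rejection(std::vector<std::string>& log, const Order& o) {
    log.push_back("rejected order: price=" + std::to_string(o.price) +
                  " qty=" + std::to_string(o.quantity));
}

// Hot path: one cheap validity check, then straight-line arithmetic.
inline bool accept_order(const Order& o, std::int64_t& notional,
                         std::vector<std::string>& log) {
    if (o.price <= 0 || o.quantity <= 0) {
        record_rejection(log, o);  // rare case, handled out of line
        return false;
    }
    notional += o.price * o.quantity;
    return true;
}
```

Whether a split like this pays off depends on the workload, which is why measuring it with a benchmark harness matters.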

Other interesting tidbits are the impact of signed versus unsigned comparisons, mixing floating point datatypes and, of course, lock-free programming using a ring buffer design. The only things missing from this list appear to be aligned versus unaligned memory accesses and zero-copy optimizations, but those should be easy additions to implement and test next to the other optimizations in this paper.
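For readers who have not met a lock-free ring buffer before, here is a minimal single-producer, single-consumer sketch. It is an assumption-laden illustration rather than the design from the paper: the class name `SpscRing` and its methods are invented here, it only works with exactly one producer thread and one consumer thread, and it requires a power-of-two capacity.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal single-producer/single-consumer ring buffer (illustrative sketch).
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "capacity must be a power of two");

    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next slot to write, advanced by the producer
    std::atomic<std::size_t> tail_{0};  // next slot to read, advanced by the consumer

public:
    // Producer side: returns false instead of blocking when the buffer is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;              // full
        buf_[head & (N - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the element
        return true;
    }

    // Consumer side: returns an empty optional when there is nothing to read.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;           // empty
        T value = buf_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return value;
    }
};
```

The indices only ever increase and are masked on access, so unsigned wraparound is harmless as long as the capacity is a power of two.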

Docker-Powered Remote Gaming With Games On Whales

Cloud gaming services allow even relatively meager devices like set top boxes and cheap Chromebooks to play the latest and greatest titles. It’s not perfect of course (latency is the number one issue, as the player’s controller inputs need to be sent out to the server), but if you’ve got a fast enough connection it’s better than nothing. Interested in experimenting with the tech on your own terms? The open source Games on Whales project is here to make that a reality.

As you might have guessed from the name, Games on Whales uses Linux and Docker as core components in its remote gaming system. With the software installed on a headless server, multiple users can create virtual desktop environments on the same machine, with each spawning as a separate process on the host computer. This means that all of the hardware of the host can be shared without needing to do anything complicated like setting up GPU pass-through. The main Docker container can spin up more containers as needed.

Of course, there will be limits to what any given hardware configuration can support in terms of the number of concurrent users and the demands of each stream. But for someone who wants to host a server for their friends, or something even simpler like not having to put a powerful gaming PC in the living room, this is a real game-changer. For those not up to speed on Docker yet, we recently featured a guide on getting started with this powerful tool, since it does take some practice to wrap one’s mind around at first.

Misconceptions About Loops, Or: Static Code Analysis Is Hard

Loops in programming languages often get simplified down to a condition section and a body, but this belies the dizzying complexity that emerges when considering loop edge cases within the context of static analysis. A paper titled Misconceptions about Loops in C by [Martin Brain] and colleagues, presented at the SOAP 2024 conference, goes through a whole list of false assumptions when it comes to loops, including for languages other than C. Perhaps most interesting is the conclusion that these ‘edge cases’ are in fact a lot more common than generally assumed, courtesy of how creative languages and their users can be when writing their code, with or without dragging in the meta-language of C’s preprocessor.

Assumptions like loop equivalence can fall apart when considering the CFG (control flow graph) interpretation versus a parse tree one, where the former may, for example, merge loops. There are also doozies like assuming that the loop body will always exist, that the first instruction(s) in a loop are always the entry point, and the horrors of estimating loop exits in the context of labels, inlined functions and more. Some languages have specific loop control flow features that differ from C (e.g. Python’s for/else and Ada’s loop), all of which affect a static analysis.
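Two of these misconceptions are easy to reproduce. The sketch below is not taken from the paper; `sum_with_goto` and `copy_unrolled` are hypothetical examples written for this summary. The first function contains a loop in the control flow graph but no loop statement in the parse tree, and the second (in the style of Duff's device) is entered through a `switch`, so the first statement of the loop body is not necessarily where execution enters the loop.

```cpp
// A loop that exists only in the control flow graph: there is no
// for/while/do statement for a parse-tree-based analysis to find.
int sum_with_goto(const int* data, int n) {
    int i = 0;
    int total = 0;
again:
    if (i < n) {
        total += data[i];
        ++i;
        goto again;  // back edge created by a plain goto
    }
    return total;
}

// Duff's-device-style copy: the switch jumps into the middle of the
// do-while body, so the loop has several possible entry points.
void copy_unrolled(int* to, const int* from, int count) {
    if (count <= 0) return;
    int passes = (count + 3) / 4;  // number of trips through the loop
    switch (count % 4) {
        case 0: do { *to++ = *from++;  // deliberate fall-through below
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while (--passes > 0);
    }
}
```

Both functions are valid C and C++, which is part of what makes life hard for an analysis tool that expects every loop to look like a textbook `for` or `while`.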

Ultimately, writing a good static analysis tool is hard, and there are plenty of cases where it’s likely to trip up and give an invalid result. A language which avoids ambiguity (e.g. Ada) helps immensely here, but for other languages it helps to write your code as straightforwardly as possible to give the static analysis tool a fighting chance, or just get really good at recognizing confused static analysis tool noises.

(Heading image: Control flow merges can create multiple loop entry edges. Credit: Martin Brain, et al., SOAP 2024)

Fixed Point Math Exposed

If you are used to writing software for modern machines, you probably don’t think much about computing something like one divided by three. Modern computers handle floating point quite well. However, in constrained systems, there is a trap you should be aware of. While modern compilers are happy to let you use and abuse floating point numbers, the hardware is often woefully slow. It also tends to eat up lots of resources. So what do you do? Well, as [Low Byte Productions] explains, you can opt for fixed-point math.

In theory, the idea is simple. Just put an arbitrary decimal point in your integers. So, for example, if we have two numbers, say 123 and 456, we could remember that we really mean 1.23 and 4.56. Adding, then, becomes trivial since 123+456=579, which is, of course, 5.79.
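The scaled-integer idea can be sketched in a few lines. This is only an illustration of the scheme described above, using a decimal scale of 100 to match the 1.23 and 4.56 example; the names `Fixed`, `kScale`, `add`, `mul`, and `div` are invented here, and a real embedded implementation would more likely use a power-of-two scale so that rescaling becomes a shift.

```cpp
#include <cstdint>

// Two implied decimal places: the stored integer is the value times 100.
constexpr std::int32_t kScale = 100;

struct Fixed {
    std::int32_t raw;  // e.g. raw == 123 means 1.23
};

// Addition needs no adjustment because both operands share the same scale.
constexpr Fixed add(Fixed a, Fixed b) {
    return Fixed{a.raw + b.raw};
}

// The raw product carries the scale twice, so divide one copy back out.
// A 64-bit intermediate avoids overflow; the result truncates toward zero.
constexpr Fixed mul(Fixed a, Fixed b) {
    return Fixed{static_cast<std::int32_t>(
        static_cast<std::int64_t>(a.raw) * b.raw / kScale)};
}

// Pre-scale the dividend so the quotient keeps its two decimal places.
constexpr Fixed div(Fixed a, Fixed b) {
    return Fixed{static_cast<std::int32_t>(
        static_cast<std::int64_t>(a.raw) * kScale / b.raw)};
}

static_assert(add(Fixed{123}, Fixed{456}).raw == 579, "1.23 + 4.56 = 5.79");
static_assert(mul(Fixed{123}, Fixed{456}).raw == 560, "1.23 * 4.56 truncates to 5.60");
static_assert(div(Fixed{100}, Fixed{300}).raw == 33, "1.00 / 3.00 truncates to 0.33");
```

Addition is as trivial as promised, but multiplication and division need the extra rescaling step, and that is where precision is lost: 1.23 times 4.56 is really 5.6088, which this representation truncates to 5.60.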


PostmarketOS Now Boots On Over 250 Devices

Every year, as consumers gobble up the latest Android devices, more old, but perfectly serviceable, units end up collecting dust in drawers. Or worse, they end up getting tossed in the trash. One of the most promising tools we have to help keep these older devices useful is postmarketOS, a full-fledged Linux distribution that provides a flexible and up-to-date software environment on devices that might otherwise be stuck with some old and unsupported version of Google’s mobile operating system.

As of the latest update on the postmarketOS blog, the team has announced an exciting milestone: over 250 devices can now boot the stable release of the OS.

Now to be clear, not all devices will be fully functional. In fact, the blog post clarifies that some of them only barely boot. But it’s progress, and now that these semi-supported devices aren’t hidden behind a development version of the OS, it means more folks will be able to put them to use.

For example, if you want to turn your old smartphone into a low-energy headless webserver, it doesn’t really matter if its display, touchscreen, or speakers are supported. You just need it to boot into Linux and fire up an SSH server so you can get in and start working.

But support for new devices is just one of the additions in this new v24.06 release. The blog post also points out several notable software upgrades, including the move to the 6.x branch of KDE Plasma Mobile. This brings with it a long list of improvements and changes, including a rewritten homescreen with enhanced customization options. If you prefer a more minimal GUI, don’t worry. This new release also updates Sxmo, which provides a menu-driven interface for both touch screens and hardware controls.

Among the newly supported devices is a generic x86_64 image that should work on a wide array of PCs. While there’s obviously no shortage of Linux distros you could run on your old computer, being able to install postmarketOS on it is definitely helpful for development purposes. There’s also a new Tegra ARMv7 target which brings a number of new devices into the fold, such as the Google Nexus 7 and Microsoft Surface RT.

Looking to run postmarketOS on your own hardware? The best way to start is to check the Devices page and see how many of those old gadgets you’ve got collecting dust in a drawer are compatible.

Forsp: A Forth & Lisp Hybrid Lambda Calculus Language

In the world of lambda calculus programming languages there are many ways to express the terms, which is why we ended up with such an amazing range of programming languages, even if most trace their roots back to ALGOL. Of the more unique (and practical) languages, Lisp and Forth probably rank near the top, but what if you were to smudge both together? That’s what [xorvoid] did, and it resulted in the gracefully titled Forsp programming language. Unsurprisingly, it got a very warm and enthusiastic reception over at Hacker News.

While Forsp keeps many Lisp-isms, the Forth influence shows mainly in the language being very small and easy to implement, as demonstrated by the C-based reference implementation. It also features a Forth-like value/operand stack and function application. Also interesting is that Forsp uses call-by-push-value (CBPV), which is quite different from call-by-value (CBV) and call-by-name (CBN), and which may give some advantages if you can wrap your mind around the concept.

Even if practicality is debatable, Forsp is another delightful addition to the list of interesting lambda calculus demonstrations which show that the field is anything but static or boring.