We are always amused that we can run emulations or virtual copies of yesterday’s computers on our modern computers. In fact, there is so much power at your command now that you can run, say, a DOS emulator on a Windows virtual machine under Linux, even though the resulting DOS prompt would probably still perform better than an old 4.77 MHz PC. Remember when you could get calculators that ran BASIC? Well, [Calculator Clique] shows off BASIC running on a decidedly modern HP Prime calculator. The trick? It’s running under Python. Check it out in the video below.
Think about it. The HP Prime has an ARM processor inside. In addition to its normal programming system, it has Micropython as an option. So that’s one interpreter. Then PyBasic has a nice classic Basic interpreter that runs on Python. We’ve even ported it to one or two of the Hackaday Superconference badges.
An important aspect in software engineering is the ability to distinguish between premature, unnecessary, and necessary optimizations. A strong case can be made that the initial design benefits massively from optimizations that prevent well-known issues later on, while unnecessary optimizations are those simply do not make any significant difference either way. Meanwhile ‘premature’ optimizations are harder to define, with Knuth’s often quoted-out-of-context statement about these being ‘the root of all evil’ causing significant confusion.
We can find Donald Knuth’s full quote deep in the 1974 articleStructured Programming with go to Statements, which at the time was a contentious optimization topic. On page 268, along with the cited quote, we see that it’s a reference to making presumed optimizations without understanding their effect, and without a clear picture of which parts of the program really take up most processing time. Definitely sound advice.
And unlike back in the 1970s we have today many easy ways to analyze application performance and to quantize bottlenecks. This makes it rather inexcusable to spend more time today vilifying the goto statement than to optimize one’s code with simple techniques like zero-copy and binary message formats.
The cache hierarchy of the 2008 Intel Nehalem x86 architecture. (Source: Intel)
Writing good, performant code depends strongly on an understanding of the underlying hardware. This is especially the case in scenarios like those involving embarrassingly parallel processing, which at first glance ought to be a cakewalk. With multiple threads doing their own thing without having to nag the other threads about anything it seems highly doubtful that even a novice could screw this up. Yet as [Keifer] details in a recent video on so-called false sharing, this is actually very easy, for a variety of reasons.
With a multi-core and/or multi-processor system each core has its own local cache that contains a reflection of the current values in system RAM. If any core modifies its cached data, this automatically invalidates the other cache lines, resulting in a cache miss for those cores and forcing a refresh from system RAM. This is the case even if the accessed data isn’t one that another core was going to use, with an obvious impact on performance. As cache lines are a contiguous block of data with a size and source alignment of 64 bytes on x86, it’s easy enough to get some kind of overlap here.
The worst case scenario as detailed and demonstrated using the Google Benchmark sample projects, involves a shared global data structure, with a recorded hundred times reduction in performance. Also noticeable is the impact on scaling performance, with the cache misses becoming more severe with more threads running.
A less obvious cause of performance loss here is due to memory alignment and how data fits in the cache lines. Making sure that your data is aligned in e.g. data structures can prevent more unwanted cache invalidation events. With most applications being multi-threaded these days, it’s a good thing to not only know how to diagnose false sharing issues, but also how to prevent them.
Named roo_display for unclear reasons, the library is Arduino-compatible, and suits a wide range of ESP32 boards out in the wild. It’s intended for use with common SPI-attached display controllers, like the ILI9341, SSD1327, ST7789, and more. It’s performance-oriented, without skimping on feature set. It’s got all kinds of fonts in different weights and sizes, and a tool for importing more. It can do all kinds of shapes if you want to manually draw your UI elements, or you can simply have it display JPEGs, PNGs, or raw image data from PROGMEM if you so desire. If you’re hoping to create a touch interface, it can handle that too. There’s even a companion library for doing more complex work under the name roo_windows.
If you’re looking to create a simple and responsive interface, this might be the library for you. Of course, there are others out there too, like the Adafruit GFX library which we’ve featured before. You could even go full VGA if you wanted, and end up with something that looks straight out of Windows 3.1. Meanwhile, if you’re cooking up your own graphics code for the popular microcontroller platform, you should probably let us know on the tipsline!
In today’s Chromed-up world it can be hard to remember an era where browsers could be extended with not just extensions, but also with plugins. Although for those of us who use traditional Netscape-based browsers like Pale Moon the use of plugins has never gone away, for the rest of the WWW’s users their choice has been limited to increasingly more restrictive browser extensions, with Google’s Manifest V3 taking the cake.
Although most browsers stopped supporting plugins due to “security concerns”, this did nothing to address the need for executing code in the browser faster than the sedate snail’s pace possible with JavaScript, or the convenience of not having to port native code to JavaScript in the first place. This led to various approaches that ultimately have culminated in the WebAssembly (WASM) standard, which comes with its own set of issues and security criticisms.
Other than Netscape’s Plugin API (NPAPI) being great for making even 1990s browsers ready for 2026, there are also very practical reasons why WASM and JavaScript-based approaches simply cannot do certain basic things.
[neos-builder] wrote in to let us know about their innovation: the HORUS Framework — Hybrid Optimized Robotics Unified System — a production-grade robotics framework built in Rust for real-time performance and memory safety.
This is a batteries included system which aims to have everything you might need available out of the box. [neos-builder] said their vision is to create a robotics framework that is “thick” as a whole (we can’t avoid this as the tools, drivers, etc. make it impossible to be slim and fit everyone’s needs), but modular by choice.
[neos-builder] goes on to say that HORUS aims to provide developers an interface where they can focus on writing algorithms and logic, not on setting up their environments and solving configuration issues and resolving DLL hell. With HORUS instead of writing one monolithic program, you build independent nodes, connected by topics, which are run by a scheduler. If you’d like to know more the documentation is extensive.
The list of features is far too long for us to repeat here, but one cool feature in addition to the real-time performance and modular design that jumped out at us was this system’s ability to process six million messages per second, sustained. That’s a lot of messages! Another neat feature is the system’s ability to “freeze” the environment, thereby assuring everyone on the team is using the same version of included components, no more “but it works on my machine!” And we should probably let you know that Python integration is a feature, connected by shared-memory inter-process communication (IPC).
If you’re gonna be a hacker eventually you’re gonna have to write code. And if you write code eventually you’re gonna have to deal with concurrency. Concurrency is what we call it when parts of our program run at the same time. That could be because of something fairly straightforward, like multiple threads, or multiple processes; or something a little more complicated such as event loops, asynchronous or non-blocking I/O, interrupts and signal handlers, re-entrancy, co-routines / fibers / green threads, job queues, DMA and hardware level concurrency, speculative or out-of-order execution at CPU-level, time-sharing on single-core systems, or parallel execution on multi-core systems. There are just so many ways to get tied up with concurrency.
In this video from [Core Dumped] we learn about The ’80s Algorithm to Avoid Race Conditions (and Why It Failed). This video explains what a race condition looks like and talks through what the critical section is and approaches to protecting it. This video introduces an old approach to protect the critical section first invented in 1981 known as Peterson’s solution, but then goes on to explain how Peterson’s solution is no longer reliable as much has changed since the 1980s, particularly compilers will reorganize instructions and CPUs may run code out of order. So there is no free lunch and if you have to deal with concurrency you’re going to want some kind of support for a mutex of some type. Your programming language and its standard library probably have various types of locks available and if not you can use something like flock (also available as a syscall, to complement the POSIX fcntl), which may be available on your platform.