Optimizing Software With Zero-Copy And Other Techniques

An important aspect of software engineering is the ability to distinguish between premature, unnecessary, and necessary optimizations. A strong case can be made that the initial design benefits massively from optimizations that prevent well-known issues later on, while unnecessary optimizations are those that simply do not make any significant difference either way. Meanwhile, ‘premature’ optimizations are harder to define, with Knuth’s often quoted-out-of-context statement about these being ‘the root of all evil’ causing significant confusion.

We can find Donald Knuth’s full quote deep in the 1974 article Structured Programming with go to Statements, written at a time when the use of goto was itself a contentious optimization topic. On page 268, the context around the quote makes clear that it refers to making presumed optimizations without understanding their effect, and without a clear picture of which parts of the program really take up most of the processing time. Definitely sound advice.

And unlike back in the 1970s, today we have many easy ways to analyze application performance and to quantify bottlenecks. This makes it rather inexcusable to spend more time today vilifying the goto statement than optimizing one’s code with simple techniques like zero-copy and binary message formats.
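To make the two techniques named above a bit more concrete, here is a minimal Python sketch. It assumes a made-up fixed-layout message consisting of a little-endian 32-bit ID, a 64-bit float and a 16-byte name. The layout and the helper names encode_into and decode_at are hypothetical. The struct module's pack_into and unpack_from write and read directly in a preallocated buffer. Slicing a memoryview yields a window onto the same memory rather than a copy, which is not the case when slicing bytes.

```python
import struct

# Hypothetical fixed binary layout: little-endian u32 id, f64 value, 16-byte name.
# "<" disables alignment padding, so each record is exactly 4 + 8 + 16 = 28 bytes.
MSG = struct.Struct("<Id16s")

def encode_into(buf, offset, msg_id, value, name):
    """Serialize one record directly into an existing buffer; returns the next offset."""
    MSG.pack_into(buf, offset, msg_id, value, name)  # short names are NUL-padded
    return offset + MSG.size

def decode_at(view, offset):
    """Parse one record in place, without first slicing (and copying) the buffer."""
    msg_id, value, name = MSG.unpack_from(view, offset)
    return msg_id, value, name.rstrip(b"\0")

# Usage: two records in one preallocated buffer.
payload = bytearray(MSG.size * 2)
next_offset = encode_into(payload, 0, 1, 3.5, b"temp")
encode_into(payload, next_offset, 2, -1.25, b"pressure")

view = memoryview(payload)
second = view[MSG.size:]            # zero-copy: a window onto the same bytearray
copied = bytes(payload)[MSG.size:]  # by contrast, this allocates and copies
```

Writing through second changes payload, and that is the whole point: consumers can be handed views into one receive buffer instead of their own copies. The trade-off is that the buffer must stay alive and unresized while any view exists. A bytearray with exported views refuses to resize.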

Project Fail: Cracking A Laptop BIOS Password Using AI

Whenever you buy used computers there is a risk that they come with unpleasant surprises that are not of the insect variety. From Apple hardware that is iCloud-locked with the original owner MIA to PCs that have BIOS passwords, some of these are more severe than others. In the case of BIOS passwords, these tend to be more of an annoyance that’s easily fixed by clearing the CMOS memory, but this isn’t always the case, as [Casey Bralla] found with a former student-issued HP ProBook laptop purchased off Facebook Marketplace.

Maybe it’s because HP figured that locking down access to the BIOS is essential on systems that find their way into the hands of bored and enterprising students, but these laptops write the encrypted password and associated settings to a separate Flash memory. Although a master key purportedly exists, HP’s policy here is to replace the system board. Further, while there are some recovery options that do not involve reflashing this Flash memory, they require answers to recovery questions.

This led [Casey] to try brute-force cracking, starting with a Rust-based project on GitHub that promised much but failed to even build. Undeterred, he tasked the Claude AI chatbot with writing a Python script to do the brute-forcing via the Windows-based HP BIOS utility. The chatbot was also asked to generate multiple lists of unique candidate passwords based on some human guesses.
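For illustration only, here is roughly what generating candidate lists from human guesses can look like. This minimal Python sketch combines a few guessed base words with common case variants and suffixes, and skips duplicates. The function name, the suffix list and the year range are hypothetical and are not taken from [Casey]'s script. The part that actually feeds candidates to HP's BIOS utility is deliberately left out, since its interface is not described here.

```python
from itertools import product

def candidates(base_words, suffixes=("", "1", "123", "!"), years=range(2015, 2025)):
    """Yield unique password guesses built from base words (hypothetical rules)."""
    tails = [*suffixes, *(str(y) for y in years)]
    seen = set()
    for word, tail in product(base_words, tails):
        # A few common case variants of each guessed word.
        for variant in (word, word.lower(), word.capitalize(), word.upper()):
            guess = variant + tail
            if guess not in seen:  # avoid wasting a slow attempt on a repeat
                seen.add(guess)
                yield guess

# Usage: list(candidates(["school"], suffixes=("", "1"), years=[]))
```

Deduplicating matters more than usual here, because every repeated guess costs a full attempt against the slow BIOS utility.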

After six months of near-continuous attempts at nine seconds per try, this method had failed to produce a hit, but at least the laptop can still be used, just without BIOS access. This may require [Casey] to work up the courage to do some hardware hacking and erase that pesky UEFI BIOS administrator password, though the failed attempt at least suggests that HP’s BIOS security is fairly good.
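Some quick arithmetic shows why brute force was always a long shot at this rate. The Python sketch below assumes that "six months" means roughly 182 days of uninterrupted attempts at the quoted nine seconds per try. It uses a purely illustrative keyspace of six lowercase letters, because the actual password's length and character set are unknown.

```python
SECONDS_PER_TRY = 9      # from the article
DAYS_RUNNING = 182       # assumption: "six months" of uninterrupted attempts

def attempts(days, seconds_per_try):
    """Number of guesses that fit into the given number of days."""
    return int(days * 24 * 3600 // seconds_per_try)

def years_to_exhaust(alphabet_size, length, seconds_per_try):
    """Worst-case years to try every password of exactly `length` characters."""
    keyspace = alphabet_size ** length
    return keyspace * seconds_per_try / (365 * 24 * 3600)

tried = attempts(DAYS_RUNNING, SECONDS_PER_TRY)           # about 1.75 million guesses
six_lowercase = years_to_exhaust(26, 6, SECONDS_PER_TRY)  # roughly 88 years
```

Even a trivial six-letter, lowercase-only password would take on the order of 88 years to exhaust at this speed. The approach therefore only works if the right guess happens to be on a short candidate list.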

Building A Carousel Autosampler

A common task in a laboratory setting is that of sampling, where a small amount of, for example, liquid has to be taken from each of a series of containers. Doing this by hand is possible, but tedious, so an autosampler can save a lot of time and effort. As these are not incredibly complex devices and have a lot in common with FDM 3D printers and CNC machines, it makes perfect sense to build one yourself, as [Markus Bindhammer] of Marb’s Lab on YouTube has done.

The specific design that [Markus] went for uses a sample carousel that can hold up to 30 bottles of 20 mL each. An ATmega-based board forms the brain of the machine, which can operate either independently or be controlled via I2C or serial. The axes and carousel are controlled by three stepper motors, each of which is driven by a TB6600 microstep driver.
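As an example of the kind of arithmetic such firmware has to do, here is a minimal Python sketch that maps the 30 carousel positions to stepper positions. Only the 30 positions come from the article. The sketch also assumes a common 1.8° stepper (200 full steps per revolution), a TB6600 set to 16 microsteps, and a direct-drive carousel without gearing. With 3,200 microsteps per revolution, adjacent bottles are 106.67 microsteps apart. The code therefore computes each target from an absolute position and rounds once. Adding a rounded step count for every move would accumulate error instead.

```python
FULL_STEPS_PER_REV = 200  # assumption: 1.8° stepper motor
MICROSTEPS = 16           # assumption: TB6600 DIP switches set to 1/16
POSITIONS = 30            # bottle positions on the carousel (from the article)
STEPS_PER_REV = FULL_STEPS_PER_REV * MICROSTEPS  # 3200 microsteps

def position_to_steps(index):
    """Absolute microstep target for bottle `index`, rounded to the nearest step."""
    index %= POSITIONS
    return (index * STEPS_PER_REV + POSITIONS // 2) // POSITIONS

def steps_to_move(current, target):
    """Signed microsteps from bottle `current` to bottle `target`, shortest way round."""
    delta = (position_to_steps(target) - position_to_steps(current)) % STEPS_PER_REV
    if delta > STEPS_PER_REV // 2:
        delta -= STEPS_PER_REV  # going backwards is shorter
    return delta

# Stepping through all 30 bottles in turn adds up to exactly one revolution.
full_turn = sum(steps_to_move(i, i + 1) for i in range(POSITIONS))
```

The individual moves alternate between 106 and 107 microsteps, but because every target is derived from its absolute position, the carousel never drifts.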

Why this design is a time saver should be apparent: you can load the carousel with bottles and let the autosampler handle the work for however long the entire process takes, instead of tying up a human. Initially the autosampler will be used for the synthesis of cadmium selenide (CdSe) quantum dots, before being put to work for an HPLC/spectrometer project.

Although [Markus] intends this to be an open hardware and software project, it will take a bit longer to get all the files and documentation organized. Until then we will have to keep manually sampling, or use the video as the construction tutorial.

Looking At A Real Fake Raspberry Pi RP2040 Board

Since the RP2040 microcontroller is available as a stand-alone component, it’s easy enough for third parties to churn out their own variations (or outright clones) of the Raspberry Pi Pico. Thus we end up with, for example, AliExpress sellers offering their own versions that can be significantly cheaper than the genuine article. The ones that [electronupdate] obtained for a test and decapping session cost just $2.25 a pop.

RP2 B0 stepping imprinted on the die shot.

As can be seen in the top image, the board from AliExpress lacks the Raspberry Pi logo on the silkscreen for obvious reasons, but otherwise appears to feature an identical component layout. The QSPI Flash IC is marked on the die as BY250156FS, identifying it as a Boya part.

Niggles about flash ROM quality aside, what’s perhaps most interesting about this teardown is what eagle-eyed commenters spotted on the die shot of the RP2040. Although the laser markings on the MCU package identify the RP2040 as a B2 stepping, the die clearly identifies it as an ‘RP2 B0’ part, meaning B0 stepping. This can cause problems when you try to use the USB functionality, as the B0 and B1 steppings have known hardware USB bugs.

As they say, caveat emptor.

Making Code A Hundred Times Slower With False Sharing

The cache hierarchy of the 2008 Intel Nehalem x86 architecture. (Source: Intel)

Writing good, performant code depends strongly on an understanding of the underlying hardware. This is especially the case in scenarios involving embarrassingly parallel processing, which at first glance ought to be a cakewalk. With multiple threads doing their own thing without having to nag the other threads about anything, it seems highly doubtful that even a novice could screw this up. Yet as [Keifer] details in a recent video on so-called false sharing, it’s actually very easy to get this wrong, for a variety of reasons.

With a multi-core and/or multi-processor system, each core has its own local caches that hold copies of data from system RAM. When a core modifies data in one of its cache lines, the cache coherency protocol invalidates any copies of that same line held by other cores. Their next access then results in a cache miss, forcing them to fetch the line again from a shared cache level or from system RAM. This happens even if the other core only uses different bytes within that line and never touches the modified data, with an obvious impact on performance. As cache lines are contiguous 64-byte blocks aligned on 64-byte boundaries on x86, it’s easy enough for unrelated data to end up sharing a line.

The worst-case scenario, detailed and demonstrated in the sample projects using Google Benchmark, involves a shared global data structure and results in a hundredfold reduction in performance. The impact on scaling is also noticeable: the more threads are running, the more severe the cache misses become.

A less obvious cause of performance loss here is memory alignment and how data fits into cache lines. Making sure that your data is properly aligned, for example within data structures, can prevent more unwanted cache invalidation events. With most applications being multi-threaded these days, it’s a good idea to know not only how to diagnose false sharing issues, but also how to prevent them.
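The cache-line arithmetic behind false sharing can be sketched without any threads at all. This Python model works out which 64-byte lines each thread writes to in three cases. The first is tightly packed 8-byte counters. The second is counters padded to a full line with ctypes, which is roughly what alignas(64) achieves in C++. The third is line-sized per-thread blocks whose array does not start on a line boundary. The helper names are hypothetical. The model only covers data layout and does not measure the slowdown itself, which requires a native benchmark like the Google Benchmark ones mentioned above.

```python
import ctypes

CACHE_LINE = 64  # bytes, the x86 cache line size

class Counter(ctypes.Structure):
    _fields_ = [("value", ctypes.c_uint64)]  # 8 bytes: eight of these fit in one line

class PaddedCounter(ctypes.Structure):
    # Pad the counter to a full line, similar in spirit to C++ alignas(64).
    # Note: ctypes does not force the array's start address onto a line boundary.
    _fields_ = [("value", ctypes.c_uint64),
                ("_pad", ctypes.c_uint8 * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64)))]

def lines_written(stride, hot_bytes, n_threads, base=0):
    """Cache lines written by each thread, where thread i writes `hot_bytes`
    starting at byte offset base + i * stride (base is relative to a line boundary)."""
    spans = []
    for i in range(n_threads):
        start = base + i * stride
        end = start + hot_bytes - 1
        spans.append(set(range(start // CACHE_LINE, end // CACHE_LINE + 1)))
    return spans

def false_sharing(spans):
    """True if two different threads write to the same cache line."""
    return any(a & b for i, a in enumerate(spans) for b in spans[i + 1:])

packed = lines_written(ctypes.sizeof(Counter), 8, 4)        # all four share line 0
padded = lines_written(ctypes.sizeof(PaddedCounter), 8, 4)  # one line per thread
misaligned = lines_written(64, 64, 2, base=32)              # blocks straddle lines
```

Padding fixes the packed case. The misaligned case shows why padding alone is not enough when a whole line-sized block is written: the start of the array must also sit on a line boundary.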

Pushing China’s EAST Tokamak Past The Greenwald Density Limit

Getting a significant energy return from tokamak-based nuclear fusion reactors depends in large part on plasma density. Increasing that density is tricky, however, because beyond a certain point the plasma transitions back from the much more stable high-confinement mode (H-mode) into low-confinement mode (L-mode). Chinese researchers have recently reported that they managed to push the plasma density in the EAST tokamak beyond this previously known upper limit, called the Greenwald Density Limit (GDL).
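For reference, the Greenwald limit is an empirical scaling, n_G = I_p / (π a^2). Here n_G is the line-averaged electron density in units of 10^20 m^-3, I_p is the plasma current in MA, and a is the minor radius in meters. The short Python sketch below evaluates the limit together with the Greenwald fraction n/n_G, where going "beyond the limit" means a fraction above 1. The plasma current, minor radius and density in the example are illustrative round numbers, not EAST's actual discharge parameters.

```python
import math

def greenwald_limit(plasma_current_ma, minor_radius_m):
    """Greenwald density limit n_G in units of 1e20 m^-3 (I_p in MA, a in m)."""
    return plasma_current_ma / (math.pi * minor_radius_m ** 2)

def greenwald_fraction(density_1e20, plasma_current_ma, minor_radius_m):
    """Ratio of line-averaged density to n_G; values above 1 exceed the limit."""
    return density_1e20 / greenwald_limit(plasma_current_ma, minor_radius_m)

# Illustrative values only: 0.5 MA of plasma current, 0.45 m minor radius.
n_g = greenwald_limit(0.5, 0.45)          # about 0.79e20 m^-3
f_g = greenwald_fraction(0.9, 0.5, 0.45)  # about 1.15, i.e. above the limit
```

Since n_G scales with current density rather than size, raising the plasma current raises the limit proportionally. That is one reason operating above a Greenwald fraction of 1 at a given current is a notable result.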

We covered these aspects of nuclear fusion reactors in great detail last year, noting the importance of plasma edge stability, as edge instabilities cause tokamak wall erosion as well as loss of energy. The EAST tokamak (HT-7U) is a superconducting tokamak that was upgraded and resumed operations in 2014, featuring a 1.85 meter major radius and 7.5 MW heating power. As with any tokamak, plasma and edge stability are major concerns, even in H-mode, requiring constant intervention.

When Electricity Doesn’t Take The Shortest Path

Everyone knows that the path of least resistance is the path that will always be taken, be it by water, electricity or the feet of humans. This is where the PCB presented by [ElectrArc240] on YouTube is rather confusing. It demonstrates two similarly sized traces, one much shorter than the other, yet the current opts to travel via the much longer trace. If you measure the resistance of each path on this PCB, the shorter path has the lower resistance at 0.44 Ω, while the longer path comes in at 1.44 Ω. Did the laws of physics break down here?

Of course, this is just a trick question, as the effective resistance of an electrical circuit isn’t just about ohmic resistance. Instead, the relevant phrasing here is ‘path of least impedance’, which this PCB demonstrates excellently. Note that its return path runs on the back side, following the same route as the long path on the front. The board is then driven by a 1 MHz high-current source to demonstrate the impact of alternating current, where reactance combines with resistance.

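For direct current it’s fair to say that impedance is the equivalent of resistance. Once the inductance of a trace has to be taken into account, however, as in the case of AC and high-frequency signaling, the picture changes. Because the long path runs directly over its return path, the current loop it forms encloses very little area and has low inductance. The short path, by contrast, forms a much larger loop with much higher inductance. As a result, the long path is now the path of least impedance.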
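A back-of-the-envelope model makes this concrete. The Python sketch below treats each path as an independent series R-L branch, Z = R + j2πfL, and splits the current between the two parallel branches using the current-divider rule. The resistances are the 0.44 Ω and 1.44 Ω measured on the board, and the frequency is the 1 MHz of the demo. The loop inductances, 500 nH for the short path and 100 nH for the long path, are assumed values chosen for illustration. The model also ignores mutual inductance between the paths.

```python
import math

def branch_impedance(r_ohm, l_henry, f_hz):
    """Series R-L model of one current path: Z = R + j*2*pi*f*L."""
    return complex(r_ohm, 2 * math.pi * f_hz * l_henry)

def current_split(z_a, z_b):
    """Magnitude of the share of total current in each of two parallel branches."""
    total = z_a + z_b
    return abs(z_b / total), abs(z_a / total)  # I_a = I * Z_b / (Z_a + Z_b)

R_SHORT, R_LONG = 0.44, 1.44      # measured resistances from the video
L_SHORT, L_LONG = 500e-9, 100e-9  # assumption: illustrative loop inductances

for f in (0.0, 1e6):  # DC, then the 1 MHz test signal
    z_short = branch_impedance(R_SHORT, L_SHORT, f)
    z_long = branch_impedance(R_LONG, L_LONG, f)
    share_short, share_long = current_split(z_short, z_long)
    print(f"{f:>9.0f} Hz: |Z| short {abs(z_short):.2f} ohm, long {abs(z_long):.2f} ohm; "
          f"current share short {share_short:.2f}, long {share_long:.2f}")
```

With these assumed values, the short path carries roughly three quarters of the current at DC, while at 1 MHz the long path carries the larger share. At 1 MHz the two magnitudes add up to more than 1, because the branch currents are no longer in phase.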

When you are doing impedance matching in your favorite EDA software while implementing an Ethernet RMII link or similar, this is all part of the process, with higher frequencies requiring ever more stringent measures to keep both sides happy. At some point, stray signals from nearby traces and components also become a factor, never mind the properties of the PCB material.
