It’s Time To Make A Major Change To D-Bus On Linux

Although flying well under the radar of the average Linux user, D-Bus has been an integral part of Linux distributions for nearly two decades and counting. Rather than using faster point-to-point interprocess communication via a Unix socket or such, an IPC bus allows for IP communication in a bus-like manner for convenience reasons. D-Bus replaced a few existing IPC buses in the Gnome and KDE desktop environments and became since that time the de-facto standard. Which isn’t to say that D-Bus is well-designed or devoid of flaws, hence attracting the ire of people like [Vaxry] who recently wrote an article on why D-Bus should die and proposes using hyprwire instead.

The broader context is provided by [Brodie Robertson], whose video adds interesting details, such as that Arch Linux wrote its own D-Bus implementation rather than use the reference one. Then there’s CVE-2018-19358 pertaining to the security risk of using an unlocked keyring on D-Bus, as any application on said bus can read the contents. The response by the Gnome developers responsible for D-Bus was very Wayland-like in that they dismissed the CVE as ‘works as designed’.

One reason why the proposed hyperwire/hyprtavern IPC bus would be better is on account of having actual security permissions, real validation of messages and purportedly also solid documentation. Even after nearly twenty years the documentation for D-Bus consists mostly out of poorly documented code, lots of TODOs in ‘documentation’ files along with unfinished drafts. Although [Vaxry] isn’t expecting this hyprwire alternative to be picked up any time soon, it’s hoped that it’ll at least make some kind of improvement possible, rather than Linux limping on with D-Bus for another few decades.

Continue reading “It’s Time To Make A Major Change To D-Bus On Linux”

Xcc700: Self-Hosted C Compiler For The ESP32/Xtensa

With two cores at 240 MHz and about 8.5 MB of non-banked RAM if you’re using the right ESP32-S3 version, this MCU seems at least in terms of specifications to be quite the mini PC. Obviously this means that it should be capable of self-hosting its compiler, which is exactly what [Valentyn Danylchuk] did with the xcc700 C compiler project.

Targeting the Xtensa Lx7 ISA of the ESP32-S3, this is a minimal C compiler that outputs relocatable ELF binaries. These binaries can subsequently be run with for example the ESP-IDF-based elf_loader component. Obviously, this is best done on an ESP32 platform that has PSRAM, unless your binary fits within the few hundred kB that’s left after all the housekeeping and communication stacks are loaded.

The xcc700 compiler is currently very minimalistic, omitting more complex loop types as well as long and floating point types, for starters. There’s no optimization of the final code either, but considering that it’s 700 lines of code just for a PoC, there seems to be still plenty of room for improvement.

Abusing X86 SIMD Instructions To Optimize PlayStation 3 Emulation

Key to efficient hardware emulation is an efficient mapping to the underlying CPU’s opcodes. Here one is free to target opcodes that may or may not have been imagined for that particular use. For emulators like the RPCS3 PlayStation 3 emulator this has led to some interesting mappings, as detailed in a video by [Whatcookie].

It’s important to remember here that the Cell processor in the PlayStation 3 is a bit of an odd duck, using a single regular PowerPC core (PPE) along with multiple much more simple co-processors called synergistic processing elements (SPEs) all connected with a high-speed bus. A lot of the focus with Cell was on floating point vector – i.e. SIMD – processing, which is part of why for a while the PlayStation 3 was not going to have a dedicated GPU.

As a result, it makes perfect sense to do creative mapping between the Cell’s SIMD instructions and those of e.g. SSE and AVX, even if Intel removing AVX-512 for a while caused major headaches. Fortunately some of those reappeared in AVX2.

The video goes through a whole range of Cell-specific instructions, how they work, and what x86 SIMD instructions they were mapped to and why. The SUBD instruction for example is mapped to VPDPBUSD as well as VDBPSADBW in AVX-512, the latter of which mostly targets things like video encoding. In the end it’s the result that matters, even if it also shows why the Cell processor was so interesting for high-performance compute clusters back in the day.

Continue reading “Abusing X86 SIMD Instructions To Optimize PlayStation 3 Emulation”

Surviving The RAM Apocalypse With Software Optimizations

To the surprise of almost nobody, the unprecedented build-out of datacenters and the equipping of them with servers for so-called ‘AI’ has led to a massive shortage of certain components. With random access memory (RAM) being so far the most heavily affected and with storage in the form of HDDs and SSDs not far behind, this has led many to ask the question of how we will survive the coming months, years, decades, or however-long the current AI bubble will last.

One thing is already certain, and that is that we will have to make our current computer systems last longer, and forego simply tossing in more sticks of RAM in favor of doing more with less. This is easy to imagine for those of us who remember running a full-blown Windows desktop system on a sub-GHz x86 system with less than a GB of RAM, but might require some adjustment for everyone else.

In short, what can us software developers do differently to make a hundred MB of RAM stretch further, and make a GB of storage space look positively spacious again?

Continue reading “Surviving The RAM Apocalypse With Software Optimizations”

Libxml2 Narrowly Avoids Becoming Unmaintained

In an excellent example of one of the most overused XKCD images, the libxml2 library has for a little while lost its only maintainer, with [Nick Wellnhofer] making good on his plan to step down by the end of the year.

XKCD's dependency model
Modern-day infrastructure, as visualized by XKCD. (Credit: Randall Munroe)

While this might not sound like a big deal, the real scope of this problem is rather profound. Not only is libxml2 part of GNOME, it’s also used as dependency by a huge number of projects, including web browsers and just about anything that processes XML or XSLT. Not having a maintainer in the event that a fresh, high-risk CVE pops up would obviously be less than desirable.

As for why [Nick] stepped down, it’s a long story. It starts in the early 2000s when the original author [Daniel Veillard] decided he no longer had time for the project and left [Nick] in charge. It should be said here that both of them worked as volunteers on the project, for no financial compensation. This when large companies began to use projects like libxml2 in their software, and were happy to send bug reports. Beyond a single Google donation it was effectively unpaid work that required a lot of time spent on researching and processing potential security flaws sent in.

Of note is that when such a security report comes in, the expectation is that you as a volunteer software developer drop everything you’re working on and figure out the cause, fix and patched-by-date alongside filing a CVE. This rather than you getting sent a merge request or similar with an accompanying test case. Obviously these kind of cases seems to have played a major role in making [Nick] burn out on maintaining both libxml2 and libxslt.

Fortunately for the project two new developers have stepped up to take over as maintainers, but it should be obvious that such churn is not a good sign. It also highlights the central problem with the conflicting expectations of open source software being both totally free in a monetary fashion and unburdened with critical bugs. This is unfortunately an issue that doesn’t seem to have an easy solution, with e.g. software bounties resulting in mostly a headache.

Bare Metal STM32: Increasing The System Clock And Running Dhrystone

When you start an STM32 MCU with its default configuration, its CPU will tick along at a leisurely number of cycles on the order of 8 to 16 MHz, using the high-speed internal (HSI) clock source as a safe default to bootstrap from. After this phase, we are free to go wild with the system clock, as well as the various clock sources that are available beyond the HSI.

Increasing the system clock doesn’t just affect the CPU either, but also affects the MCU’s internal buses via its prescalers and with it the peripherals like timers on that bus. Hence it’s essential to understand the clock fabric of the target MCU. This article will focus on the general case of increasing the system clock on an STM32F103 MCU from the default to the maximum rated clock speed using the relevant registers, taking into account aspects like Flash wait states and the APB and AHB prescalers.

Although the Dhrystone benchmark is rather old-fashioned now, it’ll be used to demonstrate the difference that a faster CPU makes, as well as how complex accurately benchmarking is. Plus it’s just interesting to get an idea of how a lowly Cortex-M3 based MCU compares to a once top-of-the line Intel Pentium 90 CPU.

Continue reading “Bare Metal STM32: Increasing The System Clock And Running Dhrystone”

Debugging The AMD GPU

Although Robert F. Kennedy gets the credit for popularizing it, George Bernard Shaw said: “Some men see things as they are and say, ‘Why?’ I dream of things that never were and say, ‘Why not?'” Well, [Hadz] didn’t wonder why there weren’t many GPU debuggers. Instead, [Hadz] decided to create one.

It wasn’t the first; he found some blog posts by [Marcell Kiss] that helped, and that led to a series of experiments you’ll enjoy reading about. Plus, don’t miss the video below that shows off a live demo.

It seems that if you don’t have an AMD GPU, this may not be directly useful. But it is still a fascinating peek under the covers of a modern graphics card. Ever wonder how to interact with a video card without using something like Vulkan? This post will tell you how.

Writing a debugger is usually a tricky business anyway. Working with the strange GPU architecture makes it even stranger. Traps let you gain control, but implementing features like breakpoints and single-stepping isn’t simple.

We’ve used things like CUDA and OpenCL, but we haven’t been this far down in the weeds. At least, not yet. CUDA, of course, is specific to NVIDIA cards, isn’t it?

Continue reading “Debugging The AMD GPU”