Abusing X86 SIMD Instructions To Optimize PlayStation 3 Emulation

Key to efficient hardware emulation is an efficient mapping to the underlying CPU’s opcodes. Here one is free to target opcodes that may or may not have been imagined for that particular use. For emulators like the RPCS3 PlayStation 3 emulator this has led to some interesting mappings, as detailed in a video by [Whatcookie].

It’s important to remember here that the Cell processor in the PlayStation 3 is a bit of an odd duck, using a single regular PowerPC core (PPE) along with multiple much simpler co-processors called synergistic processing elements (SPEs), all connected with a high-speed bus. A lot of the focus with Cell was on floating point vector (i.e. SIMD) processing, which is part of why for a while the PlayStation 3 was not going to have a dedicated GPU.

As a result, it makes perfect sense to do creative mapping between the Cell’s SIMD instructions and those of e.g. SSE and AVX, even if Intel removing AVX-512 for a while caused major headaches. Fortunately some of those instructions have reappeared in AVX10.

The video goes through a whole range of Cell-specific instructions, how they work, and what x86 SIMD instructions they were mapped to and why. The SUMB (sum bytes) instruction, for example, is mapped to VPDPBUSD as well as VDBPSADBW in AVX-512, the latter of which mostly targets things like video encoding. In the end it’s the result that matters, even if it also shows why the Cell processor was so interesting for high-performance compute clusters back in the day.
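To illustrate why a dot-product instruction can stand in for a byte sum, here is a minimal scalar sketch in Python. It is not RPCS3's code: it models a single 32-bit lane of VPDPBUSD (unsigned bytes times signed bytes, summed and added to an accumulator) and feeds it a vector of ones. The function names are hypothetical, and the exact result layout of the Cell instruction is deliberately not modeled.

```python
def vpdpbusd_lane(acc: int, a_bytes: bytes, b_bytes: bytes) -> int:
    """Scalar model of one 32-bit lane of VPDPBUSD.

    a_bytes: four bytes treated as unsigned values (0..255).
    b_bytes: four bytes treated as signed two's complement values (-128..127).
    Returns acc + sum(a[i] * b[i]), wrapped to a signed 32-bit integer.
    """
    assert len(a_bytes) == 4 and len(b_bytes) == 4
    total = acc
    for u, s in zip(a_bytes, b_bytes):
        signed = s - 256 if s >= 128 else s  # reinterpret as signed byte
        total += u * signed
    total &= 0xFFFFFFFF  # the non-saturating form wraps around
    return total - (1 << 32) if total & 0x80000000 else total


def sum_bytes_per_word(data: bytes) -> list[int]:
    """Sum each group of four bytes by taking a dot product against ones."""
    assert len(data) % 4 == 0
    ones = bytes([1, 1, 1, 1])
    return [vpdpbusd_lane(0, data[i:i + 4], ones) for i in range(0, len(data), 4)]


if __name__ == "__main__":
    print(sum_bytes_per_word(bytes([1, 2, 3, 4, 255, 255, 255, 255])))
```

Multiplying by one leaves each byte unchanged, so the multiply-accumulate hardware ends up doing nothing but the horizontal add, which is the part that is otherwise awkward to express in SIMD.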

Silicon-Based MEMS Resonators Offer Accuracy In Little Space

Currently quartz crystal-based oscillators are among the most common types of clock source in electronics, providing a reasonably accurate source in a cheap and small package. Unfortunately for high-accuracy applications, atomic clocks aren’t quite compact enough to fit into the space of a typical quartz-based temperature-compensated crystal oscillator (TCXO), and even the better quartz-based solutions are rather large. The focus has therefore been on developing doped silicon MEMS solutions that can provide drift as low as that of the best compensated quartz crystal oscillators, with the IEEE Spectrum magazine recently covering one such solution.

Part of the DARPA H6 program, [Everestus Ezike] et al. developed a solution that was stable to ±25 parts per billion (ppb) over the course of eight hours. This can be contrasted with a commercially available TCXO like the Microchip MX-503, which boasts a frequency stability of ±30 ppb.

Higher accuracy is achievable by swapping the TCXO for an oven-controlled crystal oscillator (OCXO), with the internal temperature of the oscillator not compensated for, but rather controlled with an active heater. There are many existing OCXOs that offer down to sub-1 ppb stability, albeit in quite a big package, such as the OX-171 with a sizable 28×38 mm footprint.

With a MEMS silicon-based oscillator in OCXO configuration, [Yutao Xu] et al. were able to achieve a frequency stability of ±14 ppb, which puts it pretty close to the better quartz-based oscillators, yet in a fraction of the space. As these devices mature, we may see them eventually compete with even the traditional OCXO offerings, though the hyperbolic premise of the IEEE Spectrum article of them competing with atomic clocks should be taken with at least a few kilograms of salt.
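To put the ppb figures into perspective, the following small Python sketch converts a stability figure into a frequency error and a worst-case accumulated time error. It assumes the oscillator sits at the edge of its stability band for the whole interval, which is a pessimistic simplification, and the 10 MHz nominal frequency is a purely illustrative assumption rather than a figure from any of the parts mentioned.

```python
SECONDS_PER_DAY = 86_400
PPB = 1e-9  # one part per billion as a fraction


def frequency_error_hz(nominal_hz: float, stability_ppb: float) -> float:
    """Absolute frequency offset for a given fractional stability."""
    return nominal_hz * stability_ppb * PPB


def worst_case_drift_s(stability_ppb: float, elapsed_s: float) -> float:
    """Time error if the clock runs at a constant fractional offset."""
    return stability_ppb * PPB * elapsed_s


if __name__ == "__main__":
    nominal_hz = 10e6  # assumed nominal frequency, for illustration only
    for label, ppb in (("MEMS, 8 h result", 25), ("TCXO", 30), ("MEMS OCXO", 14)):
        offset = frequency_error_hz(nominal_hz, ppb)
        drift_ms = worst_case_drift_s(ppb, SECONDS_PER_DAY) * 1e3
        print(f"{label}: ±{ppb} ppb -> ±{offset:.2f} Hz, up to {drift_ms:.2f} ms/day")
```

Under these assumptions, every figure quoted here works out to a few milliseconds per day at most.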

Thanks to [anfractuosity] for the tip.

Exploring Modern SID Chip Substitutes

The SIDKick Pico installed on a breadboard. (Credit: Ben Eater)

Despite the Commodore 64 having been out of production for probably longer than many Hackaday readers have been alive, its SID audio chip remains a very popular subject of both retrocomputing and modern projects. Consequently a range of substitutes have been developed over the decades, all of which seek to produce the audio quality of one or more variants of the SID. This raises the question of which of these to pick when at first glance they seem so similar. Fret not, for [Ben Eater] did an entire video on comparing some modern SID substitutes and his thoughts on them.

First is the SIDKick Pico, which as the name suggests uses a Raspberry Pi Pico board for its Cortex-M0+ MCU. This contrasts with the other option featured in the video, in the form of the STM32F410-based ARMSID.

While the SIDKick Pico looks good on paper, it comes in a number of different configurations, some with an additional DAC, which can be confusing. Because of how it is stacked together with the custom PCB on which the Pi Pico is mounted, it’s also pretty wide and tall, likely leading to fitment issues. It also doesn’t work as a drop-in solution by default, requiring soldering to use the SID’s normal output pins. Unfortunately this led to intense distortion in [Ben]’s testing, leading him to give up on it.

Meanwhile the ARMSID is about as boring as drop-in replacements get. [Ben] got the ARMSID out of its packaging, noted that it is basically identical in size to the original SID and inserted it into the breadboard, after which it fired right up with zero issues.

It’s clear that the SIDKick Pico comes with a lot of features and such, making it great for tinkering. However, if all you want is a SID-shaped IC that sounds like a genuine SID chip, then the ARMSID is a very solid choice.

Thanks to [Mark Stevens] for the tip.

Be Wary Of Flash-less ESP32-C3 Super Mini Boards

Everyone loves tiny microcontroller boards, and the ESP32-C3 Super Mini boards are no exception. Unfortunately if you just casually stroll over to your nearest online purveyor of such goods to purchase a bunch of them, you’re likely to be disappointed. The reason for this is, as explained in a video by [Hacker University], that these boards can be equipped with any of the variants of the ESP32-C3. The worst offender here is probably the version with the ESP32-C3 without further markings, as this one has no built-in Flash for program storage.

Beyond that basic MCU version we can see the other versions clearly listed in the Espressif ESP32-C3 datasheet. Of these, the FN4 is already listed as EOL and the FH4AZ as NRND, leaving only the FH4 and FH4X, with the latter ‘recommended’ as the newest chip revision. Here the F stands for built-in Flash, with the next character giving its temperature rating, e.g. H for ‘High’. Next is the amount of Flash in MB, so always 4 MB for all but the Flash-less variant.
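The naming scheme described above is regular enough to decode mechanically. The following Python sketch is only an illustration of that scheme: the function and type names are hypothetical, the lifecycle table contains just the statuses mentioned above, and any trailing letters (such as X or AZ) are passed through without interpretation.

```python
import re
from typing import NamedTuple, Optional


class C3Variant(NamedTuple):
    has_flash: bool
    temp_grade: Optional[str]  # e.g. "H" for 'High'; None if Flash-less
    flash_mb: int
    extra: str                 # trailing letters, not interpreted here
    status: str


# Lifecycle statuses as listed in the text above; anything else is unknown.
STATUS = {"FN4": "EOL", "FH4AZ": "NRND", "FH4": "active", "FH4X": "active (recommended)"}

# 'F' + temperature letter + Flash size in MB + optional trailing letters.
_PATTERN = re.compile(r"^ESP32-C3(?:F([A-Z])(\d+)([A-Z]*))?$")


def decode(marking: str) -> C3Variant:
    """Decode an ESP32-C3 part marking such as 'ESP32-C3FH4X'."""
    m = _PATTERN.match(marking.strip().upper())
    if m is None:
        raise ValueError(f"not an ESP32-C3 marking: {marking!r}")
    temp, size, extra = m.groups()
    if temp is None:  # bare 'ESP32-C3': no built-in Flash
        return C3Variant(False, None, 0, "", "no built-in Flash")
    suffix = f"F{temp}{size}{extra}"
    return C3Variant(True, temp, int(size), extra, STATUS.get(suffix, "unknown"))


if __name__ == "__main__":
    for part in ("ESP32-C3", "ESP32-C3FN4", "ESP32-C3FH4", "ESP32-C3FH4X"):
        print(part, decode(part))
```

A check like this is only as good as the marking you feed it, so it helps most when you have the physical chip in front of you.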

Identifying this information from some online listing is anything but easy unless the seller is especially forthcoming. The chip markings show this information on the third row, as can be seen in the top image, but relying solely on a listing’s photos is rather sketchy. If you do end up with a Flash-less variant, you can still wire up an external Flash chip yourself, but obviously this is probably not the intended use case.

As always, caveat emptor.

Surviving The RAM Apocalypse With Software Optimizations

To the surprise of almost nobody, the unprecedented build-out of datacenters and the equipping of them with servers for so-called ‘AI’ has led to a massive shortage of certain components. With random access memory (RAM) being so far the most heavily affected and with storage in the form of HDDs and SSDs not far behind, this has led many to ask the question of how we will survive the coming months, years, decades, or however-long the current AI bubble will last.

One thing is already certain, and that is that we will have to make our current computer systems last longer, and forgo simply tossing in more sticks of RAM in favor of doing more with less. This is easy to imagine for those of us who remember running a full-blown Windows desktop system on a sub-GHz x86 system with less than a GB of RAM, but might require some adjustment for everyone else.

In short, what can we software developers do differently to make a hundred MB of RAM stretch further, and make a GB of storage space look positively spacious again?

Libxml2 Narrowly Avoids Becoming Unmaintained

In an excellent example of one of the most overused XKCD images, the libxml2 library has for a little while lost its only maintainer, with [Nick Wellnhofer] making good on his plan to step down by the end of the year.

Modern-day infrastructure, as visualized by XKCD. (Credit: Randall Munroe)

While this might not sound like a big deal, the real scope of this problem is rather profound. Not only is libxml2 part of GNOME, it’s also used as a dependency by a huge number of projects, including web browsers and just about anything that processes XML or XSLT. Not having a maintainer in the event that a fresh, high-risk CVE pops up would obviously be less than desirable.

As for why [Nick] stepped down, it’s a long story. It starts in the early 2000s when the original author [Daniel Veillard] decided he no longer had time for the project and left [Nick] in charge. It should be said here that both of them worked as volunteers on the project, for no financial compensation. This was when large companies began to use projects like libxml2 in their software, and were happy to send bug reports. Beyond a single Google donation, it was effectively unpaid work that required a lot of time spent on researching and processing the potential security flaws that were sent in.

Of note is that when such a security report comes in, the expectation is that you as a volunteer software developer drop everything you’re working on and figure out the cause, a fix and a patched-by date, alongside filing a CVE. This rather than being sent a merge request or similar with an accompanying test case. These kinds of cases seem to have played a major role in making [Nick] burn out on maintaining both libxml2 and libxslt.

Fortunately for the project two new developers have stepped up to take over as maintainers, but it should be obvious that such churn is not a good sign. It also highlights the central problem with the conflicting expectations of open source software being both totally free in a monetary fashion and unburdened with critical bugs. This is unfortunately an issue that doesn’t seem to have an easy solution, with e.g. software bounties resulting in mostly a headache.

Why Chopped Carbon Fiber In FDM Prints Is A Contaminant

A lot of claims have been made about the purported benefits of adding chopped carbon fiber to FDM filaments, but how many of these claims are actually true? In the case of PLA at least, the [I built a thing] channel on YouTube makes a convincing case that the presence of chopped CF can be considered a contaminant that weakens the part.

Using the advanced imaging gear at the University of Basel, the PLA-CF parts were subjected to both scanning electron microscope (SEM) and Micro CT imaging. The SEM imaging was performed on the fracture surfaces of parts that were snapped, to see what this revealed about the internal structure. From this, it becomes apparent that the chopped fibers distribute themselves both inside and between the layers, with no significant adherence between the PLA polymer and the CF. There is also evidence for voids created by the presence of the CF.

To confirm this, an intact PLA-CF print was scanned using a Micro CT scanner over 13 hours. This confirmed the SEM findings, in that the voids were clearly visible, as was the lack of integration of the CF into the polymer. This latter point shouldn’t be surprising, as the thermal expansion coefficient of PLA is much higher than the roughly zero-to-negative coefficient of CF. This translates into a cooling PLA part shrinking around the CF, thus creating the voids.
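The size of this mismatch can be estimated with the linear expansion relation, strain = α × ΔT. The Python sketch below is a crude back-of-the-envelope estimate and not something taken from the video: the expansion coefficients and the temperature drop are assumed, typical-order values chosen for illustration, and a single linear coefficient ignores how PLA behaves around its glass transition.

```python
def mismatch_strain(alpha_matrix: float, alpha_fiber: float, delta_t_k: float) -> float:
    """Differential linear strain between matrix and fiber over a temperature drop.

    alpha_* are linear thermal expansion coefficients in 1/K,
    delta_t_k is the magnitude of the temperature change in kelvin.
    """
    return (alpha_matrix - alpha_fiber) * delta_t_k


if __name__ == "__main__":
    # All three values below are illustrative assumptions, not measurements.
    alpha_pla = 68e-6    # assumed linear expansion coefficient of PLA, 1/K
    alpha_cf = -0.5e-6   # assumed axial coefficient of carbon fiber, 1/K
    delta_t = 180.0      # assumed cool-down from nozzle to room temperature, K

    strain = mismatch_strain(alpha_pla, alpha_cf, delta_t)
    print(f"Differential strain: {strain * 100:.2f} %")
```

With these assumed numbers the polymer wants to shrink on the order of one percent more than the fiber does, which is a large mismatch for an interface that has no real adhesion to begin with.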
