The Intel 8088 And 8086 Processors’ Instruction Prefetch Circuitry

The 8088 die under a microscope, with main functional blocks labeled. This photo shows the chip’s single metal layer; the polysilicon and silicon are underneath. (Credit: Ken Shirriff)

Cache prefetching is what allows processors to have data and/or instructions ready for use in a fast local cache, rather than having to wait for a fetch request to trickle through to system RAM and back again. The Intel 8088 (and its big brother, the 8086) was among the first microprocessors to implement instruction prefetching in hardware, which [Ken Shirriff] has analyzed based on die images of this famous processor. This follows last year’s deep dive into the 8086’s prefetching hardware, with (unsurprisingly) many similarities between the two microprocessors, as well as a few differences that are mostly due to the 8088’s cut-down 8-bit data bus.

While the 8086 has three 16-bit slots in its instruction prefetcher, the 8088 gets four slots of 8 bits each. The prefetching hardware is part of the Bus Interface Unit (BIU), which effectively decouples the actual processor (the Execution Unit, or EU) from system RAM. While previous MPUs were fully deterministic, with instructions loaded from RAM and then executed, the 8086 and 8088’s prefetching meant that such assumptions no longer held. The added features in the BIU also meant that the instruction pointer (IP) and related registers moved into the BIU, while the ring-buffer logic around the queue had to keep the queue positions and the pointer offsets into RAM consistent.
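
To make the ring-buffer bookkeeping concrete, here’s a minimal software sketch of such a queue in C. This is purely illustrative, with our own names and a separate prefetch pointer standing in for the BIU’s internal state; the real chip does all of this in hardware, and the 8086 variant buffers three 16-bit words instead of four bytes.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define QUEUE_SIZE 4   /* 8088: four 8-bit slots */

typedef struct {
    uint8_t  slots[QUEUE_SIZE];
    unsigned read_pos;   /* next byte the EU consumes */
    unsigned write_pos;  /* next slot the BIU fills   */
    unsigned count;      /* bytes currently queued    */
} prefetch_queue;

/* BIU side: when the bus is idle and the queue has room,
   fetch the byte at the prefetch pointer and advance it. */
static bool biu_fetch(prefetch_queue *q, const uint8_t *mem, uint16_t *fetch_ptr)
{
    if (q->count == QUEUE_SIZE)
        return false;                       /* queue full: bus stays idle */
    q->slots[q->write_pos] = mem[(*fetch_ptr)++];
    q->write_pos = (q->write_pos + 1) % QUEUE_SIZE;
    q->count++;
    return true;
}

/* EU side: pull the next instruction byte, stalling when empty. */
static bool eu_next_byte(prefetch_queue *q, uint8_t *out)
{
    if (q->count == 0)
        return false;                       /* EU waits on the BIU */
    *out = q->slots[q->read_pos];
    q->read_pos = (q->read_pos + 1) % QUEUE_SIZE;
    q->count--;
    return true;
}

/* A jump discards everything prefetched beyond the new address. */
static void queue_flush(prefetch_queue *q, uint16_t *fetch_ptr, uint16_t target)
{
    q->read_pos = q->write_pos = q->count = 0;
    *fetch_ptr = target;
}

int main(void)
{
    const uint8_t rom[8] = { 0x90, 0x90, 0x90, 0x90, 0xB0, 0x42, 0xF4, 0x00 };
    prefetch_queue q = {0};
    uint16_t fetch_ptr = 0;
    uint8_t op;

    while (biu_fetch(&q, rom, &fetch_ptr))  /* fill all four slots */
        ;
    eu_next_byte(&q, &op);                  /* EU consumes one byte */
    printf("EU got 0x%02X\n", op);

    queue_flush(&q, &fetch_ptr, 4);         /* a jump discards the rest */
    biu_fetch(&q, rom, &fetch_ptr);         /* prefetch restarts at 4   */
    eu_next_byte(&q, &op);
    printf("after jump, EU got 0x%02X\n", op);
    return 0;
}

Note how the EU never touches RAM directly: either the queue has a byte ready, or the EU stalls until the BIU catches up, which is exactly the decoupling (and loss of simple determinism) described above.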

Even though CPUs these days have much more complicated, multi-level caches measured in kilobytes and megabytes, it’s fascinating to see where it all began: just a few bytes and relatively straightforward hardware logic that you can easily follow under a microscope.

Is Your Mental Model Of Bash Pipelines Wrong?

[Michael Lynch] encountered a strange situation. Why was compiling and then running his program nearly 10x faster than just running the program by itself? [Michael] ran into this issue while benchmarking a programming project, pared it down to its essentials for repeatability and analysis, and discovered that it highlighted an incorrect mental model of how bash pipelines work.

Here’s the situation. The first thing [Michael]’s pared-down program does is start a timer. It then reads and counts some bytes from stdin, and finally prints out how long all of that took. When the test program is run in the following way, it takes about 13 microseconds.

$ echo '00010203040506070809' | xxd -r -p | zig build run -Doptimize=ReleaseFast
bytes: 10
execution time: 13.549µs

When running the (already-compiled) program directly, execution time swells to 162 microseconds.

$ echo '00010203040506070809' | xxd -r -p | ./zig-out/bin/count-bytes
bytes: 10
execution time: 162.195µs

Again, the only difference between zig build run and ./zig-out/bin/count-bytes is that the first compiles the code, then immediately runs it. The second simply runs the compiled program.
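
The resolution, without spoiling too much of [Michael]’s write-up: the shell starts every stage of a pipeline at essentially the same time. While zig build is busy compiling, echo and xxd have long since finished and left their ten bytes sitting in the kernel’s pipe buffer, so the timed read returns almost instantly; run the binary directly and its timer starts while the upstream commands are still getting going. Here’s a hedged C sketch (our illustration, not [Michael]’s code) that reproduces the effect with pipe() and fork():

#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) != 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                 /* upstream stage (echo | xxd) */
        close(fd[0]);
        write(fd[1], "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09", 10);
        close(fd[1]);                  /* finishes almost instantly   */
        _exit(0);
    }

    close(fd[1]);
    sleep(1);                          /* stand-in for compile time   */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    char buf[16];
    ssize_t n = read(fd[0], buf, sizeof buf);   /* data already buffered */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long us = (t1.tv_sec - t0.tv_sec) * 1000000L
            + (t1.tv_nsec - t0.tv_nsec) / 1000L;
    printf("read %zd bytes in %ld µs\n", n, us);

    close(fd[0]);
    wait(NULL);
    return 0;
}

The read returns in single-digit microseconds because the bytes were already waiting; if the writer were the slow side instead, the same read would block, exactly the asymmetry [Michael] measured.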

Saving A Clock Radio With An LM8562

Smartphones have taken the place of a lot of different devices, especially as they get more and more powerful. GPS, music and video players, email, and of course the phone itself are all functions tied up in these general-purpose devices. Another casualty of the smartphone revolution is the humble bedside alarm clock, as its radio, alarm, and timekeeping functions are also provided by modern devices. [zst123] has a sentimental attachment to the one he used in the 00s, though, and set about restoring it to its former glory.

Most of the issues with the clock involved drift in the timekeeping circuitry. Since it no longer kept time accurately, losing around 10 minutes a day, the plan for saving it was to use NTP to get the current time and a microcontroller to apply the correction automatically. Rather than replace everything in the clock except the display, [zst123] kept the existing circuit board and added an ESP8266 to grab the time from the Internet. A custom driver board reads the time currently shown on the clock directly from the display itself, and the ESP8266 then adjusts it by actuating the existing buttons through a relay wired in parallel.
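
The correction loop itself can be tiny. Below is a hedged sketch of the idea in C; every function here is a hypothetical stand-in (stubbed for illustration), not [zst123]’s actual code: ntp_minutes_of_day() would come from the ESP8266’s NTP client, clock_minutes_of_day() from the driver board snooping the display, and press_minute_button() from the relay wired across the clock’s own setting button.

#include <stdio.h>

/* Hypothetical stubs standing in for the real hardware interfaces. */
static int  ntp_minutes_of_day(void)   { return 8 * 60 + 30; } /* true time 08:30 */
static int  clock_minutes_of_day(void) { return 8 * 60 + 20; } /* display 08:20   */
static void press_minute_button(void)  { puts("relay pulse: +1 minute"); }

static void correct_drift(void)
{
    int actual = ntp_minutes_of_day();
    int shown  = clock_minutes_of_day();

    /* Distance the display lags, wrapped into 0..1439 so a clock
       running behind is simply stepped forward, even past midnight. */
    int lag = ((actual - shown) % 1440 + 1440) % 1440;

    while (lag--)
        press_minute_button();   /* buttons only ever move forward */
}

int main(void)
{
    correct_drift();
    return 0;
}

Since the clock only loses about 10 minutes a day, running a loop like this once a day means at most a handful of relay pulses.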

Using the existing circuitry was certainly a challenge, especially since the display is multiplexed, but the LM8562 that came with these clock radios is a common and well-documented chip for driving displays like this, giving [zst123] a leg up over something unlabeled or proprietary. Using NTP is certainly a reliable and straightforward way of getting the current time, too, but there are a few other options for projects like these, such as using GPS or even a radio signal.

Exploring The Sega Saturn’s Wacky Architecture

Sega Saturn mainboard with main components labelled. More RAM is found on the bottom, as well. (Credit: Rodrigo Copetti)

In the annals of game console history, the Sega Saturn is probably the most convoluted system of all time, even giving the PlayStation 3 a run for its rings. Also known as the system on which Sega beached itself before its Dreamcast swansong, it featured an incredible four CPUs, two video processors, and multiple levels and types of RAM, all pushed onto game studios with virtually no software tools or plan for how to use the thing. An introduction to this console’s architecture is provided by [Rodrigo Copetti], which gives a good idea of the harrowing task of developing for this system.

Launched in Japan in 1994 and in North America and Europe in 1995, it featured a double-speed CD-ROM drive, Hitachi’s zippy new SH-2 CPU (times two), and some 3D processing grunt that was intended to let it compete with Sony’s PlayStation. The video and sound solutions were all proprietary to Sega, with the two video processors (VDP1 and VDP2) each handling part of the rendering process, which complicated the console’s use for 3D tasks, as did its use of quadrilaterals instead of the triangles used by the PlayStation and Nintendo 64.
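
To make the quadrilateral problem concrete: hardware that only draws four-point primitives has to fake triangles, and the usual trick is to repeat a vertex. Here’s a hedged sketch (our own illustrative types, not Sega’s actual API):

#include <stdio.h>

typedef struct { int x, y; }  vec2;
typedef struct { vec2 p[3]; } triangle;
typedef struct { vec2 p[4]; } quad;

/* Collapse one edge of the quad to zero length so the four-point
   fill traces out a triangle. The texture is still mapped across
   four corners, which is one source of the Saturn's characteristic
   texture warping on triangle-based content. */
static quad triangle_to_quad(triangle t)
{
    quad q = { { t.p[0], t.p[1], t.p[2], t.p[2] } };
    return q;
}

int main(void)
{
    triangle t = { { {0, 0}, {10, 0}, {5, 8} } };
    quad q = triangle_to_quad(t);
    for (int i = 0; i < 4; i++)
        printf("corner %d: (%d, %d)\n", i, q.p[i].x, q.p[i].y);
    return 0;
}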

Although a lot of performance could be extracted from the Saturn’s idiosyncratic architecture, its high price and, ultimately, the competition from Sony’s PlayStation and the 1996 release of the Nintendo 64 spelled the end for the Saturn. While the Dreamcast did not repeat the Saturn’s mistakes, it seems one commercial failure was enough to ruin Sega’s chances as a hardware developer.

Retrogadgets: Butler In A Box

You walk into your house and issue a voice command to bring up the lights and start a cup of coffee. No big deal, right? Siri, Google, and Alexa can do all that. Did we mention it is 1985? And, apparently, you were one of the people who put out about $1,500 for a Mastervoice “Butler in a Box,” the subject of a Popular Science video you can see below.

If you think the box is interesting, the inventor’s story is even stranger. [Kevin] got a mint-condition Butler in a Box from eBay. How did it work, given that in 1983 there was no AI voice recognition and no public Internet? We did note that the “appliance module” was a standard X10 interface.


Homebrew GPU Tackles Quake

Have you ever wondered how a GPU works? Even better, have you ever wanted to make one? [Dylan] certainly did, because he made FuryGPU — a fully custom graphics card capable of playing Quake at over 30 frames per second.

As you might have guessed, FuryGPU isn’t in the same league as modern graphics cards, which are made of thousands of cores specialized for math and then programmed with whatever shaders you want. FuryGPU is a more “traditional” GPU: it has dedicated hardware for all the functions it needs to perform and doesn’t support “shader code” in the way an AMD or NVIDIA GPU does. According to [Dylan], the hardest part of the whole thing was writing the Windows drivers for it.
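
The distinction is easiest to see at the pixel level. In a fixed-function design, a stage like texture modulation is wired into the silicon; the hedged C sketch below (our illustration of the concept, not [Dylan]’s actual hardware) shows the kind of operation that a shader-based GPU would instead accept as a user-supplied program:

#include <stdint.h>
#include <stdio.h>

typedef struct { uint8_t r, g, b; } rgb;

/* A hard-wired "texture modulate" stage: one texel scaled by a
   lighting value. On fixed-function hardware this is gates and
   multipliers, not instructions, and it cannot be swapped out at
   runtime the way a shader program can. */
static rgb fixed_function_pixel(rgb texel, uint8_t light)
{
    rgb out = {
        (uint8_t)((texel.r * light) / 255),
        (uint8_t)((texel.g * light) / 255),
        (uint8_t)((texel.b * light) / 255),
    };
    return out;
}

int main(void)
{
    rgb texel = { 200, 120, 40 };
    rgb lit = fixed_function_pixel(texel, 128);   /* ~50% lighting */
    printf("lit pixel: (%u, %u, %u)\n", lit.r, lit.g, lit.b);
    return 0;
}

That hard-wiring is what makes FuryGPU a more “traditional” GPU in the sense above.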

On his blog, [Dylan] tells us all about how he went from the obligatory [Ben Eater] breadboard CPU, to playing with FPGAs, to ever-larger FPGAs that could bear the weight of this mighty GPU. While this project isn’t exactly revolutionary in the GPU world, it certainly is impressive, and we’re waiting impatiently to see what comes next.


Tech Support… Can AI Be Worse?

You can’t read the news today without another pundit excitedly reporting how AI is going to take every job you can imagine. Of course, AI will change the employment landscape. It will take some jobs and reduce the need for others. What about tech support? Is it possible that an AI might be able to help people with technical issues better than humans? My first answer was no way, but then I was painfully reminded of something. The question isn’t whether AI can help you better than any human can. The question is whether AI can help you better than the low-paid person on the other end of the phone you’re likely to talk to. Sadly, I think the answer to that question is almost certainly yes.

In all fairness, if you read Hackaday, you probably don’t encounter many technical support people who can solve a problem you can’t. By the time you call them, it is a lost cause. But this is more than just “Hackaday folks are smarter than the tech support agents.” The overall quality of tech support at many companies is rock bottom no matter who you are.