How The Intel 8087 FPU Knows Which Instructions To Execute

An interesting detail about the Intel 8087 floating-point processor (FPU) is that it’s a co-processor that shares a bus with the 8086 or 8088 CPU and system memory, which means that both the CPU and the FPU somehow need to know which instructions are intended for the FPU. Key to this are the eight so-called ESCAPE opcodes assigned to the co-processor, as explained in a recent article by [Ken Shirriff].

The 8087 thus monitors the instruction stream for these opcodes, but since it doesn’t have access to the CPU’s registers, any data has to be shared via system memory. The CPU calculates the memory address and reads from it, while the FPU captures this address from the bus and stores it in its BIU register for later use. From there the instruction can be fully decoded and executed.
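To make the ESCAPE mechanism a bit more concrete, here is a minimal Python sketch of how such an instruction can be recognized and split up at the bit level. The eight ESC opcodes are D8h through DFh (binary 11011xxx); the three low bits of that byte plus the three ‘reg’ bits of the following ModR/M byte form a 6-bit FPU opcode, while a ‘mod’ field other than 11b signals a memory operand whose address the CPU computes. The function names are made up for illustration, and the sketch ignores the hardware side entirely, such as how the real 8087 follows the CPU’s instruction queue.

```python
# Minimal sketch: recognizing and splitting an 8086/8087 ESC instruction.
# Function names are hypothetical; only the bit layout follows the x86 encoding.

ESC_MASK = 0xF8  # top five bits of the first opcode byte
ESC_BITS = 0xD8  # 11011xxx: the eight ESCAPE opcodes D8h..DFh


def is_escape(first_byte: int) -> bool:
    """Return True if the byte is one of the eight ESC opcodes."""
    return (first_byte & ESC_MASK) == ESC_BITS


def decode_escape(first_byte: int, modrm: int) -> dict:
    """Split an ESC opcode byte plus its ModR/M byte into FPU-relevant fields."""
    if not is_escape(first_byte):
        raise ValueError(f"{first_byte:#04x} is not an ESC opcode")
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    return {
        # 3 low bits of the opcode byte followed by the 3 reg bits
        "fpu_opcode": ((first_byte & 0b111) << 3) | reg,
        # mod != 11b: a memory operand, whose address the CPU calculates
        "has_memory_operand": mod != 0b11,
        "rm": rm,
    }


if __name__ == "__main__":
    print(decode_escape(0xDD, 0x07))  # FLD qword ptr [bx]
    print(decode_escape(0xD9, 0xE8))  # FLD1, register-only
```

For example, `DD 07` is FLD with a 64-bit memory operand at [BX], while `D9 E8` (FLD1) has mod = 11b and therefore no memory operand at all.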

This decoding is mostly done by the microcode engine, with complex instructions such as the transcendental functions featuring circuitry that sprawls all over the IC. The article explains how the microcode engine even knows where to begin this decoding process, considering the complexity of these instructions. The biggest limitation at the time was that even a 2 kB ROM was already quite large, which resulted in the 8087 using only 22 microcode entry points, with a combination of logic gates and PLAs mapping each instruction onto one of them.
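The idea of funnelling many instructions into a handful of entry points can be illustrated with a priority-ordered list of mask/pattern terms, loosely similar to the AND plane of a PLA where some bits are ‘don’t care’. This Python sketch is purely hypothetical: the bit patterns and entry-point names are invented and are not the 8087’s actual PLA contents or its 22 real entry points.

```python
# Hypothetical sketch: mapping a 6-bit FPU opcode onto a few microcode entry
# points with mask/pattern terms, loosely like the AND plane of a PLA.
# The terms and entry-point names below are invented for illustration.
from typing import NamedTuple


class Term(NamedTuple):
    mask: int     # which opcode bits this term looks at (0 bits = don't care)
    pattern: int  # required values of the bits selected by mask
    entry: str    # hypothetical microcode entry point


TERMS = [
    Term(mask=0b111111, pattern=0b101000, entry="ENTRY_A"),        # one exact opcode
    Term(mask=0b000111, pattern=0b000000, entry="ENTRY_B"),        # a whole family
    Term(mask=0b000000, pattern=0b000000, entry="ENTRY_DEFAULT"),  # catch-all
]


def entry_point(fpu_opcode: int) -> str:
    """Return the entry point of the first term that matches (priority order)."""
    for term in TERMS:
        if (fpu_opcode & term.mask) == term.pattern:
            return term.entry
    raise LookupError(f"no term matches opcode {fpu_opcode:#08b}")


if __name__ == "__main__":
    used = {entry_point(op) for op in range(64)}
    print(sorted(used))
```

Running all 64 possible 6-bit opcodes through `entry_point()` yields only three distinct entry points: the same kind of many-to-few compression, on a much smaller scale, that lets 22 entry points cover the whole instruction set.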

Only some instructions are implemented directly in hardware in the bus interface unit (BIU), which means that a lot depends on the microcode engine and its ROM for things to work even half-way efficiently. The need to solve problems such as fetching constants likewise resulted in a complex but transistor-saving approach.
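For a sense of which constants are involved, the programmer-visible side is simple: seven x87 instructions, encoded as D9h followed by E8h through EEh, push a constant onto the register stack. This Python sketch only tabulates them; the `load_constant()` helper is hypothetical, it uses 64-bit Python floats rather than the 8087’s 80-bit extended format, and it says nothing about the transistor-saving internal storage that the article describes.

```python
# Sketch of the programmer-visible x87 constant loads (D9h E8h..EEh).
# Uses 64-bit Python floats, not the 8087's 80-bit extended precision.
import math

CONSTANT_LOADS = {
    0xE8: ("FLD1", 1.0),
    0xE9: ("FLDL2T", math.log2(10)),      # log2(10)
    0xEA: ("FLDL2E", math.log2(math.e)),  # log2(e)
    0xEB: ("FLDPI", math.pi),
    0xEC: ("FLDLG2", math.log10(2)),      # log10(2)
    0xED: ("FLDLN2", math.log(2)),        # ln(2)
    0xEE: ("FLDZ", 0.0),
}


def load_constant(first_byte: int, second_byte: int) -> tuple[str, float]:
    """Return (mnemonic, value) for a constant-load instruction."""
    if first_byte != 0xD9 or second_byte not in CONSTANT_LOADS:
        raise ValueError(f"{first_byte:02X} {second_byte:02X} is not a constant load")
    return CONSTANT_LOADS[second_byte]


if __name__ == "__main__":
    for op in sorted(CONSTANT_LOADS):
        print(load_constant(0xD9, op))
```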

Even if the 8087 architecture is convoluted and its ISA not well-regarded today, you absolutely have to respect the sheer engineering skill and out-of-the-box thinking of the 8087 project’s engineers.

Retrotechtacular: Bleeding-Edge Memory Devices Of 1959

Although digital computers are, much like their human computer counterparts, all about performing calculations, another crucial element is memory. After all, you need to fetch values from somewhere and store them afterwards. Sometimes values need to be stored for long periods of time, making memory one of the most important elements, yet also one of the most difficult ones. Back in the 1950s the storage options were especially limited, and a 1959 Bell Labs film reel digitized by [Connections Museum] runs through the bleeding edge of 1950s storage technology.

After running through the basics of binary representation and the difference between sequential and random access methods, the film first takes a look at punch cards, which can be read at a blistering 200 cards per minute, before moving on to punched tape, which comes in a variety of shapes to fit different applications.

Electromechanical storage in the form of relays is popular in telephone exchanges, for example, as relays are very fast. These exchanges use a two-out-of-five code to represent each digit of a phone number in a corresponding pack of five relays, allowing the crossbar switch to be properly configured.
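The appeal of a two-out-of-five code is that every valid digit has exactly two of its five relays operated, so a single stuck or dropped relay produces an invalid pattern that can be detected. The Python sketch below assumes the 0-1-2-4-7 weighting associated with Bell System crossbar equipment, with 0 represented as 4 + 7; the film may use a different assignment, and the function names are made up.

```python
# Sketch of a two-out-of-five code, assuming the 0-1-2-4-7 weighting
# associated with Bell System crossbar equipment (0 is sent as 4 + 7).
from itertools import combinations

WEIGHTS = (0, 1, 2, 4, 7)  # weight of each of the five relays


def encode_digit(digit: int) -> tuple[int, ...]:
    """Return five relay states (1 = operated) with exactly two operated."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be 0-9")
    target = 11 if digit == 0 else digit
    for i, j in combinations(range(5), 2):
        if WEIGHTS[i] + WEIGHTS[j] == target:
            return tuple(1 if k in (i, j) else 0 for k in range(5))
    raise AssertionError("every digit 0-9 has exactly one code word")


def decode_digit(relays: tuple[int, ...]) -> int:
    """Decode five relay states, rejecting anything but exactly two operated."""
    if len(relays) != 5 or any(r not in (0, 1) for r in relays) or sum(relays) != 2:
        raise ValueError(f"invalid code word {relays}: need exactly 2 of 5")
    total = sum(w for w, r in zip(WEIGHTS, relays) if r)
    return 0 if total == 11 else total


if __name__ == "__main__":
    number = "5551234"  # hypothetical phone number
    packs = [encode_digit(int(c)) for c in number]
    print(packs)
```

All ten possible pairs out of five relays are used, one per decimal digit, which is why this particular weighting fits decimal telephone numbers so neatly.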


Porting Super Mario 64 To The Original Nintendo DS

Considering that the Nintendo DS already has its own remake of Super Mario 64, one might be tempted to think that porting the original Nintendo 64 version would be a snap. Why you’d want to do this is left as an exercise to the reader, whether it’s out of nostalgia or sheer spite, but the question of how easy it would be remains. So [Tobi] figured that he’d give it a shot, with interesting results.

Of note is that someone else already ported SM64 to the DSi, a later version of the DS with more processing power, more RAM and other changes. That port requires the DSi’s 16 MB of RAM because it loads the entire game into RAM, rather than doing on-demand reads from the cartridge. The N64 itself did read from the cartridge on demand, which is how it made do with just 4 MB of RAM, the same amount as the original NDS has. Ergo, it should be possible to make it work.

The key here is NitroFS, which allows you to implement the same kind of segmented loading that the N64 uses. With this, the DSi port by [Hydr8gon] could be used as the basis and crammed into NitroFS, enabling the game to run mostly smoothly on the original DS.
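As a rough illustration of segmented, on-demand loading, here is a Python sketch of a segment cache that keeps recently used chunks of game data within a fixed RAM budget and evicts the least recently used ones. Everything here is hypothetical: `storage` is just a dict standing in for a filesystem like NitroFS, the LRU eviction policy is an assumption, and none of this is taken from the actual port.

```python
# Hypothetical sketch of on-demand segment loading within a fixed RAM budget.
# `storage` stands in for a filesystem such as NitroFS; LRU eviction is assumed.
from collections import OrderedDict


class SegmentCache:
    def __init__(self, storage: dict[str, bytes], budget: int):
        self.storage = storage        # name -> segment data ("the cartridge")
        self.budget = budget          # bytes of RAM we may use for segments
        self.resident = OrderedDict()  # loaded segments, oldest use first
        self.used = 0
        self.loads = 0                # how often we had to go to storage

    def get(self, name: str) -> bytes:
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return self.resident[name]
        data = self.storage[name]            # on-demand read
        if len(data) > self.budget:
            raise MemoryError(f"segment {name!r} is larger than the RAM budget")
        # Evict least recently used segments until the new one fits.
        while self.used + len(data) > self.budget:
            _, evicted = self.resident.popitem(last=False)
            self.used -= len(evicted)
        self.resident[name] = data
        self.used += len(data)
        self.loads += 1
        return data


if __name__ == "__main__":
    cache = SegmentCache({"a": b"x" * 3, "b": b"y" * 3, "c": b"z" * 3}, budget=6)
    for name in ["a", "b", "a", "c"]:
        cache.get(name)
    print(list(cache.resident), cache.used, cache.loads)
```

With a 6-byte budget and three 3-byte segments, touching `a` again before loading `c` means that `b` is the one that gets evicted.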

There are still some issues to resolve before the project is released, mostly related to sound support and general stability. If you have a flash cartridge for the DS, this means that you too should soon be able to play the original SM64 on real hardware, as though it’s a quaint portable N64.


NextSilicon’s Maverick-2: The Future Of High-Performance Computing?

A few months back, Sandia National Laboratories announced that they had acquired a new supercomputer. It wasn’t the biggest, but in their eyes it still offered something unique: it contains NextSilicon’s much-hyped Maverick-2 ‘dataflow accelerator’ chips. Targeting the high-performance computing (HPC) market, these chips are claimed to hold a 10x advantage over the best GPU designs.

NextSilicon Maverick-2 OAM-2 module. (Credit: NextSilicon)

The strategy here appears to be something of a mixture of VLIW, FPGAs and Sony’s Cell architecture, with a dedicated compiler that determines the best mapping of a particular calculation across the compute elements inside the chip. Naturally, the exact details of the internals are a closely held secret of NextSilicon and its partners (like Sandia), so we basically only have the public claims and PR material to go by.
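Since NextSilicon doesn’t disclose the internals, the best we can do is illustrate the general dataflow idea: an operation fires as soon as all of its inputs are available, and every operation that is ready at the same moment could run in parallel on its own compute element. This Python sketch is a generic model with a made-up graph format, not a description of Maverick-2 or its compiler.

```python
# Generic dataflow sketch (not NextSilicon's design): a node fires once all of
# its inputs have values; nodes that become ready together form one "wave"
# that could, in principle, run in parallel on separate compute elements.
import operator


def run_dataflow(graph, inputs):
    """graph maps node -> (function, [input names]); inputs maps name -> value."""
    values = dict(inputs)
    pending = set(graph) - set(values)
    waves = []
    while pending:
        ready = sorted(n for n in pending if all(i in values for i in graph[n][1]))
        if not ready:
            raise ValueError(f"cannot make progress on {sorted(pending)}")
        for node in ready:  # no node in a wave depends on another in the same wave
            func, args = graph[node]
            values[node] = func(*(values[a] for a in args))
        pending -= set(ready)
        waves.append(ready)
    return values, waves


if __name__ == "__main__":
    # (a * b) + (c * d)
    graph = {
        "ab": (operator.mul, ["a", "b"]),
        "cd": (operator.mul, ["c", "d"]),
        "sum": (operator.add, ["ab", "cd"]),
    }
    values, waves = run_dataflow(graph, {"a": 2, "b": 3, "c": 4, "d": 5})
    print(values["sum"], waves)  # 26 [['ab', 'cd'], ['sum']]
```

For `(a * b) + (c * d)`, the two multiplications land in the same wave and the addition follows in the next one; deciding which physical compute element handles each node is the kind of mapping job the text attributes to the compiler.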

Last year The Register took a more in-depth look at this architecture. What we can surmise from it is that the chip should perform pretty well for just about all applications, except for those that depend on single-threaded performance. Of course, as a dedicated accelerator it cannot handle general-purpose CPU tasks, which is where NextSilicon’s less spectacular RISC-V-based CPU comes into the picture.

What’s apparent from a glance at the product renders on the NextSilicon site is that these Maverick-2 chips have absolutely massive dies, so they’re certainly not cheap to manufacture. Whether they’ll make more of a splash than Intel’s Itanium or NVIDIA’s brute-force approach remains to be seen.
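To see why massive dies drive up cost, here is a back-of-the-envelope Python sketch using two textbook approximations: the usual dies-per-wafer estimate for a round wafer and a simple Poisson yield model, Y = exp(-A * D0). The die areas and the defect density of 0.1 defects/cm² are purely illustrative assumptions; no Maverick-2 die size is given here, and real foundry numbers differ.

```python
# Back-of-the-envelope sketch: why big dies are expensive.
# Die areas and defect density are illustrative assumptions, not real data.
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation: wafer area / die area minus an edge-loss term."""
    d = wafer_diameter_mm
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Simple Poisson yield model Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def good_dies(die_area_mm2: float, defects_per_cm2: float = 0.1) -> float:
    """Expected number of defect-free dies per 300 mm wafer."""
    return dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)


if __name__ == "__main__":
    for area in (100.0, 800.0):  # hypothetical die areas in mm^2
        print(f"{area:.0f} mm^2: {dies_per_wafer(area)} dies, "
              f"{good_dies(area):.1f} good")
```

With these assumed numbers, a die with eight times the area yields roughly twenty times fewer good dies per wafer, since larger dies both fit less efficiently on a round wafer and are more likely to catch a defect.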

Microsoft Uses Plagiarized AI Slop Flowchart To Explain How Git Works

It’s becoming something of a theme that machine-generated content, whether code, text or graphics, keeps pushing people to their limits. Mostly this is because such ‘AI slop’ is generally of outrageously poor quality, but as in the case of [Vincent Driessen] there can also be a clear copyright infringement angle. Recently he found that Microsoft had bastardized a Git explainer graphic that he had painstakingly made by hand in 2010, with someone at Microsoft slapping it onto a Microsoft Learn explainer article about GitHub.

As noted in a PC Gamer article on this clear faux pas, Microsoft has since quietly replaced the graphic with something that is possibly less AI slop, without any comment, and has so far not responded to PC Gamer’s request for comment. Of course, the Internet Archive always remembers.

What’s probably most vexing is that the ripped-off diagram isn’t even particularly good, as it has all the hallmarks of AI slop graphics: from the nonsensical arrows that got added or modified, to heavily mutilated text including changing ‘Time’ to ‘Tim’ and ‘continuously merged’ into ‘continvuocly morged’. This makes it obvious that whoever put the graphic on the Microsoft Learn page either didn’t bother to check, or that no human was involved in generating said page.


Poking At The ESP32-P4 And -C6 Dies In An ESP32-P4-M3 Module

The RF section of the ESP32-C6 die. (Credit: electronupdate, YouTube)

Since the ESP32-P4 lacks any wireless functionality and instead focuses on being a small SoC, it makes sense to combine it with a second chip that handles features like WiFi and Bluetooth. This makes the Guition ESP32-P4-M3 module both a pretty good example of how the P4 will be used and an excellent opportunity to tear into it, decap both the P4 and the ESP32-C6 in this particular module, and shoot photos of their dies, courtesy of [electronupdate]. There’s also a blog post for those who just want to ogle the shinies.

After popping off the metal shield on the module, you can see the contents as in the above photo. The P4 inside is a variant with 32 MB of PSRAM integrated in the same package as the SoC die. This results in die shots of both the PSRAM and the P4 die, though enough of the top metal seems to remain to clearly see the latter.

The Boya-brand flash chip is quite standard inside, and along with a peek inside one of the crystal oscillators we also get a look at the inside of the C6 MCU. This is a much simpler chip than the P4, with its RF section quite obvious. The die sizes are 2.7 x 2.7 mm for the C6 and 4.29 x 3.66 mm for the P4.
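The quoted dimensions also allow a quick comparison of die area. This small Python calculation simply multiplies the dimensions given above; the `area()` helper is just for illustration.

```python
# Quick die-area comparison using the dimensions quoted in the text.
C6_DIE_MM = (2.7, 2.7)    # ESP32-C6 die, width x height in mm
P4_DIE_MM = (4.29, 3.66)  # ESP32-P4 die, width x height in mm


def area(dims: tuple[float, float]) -> float:
    """Die area in mm^2 for a rectangular die."""
    width, height = dims
    return width * height


if __name__ == "__main__":
    print(f"C6: {area(C6_DIE_MM):.2f} mm^2")                       # 7.29
    print(f"P4: {area(P4_DIE_MM):.2f} mm^2")                       # 15.70
    print(f"ratio: {area(P4_DIE_MM) / area(C6_DIE_MM):.2f}")       # 2.15
```

So the P4 die has a bit over twice the area of the C6 die.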


Fixing A Destroyed Xbox 360 Development Kit

As common as the Xbox 360 was, its development kits (XDKs) are significantly less so. This makes it all the more tragic when someone performs botched surgery on one of these rare machines, leaving it in dire straits. Fortunately [Josh Davidson] was able to repair the XDK in question for a customer, although it entailed replacing the GPU and CPU, as well as fixing many traces.

The Xbox 360 Development Kit is effectively a special version of the consumer console, with extra RAM and features that make debugging software on the unit much easier, such as direct access to RAM contents. XDKs come in a variety of hardware specifications that evolved along with the game console during its lifecycle, and this particular XDK had been upgraded to a Super Devkit with fewer hardware restrictions.

The dead GPU was replaced with a new old stock Kronos 1 chip. Fortunately the pads underneath the old GPU were fine, which made the replacement easy. After that, various ripped-off pads and traces were discovered on the underside of the PCB, all of which had to be painstakingly repaired. Finally, the CPU had apparently suffered heat damage, so it was replaced with a better CPU, putting this XDK back into service.
