Even with ten fingers to work with, math can be hard. Microprocessors, with the silicon equivalent of just two fingers, can have an even harder time with calculations, often taking multiple machine cycles to figure out something as simple as pi. And so 40 years ago, Intel decided to give its fledgling microprocessors a break by introducing the 8087 floating-point coprocessor.
If you’ve ever wondered what was going on inside the 8087, wonder no more. [Ken Shirriff] has decapped an 8087 to reveal its inner structure, which turns out to be closely related to its function. After a quick tour of the general layout of the die, including locating the microcode engine and ROM, and a quick review of the NMOS architecture of the four-decade-old technology, [Ken] dug into the meat of the coprocessor and the reason it could speed up certain floating-point calculations by up to 100-fold. A generous portion of the complex die is devoted to a ROM that does nothing but store constants needed for its calculation algorithms. By carefully examining the pattern of NMOS transistors in the ROM area and making some educated guesses, he was able to see the binary representation of constants such as pi and the square root of two. There’s also an extensive series of arctangent and log2 constants, used for the CORDIC algorithm, which reduces otherwise complex transcendental calculations to a few quick and easy bitwise shifts and adds.
[Ken] has popped the hood on a lot of chips before, finding butterflies in an op-amp and reverse-engineering a Sinclair scientific calculator. But there’s something about seeing constants hard-coded in silicon that really fascinates us.
The Super Nintendo port of Gradius III is notable for being close to the arcade original, with its large, bright and colorful graphics. However, due to the limitation of the console’s hardware, the port is also well known for having constant slowdowns during gameplay, particularly during later sections. [Vitor] hacked away at the game and made a patched version of the ROM use a co-processor to eliminate those issues.
The slowdown seen here in Gradius is not uncommon to SNES players, many games of that era suffer from it when several sprites appear on the screen at once. This is partially due to the aging CPU Nintendo chose, supposedly in order to maintain NES backwards compatibility before the idea got scrapped. Unable to complete its tasks by the time the next frame needs to be shown, the hardware skips frames to let the processor catch up before it can continue. This is perceived as the aforementioned slowdown.
Around the later stage of the SNES’s life, games started using additional chips inside the cartridges in order to enhance the console’s performance. One of them is the SA1, which is a co-processor with the same core as the main CPU, only with a higher clock rate. By using it, games had more time to run through the logic and graphics manipulation before the next frame. What [Vitor] did was port those parts of Gradius III to the SA1, essentially making it just like any other enhanced cartridge from back in the day.
Unlike previous efforts we’ve seen to overclock the SNES by giving it a longer blanking time, this method works perfectly on real unmodified hardware. You can see the results of his efforts after the break, particularly around stage 2 where several bubbles fill the screen on the second video.
Continue reading “Adding A Co-Processor To Help SNES Games With Slowdown”
When you have a complex task that would sap the time and energy of your microprocessor, it makes sense to offload it to another piece of hardware. We are all used to this in the form of the graphics chipsets our computers use — specialised processors whose computing power in that specific task easily outshines that of our main CPU. This offloading of tasks is just as relevant at the microcontroller level too. One example is the EM Microelectronics EM7180 motion co-processor. It takes input from a 3-axis gyroscope/accelerometer and magnetometer, acting for all intents and purposes as a fit-and-forget component. Given an EM7810, your host can determine its heading and speed at a simple command, with no need for any hard work.
[Kris Winer] used the EM7810, but frustrated at its shortcomings decided to create a more versatile alternative. The result is a small PCB holding a Maxim MAX32660 ARM Cortex M4F microcontroller and the relevant sensors, with the MAX32660’s increased power and integrated flash easily eclipsing the EM7810.
As a design exercise it’s an interesting read even if you have no need for one. His write-up goes into detail on the state of the motion coprocessor art, and then looks carefully at pushing the limits of what is possible using an inexpensive PCB fabrication house such as OSH Park — you can get this chip as a Wafer-Level Package (WLP) which is definitely off-limits. Even with the TQFN-24 he picked though, the result is a tiny board and we’re happy to see it as an entry in the Return of the Square Inch Project!
It is perhaps surprising how few projects like this one make it into our sphere, as a community we tend to focus upon making one processor do all the hard work. But with the ready availability of inexpensive and powerful devices, perhaps this is an approach that we should reconsider.
The decryption key for Apple’s Secure Enclave Processor (SEP) firmware Posted Online by self-described “ARM64 pornstar” [xerub]. SEP is the security co-processor introduced with the iPhone 5s which is when touch ID was introduced. It’s a black box that we’re not supposed to know anything about but [xerub] has now pulled back the curtain on that.
The secure enclave handles the processing of fingerprint data from the touch ID sensor and determines if it is a match or not while it also enables access for purchases for the user. The SEP is a gatekeeper which prevents the main processor from accessing sensitive data. The processor sends data which can only be read by the SEP which is authenticated by a session key generated from the devices shared key. It also runs on its own OS [SEPOS] which has a kernel, services drivers and apps. The SEP performs secure services for the rest of the SOC and much more which you can learn about from the Demystifying the Secure Enclave Processor talk at Blackhat
[xerub] published the decryption keys here. To decrypt the firmware you can use img4lib and xerub’s SEP firmware split tool to process. These tools make it a piece of cake for security researchers to comb through the firmware looking for vulnerabilities.
[Adam Laurie] spent time tearing into the security of the SAM7XC chip produced by Atmel. Even if he hadn’t found some glaring security holes just reading about his methodology is worth it.
The chip is used in a secure RFID system. The chip is added to the mix to do the heavy lifting required when using encryption. [Adam] grabbed a couple of open source libraries to put it to the test. The firmware is locked down pretty tight, but his explorations into the content of the RAM yield a treasure trove of bits. After investigating the sample code for the chip he’s shocked to learn that it uses RAM to store the keys at one point. The rest of his journey has him dumping the data and sifting through it until he gets to the “Master Diversification Key”. That’s the big daddy which will let him decrypt any of the tags used.
He reported his findings to Atmel in September of 2011. Their response is that they have no way of protecting RAM from exploit. [Adam] asserts that the problem is that the sample software wasn’t designed with the vulnerability of RAM in mind. The keys should never be stored there specifically because it is vulnerable to being dumped from a running system.
[Alessandro] sent us a link to his post about a PRU software VGA rasterizer. It’s not the easiest read, but we think it’s worth your time.
The gist of his background information is that back when his company was developing for an ARM9 processor he wanted to test his mettle with the coprocessor chips. The first iteration was to write a character LCD driver that pulled data from the main processor’s memory and displayed it on the screen. This makes for a low-overhead debugger display, it’s also very limited (32 characters over two lines doesn’t tell you much). And thus began his work on a VGA generator for the Programmable Realtime Unit (PRU is what TI calls this coprocessor) that grabs data in memory just like the original version. But with a much larger display area this becomes quite useful for debugging. That resistor mess is the R2R ladder he soldered together to perform the Digital to Analog Conversions. There’s a quick demo clip after the jump.
This work could end up being useful to you. [Alessandro] reports that the BeagleBone has similar hardware. A bit of porting could get his generator working on that board as well.
Continue reading “Offloading VGA Generation Onto A Coprocessor”