Reprogramming Super Mario World from Inside The Game

[SethBling] recently set a world record speed run of the classic Super Nintendo game Super Mario World on the original SNES hardware. He managed to beat the game in five minutes and 59.6 seconds. How is this possible? He actually reprogrammed the game by moving specific objects to very specific places and then executing a glitch. This method of beating the game was originally discovered by Twitch user [Jeffw356] but it was performed on an emulator. [SethBling] was able to prove that this “credits warp” glitch works on the original hardware.

If you watch the video below, you’ll see [SethBling] visit one of the first available levels in the game. He then proceeds to move certain objects in the game to very specific places. What he’s doing here is manipulating the game’s X coordinate table for the sprites. By moving objects to specific places, he’s manipulating a section of the game’s memory to hold specific values and a specific order. It’s a meticulous process that likely took a lot of practice to get right.

Once the table was setup properly, [SethBling] needed a way to get the SNES to execute the X table as CPU instructions. In Super Mario World, there are special items that Mario can obtain that act as a power up. For example, the mushroom will make him grow in size. Each sprite in the game has a flag to tell the SNES that the item is able to act as a power up. Mario can either collect the power up by himself, or he can use his friendly dinosaur Yoshi to eat the power up, which will also apply the item’s effects to Mario.

The next part of the speed run involves something called the item swap glitch. In the game, Mario can collect coins himself, or Yoshi can also collect them by eating them. A glitch exists where Yoshi can start eating a coin, but Mario jumps off of Yoshi and collects the coin himself simultaneously. The result is that the game knows there is something inside of Yoshi’s mouth but it doesn’t know what. So he ends up holding an empty sprite with no properties. The game just knows that it’s whatever sprite is in sprite slot X.

Now comes the actual item swap. There is an enemy in the game called Chargin’ Chuck. This sprite happens to have the flag set as though it’s a power up. Normally this doesn’t matter because it also has a set flag to tell the game that it cannot be eaten by Yoshi. Also, Chuck is an enemy so it actually hurts Mario rather than act as a power up. So under normal circumstances, this sprite will never actually act as a power up. The developers never programmed the game to properly handle this scenario, because it was supposed to be impossible.

If the coin glitch is performed in a specific location within the level, a Chargin’ Chuck will spawn just after the coin is collected. When the Chuck spawns, it will take that empty sprite slot and suddenly the game believes that Yoshi is holding the Chuck in his mouth. This triggers the power up condition, which as we already know was never programmed into the game. The code ends up jumping to an area of memory that doesn’t contain normal game instructions.

The result of all of this manipulation and glitching is that all of the values in the sprite X coordinate table are executed as CPU instructions. [SethBling] setup this table to hold values that tell the game to jump to the end credits. The console executes them and does as commanded, and the game is over just a few minutes after it began. The video below shows the speed run but doesn’t get too far into the technical details, but you can read more about it here.

This isn’t the first time we’ve seen this type of hack. Speed runs have been performed on Pokemon with very similar techniques. Another hacker managed to program and execute a version of single player pong all from within Pokemon Blue. We can’t wait to see what these game hackers come up with next. Continue reading “Reprogramming Super Mario World from Inside The Game”

Reverse Engineering Capcom’s Crypto CPU

There are a few old Capcom arcade titles – Pang, Cadillacs and Dinosaurs, and Block Block – that are unlike anything else ever seen in the world of coin-ops. They’re old, yes, but what makes these titles exceptional is the CPU they run on. The brains in the hardware of these games is a Kabuki, a Z80 CPU that had a few extra security features. why would Capcom produce such a thing? To combat bootleggers that would copy and reproduce arcade games without royalties going to the original publisher. It’s an interesting part of arcade history, but also a problem for curators: this security has killed a number of arcade machines, leading [Eduardo] to reverse engineering and document the Kabuki in full detail.

While the normal Z80 CPU had a pin specifically dedicated to refreshing DRAM, the Kabuki repurposed this pin for the security functions on the chip. With this pin low, the Kabuki was a standard Z80. When the pin was pulled high, it served as a power supply input for the security features. The security – just a few bits saved in memory – was battery backed, and once this battery was disconnected, the chip would fail, killing the game.

Plugging Kabuki into an old Amstrad CPC 6128 without the security pin pulled high allowed [Eduardo] to test all the Z80 instructions, and with that no surprises were found; the Kabuki is fully compatible with every other Z80 on the planet. Determining how Kabuki works with that special security pin pulled high is a more difficult task, but the Mame team has it nailed down.

The security system inside Kabuki works through a series of bitswaps, circular shifts, XORs, each translation different if the byte is an opcode or data. The process of encoding and decoding the security in Kabuki is well understood, but [Eduardo] had a few unanswered questions. What happens after Kabuki lost power and the memory contents – especially the bitswap, address, and XOR keys – vanished? How was the Kabuki programmed in the factory? Is it possible to reprogram these security keys, allowing one Kabuki to play games it wasn’t manufactured for?

[Eduardo] figured being able to encrypt new, valid code was the first step to running code encrypted with different keys. To test this theory, he wrote a simple ‘Hello World’ for the Capcom hardware that worked perfectly under Mame. While the demo worked perfectly under Mame, it didn’t work when burned onto a EPROM and put into real Capcom hardware.

That’s where this story ends, at least for the time being. The new, encrypted code is valid, Mame runs the encrypted code, but until [Eduardo] or someone else can figure out any additional configuration settings inside the Kabuki, this project is dead in the track. [Eduardo] will be back some time next week tearing the Kabuki apart again, trying to unravel the mysteries of what makes this processor work.

Extrinsic Motivation: P = NP If You Have A Time Machine

Not all of the entries to The Hackaday Prize were serious – at least we hope not – and this one is the most entertaining of the bunch. [Eduardo] wants to put a flux capacitor in a CPU pipeline. Read that last sentence again, grab a cup of coffee, mull it over, and come back. This post will still be here.

Assuming the events portrayed in BTTF could be real in some alternate history or universe, consider the properties of a DeLorean time machine: It requires 1.21 Jiggawatts (we’re assuming this is Gigawatts from now on), has a curb weight of about three thousand pounds with the nuclear reactor and/or hovercar conversion, and is able to travel in time ± 30 years. If the power required to travel time were to scale proportionally with mass, sending a CPU register back in time would only require a Watt or so. Yes, ‘ol [Doc Brown] had it wrong with wanting to send a car back in time – sending information back is much, much easier. Now, what do you do with it?

[Eduardo] is using this to speed up pipelined CPUs. In a CPU pipeline, instructions are executed in parallel, but if one instruction depends on the output of another instruction, bad things happen CPU designers have spent long, sleepless nights figuring out how to prevent this. Basically, a MEMS flux capacitor solves all outstanding problems in CPU design. It’s brilliant, crazy, and we’re glad to see it as an entry to The Hackaday Prize.

[Eduardo], though, isn’t seeing the forest for the trees. If you have a flux capacitor in your CPU, why even bother with optimizing a CPU? Just take a normal CPU, add a flux capacitor register, and have the output of a long and complex calculation write to the time traveling register. All calculations then happen instantly, your Ps and NPs are indistinguishable. All algorithms run in O(1), and the entire endeavor is a light-hearted romp for the entire family.

SpaceWrencherThis project is an official entry to The Hackaday Prize that sadly didn’t make the quarterfinal selection. It’s still a great project, and worthy of a Hackaday post on its own.

Behold! The Most Insane Crowdfunding Campaign Ever

Hold on to your hats, because this is a good one. It’s a tale of disregarding the laws of physics, cancelled crowdfunding campaigns, and a menagerie of blogs who take press releases at face value.

Meet Silent Power (Google translation). It’s a remarkably small and fairly powerful miniature gaming computer being put together by a team in Germany. The specs are pretty good for a completely custom computer: an i7 4785T, GTX 760, 8GB of RAM and a 500GB SSD. Not a terrible machine for something that will eventually sell for about $930 USD, but what really puts this project in the limelight is the innovative cooling system and small size. The entire machine is only 16x10x7 cm, accented with a very interesting “copper foam” heat sink on top. Sounds pretty cool, huh? It does, until you start to think about the implementation a bit. Then it’s a descent into madness and a dark pit of despair.

There are a lot of things that are completely wrong with this project, and in true Hackaday fashion, we’re going to tear this one apart, figuring out why this project will never exist.

Continue reading “Behold! The Most Insane Crowdfunding Campaign Ever”

How Do You Build a Relay CPU?


The Hackaday tips line is always full of the coolest completed projects, but only rarely do we see people reaching out for help on their latest build. We’ll help when we can, but [Tim]’s relay-based CPU has us stumped.

[Tim] already has the design of his relay CPU completed with a 12-bit program counter, sequencer, ALU, and a transistor-based ROM. The problem he’s having deals with the mechanics and layout of his homebuilt CPU. Right now, all the relays (PC pin, we guess) are glued top-down to a piece of cardboard. This allows him to easily solder the wires up and change out the inevitable mistakes. This comes with a drawback, though: he’s dealing with a lot of ‘cable salad’ and it’s not exactly the prettiest project ever.

The ideal solution, [Tim] says, would be a PCB with through-hole plating, but this isn’t easy or cheap for the home fab lab. We’d suggest some sort of wire wrap setup, but proper wire wrap sockets and protoboards are for some reason unreasonably expensive.

If you have an idea on how to do the mechanical layout and connections of a relay-based computer, drop a note in the comments. [Tim] has a very cool project here, and it would be a shame if he were to give up on it due to a lack of tools.

Video below, and if you’re having a problem with a project, feel free to send it in.

Continue reading “How Do You Build a Relay CPU?”

The Nibbler: a 4-bit CPU built with 7400 logic


Maybe we shouldn’t say “built” since [Steve Chamberlin] hasn’t actually heated up his iron yet. From the finished schematic above that is puzzling at first, until you realize the scope of the project. His Nibbler implements a 4-bit CPU using 7400 logic chips. Because he’s come up with the architecture himself he’s taking a lot of steps to check all of his work before committing to a PCB.

We linked to his category for the project which is still in progress. Most recently he wrote a program to prove that it’ll run on the hardware. That’s a feat considering this is still just a design idea. It was made possible because he wrote a simulator based on the design. The C++ tool simulates data and control buses and features a full set of debugging tools.

Careful testing of the design before the build is the best possible way to go. The simulator and debugging tools will be useful for software development even after the hardware is built. And testing before wiring is a must as these things get out of control quickly in terms of soldering complexity.

[via Dangerous Prototypes]

The Mill CPU architecture

There are basically two ways to compute data. The first is with a DSP, a chip that performs very specialized functions on a limited set of data. These are very cheap, have amazing performance per watt, but can’t do general computation at all. If you’d like to build a general-purpose computer, you’ll have to go with a superscalar processor – an x86, PowerPC, or any one of the other really beefy CPU architectures out there. Superscalars are great for general purpose computing, but their performance per watt dollar is abysmal in comparison to a DSP.

A lot of people have looked into this problem and have come up with nothing. This may change, though, if [Ivan Godard] of Out-of-the-Box computing is able to produce The Mill – a ground-up rethink of current CPU architectures.

Unlike DSPs, superscalar processors you’d find in your desktop have an enormous amount of registers, and most of these are rename registers, or places where the CPU stores a value temporarily. Combine this with the fact that connecting hundreds of these temporary registers to places where they’ll eventually be used eats up about half the power budget in a CPU, and you’ll see why DSPs are so much more efficient than the x86 sitting in your laptop.

[Ivan]’s solution to this problem is replacing the registers in a CPU with something called a ‘belt’ – basically a weird combination of a stack and a shift register. The CPU can take data from any position on the belt, perform an operation, and places the result at the front of the belt. Any data that isn’t used simply falls off the belt; this isn’t a problem, as most data used in a CPU is used only once.

On paper, it’s a vastly more efficient means of general purpose computation. Unfortunately, [Ivan] doesn’t quite have all the patents in for The Mill, so his talks (two available below) are a little compartmentalized. Still, it’s one of the coolest advances in computer architecture in recent memory and something we’d love to see become a real product.

Continue reading “The Mill CPU architecture”