New Game, Old Ways: Cramming An NES Game Into 40 KB

January 9, 2019

Why would anyone bother to create new content for a console system that’s staring down its 40th birthday? Perhaps just for the challenge of fitting a game into 40 kilobytes of storage.

That at least seems to be the motivation behind [Morphcat Games]’ pending release of Micro Mages, a new game for the Nintendo Entertainment System console that takes its inspiration from Super Mario Bros. The interesting bit here is how they managed to stuff so much content into so little space. The video below goes into great detail on that, and it’s a fascinating lesson in optimization. The game logic itself is coded in assembler, which of course is far more efficient than higher level languages. Even so, that took 32 kB of ROM, leaving a mere 8 kB for background elements and foreground sprites.

Through a combination of limited sprite size, tiling of smaller sprites to make larger characters, and reusing tiles by flipping them horizontally or vertically, an impressively complete palette of animated characters was developed. Background elements were similarly deconstructed and reused, resulting in a palette of tiles used to generate all the maps for the game that takes up just 60 bytes. Turning those into playable levels involves more mirroring and some horizontal shifting of tiles, and it looks like quite an engaging playfield.
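
To make the flipping trick concrete, here is a minimal sketch in C of what a horizontal or vertical flip does to one tile's pixel data. It assumes the NES pattern layout of 16 bytes per 8×8 tile (two 8-byte bit planes, one byte per row, bit 7 as the leftmost pixel). The function names are hypothetical and nothing here is taken from the Micro Mages source. On the console itself, sprite flips are requested through attribute bits and carried out by the graphics hardware, which is why a mirrored sprite costs no extra tile storage.

```c
#include <stdint.h>

/* One 8x8 tile, 2 bits per pixel: 8 bytes of bit plane 0, then 8 bytes of
   bit plane 1. Each byte is one row; bit 7 is the leftmost pixel. */
enum { TILE_ROWS = 8, TILE_PLANES = 2, TILE_BYTES = TILE_ROWS * TILE_PLANES };

/* Reverse the bit order of one byte, i.e. mirror one row of one plane. */
static uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));
    return b;
}

/* Horizontal flip: mirror every row in both bit planes. */
void tile_flip_h(const uint8_t src[TILE_BYTES], uint8_t dst[TILE_BYTES])
{
    for (int i = 0; i < TILE_BYTES; i++)
        dst[i] = reverse_bits(src[i]);
}

/* Vertical flip: reverse the row order inside each bit plane. */
void tile_flip_v(const uint8_t src[TILE_BYTES], uint8_t dst[TILE_BYTES])
{
    for (int plane = 0; plane < TILE_PLANES; plane++)
        for (int row = 0; row < TILE_ROWS; row++)
            dst[plane * TILE_ROWS + row] =
                src[plane * TILE_ROWS + (TILE_ROWS - 1 - row)];
}
```

Calling `tile_flip_h` on a right-facing character tile yields its left-facing twin, so only one copy needs to be stored.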

Yes, there’s a Kickstarter for the game, but we’re mainly intrigued by what it takes to cram a playable game into so little space. Don’t get us wrong – we love the RetroPie builds too, but seeing the tricks that early game developers relied upon to make things work really gets the creative juices flowing.

[Keith O] dug up this gem for us. Thanks!

27 thoughts on “New Game, Old Ways: Cramming An NES Game Into 40 KB”

Jan says:

January 9, 2019 at 5:01 am

“The game logic itself is coded in assembler, which of course is far more efficient than higher level languages.”

hmm… a good compiler in combination with the correct settings and a programmer who knows what he/she is doing is very likely to output the same code these days. You could argue that it would require some extra effort, but coding in assembly takes effort too. But honestly, I prefer assembly sometimes, just because it is so much fun and because it can be fun to write super-optimized code with all sorts of trickery just to shave off one CPU cycle from the loop. But those pieces of code are exceptions.

1. daid303 says:
  
  January 9, 2019 at 5:12 am
  
  Depends on what you are doing. When doing C/C++, there are a lot of “rules” that the compiler needs to enforce, even if you do not depend on them, while in assembly you know better which details are important and which are not.
  
  Just a random example: what if you need 24-bit math on an 8-bit CPU? In C/C++ you need to do 32-bit math. In assembly you can actually do 24-bit math.
  
  1. pelrun says:
    
    January 9, 2019 at 5:31 am
    
    C doesn’t have a 24-bit wide primitive type, but then again *neither does assembly*. So there’s nothing stopping you performing 24-bit math in C the same way you did it in assembly, by combining operations on multiple smaller values.
    
    1. Wolf says:
      
      January 9, 2019 at 7:44 am
      
      In Microchip C, a short long is 24 bits.
      
    2. Bill says:
      
      January 9, 2019 at 10:03 am
      
      How do you access the carry flag in C? Add with carry (or subtract with borrow) are key to doing math on values larger than the native word size of the CPU.
      
      1. lamalas says:
        
        January 9, 2019 at 10:30 am
        
        There are macros for such things.
        Also, if you need to access the carry flag in C, then you are using the wrong approach and need to learn to design proper architecture/algorithms.
        
      2. Pat says:
        
        January 9, 2019 at 4:58 pm
        
        No, there are not. There’s no portable way to access the carry flag in C, and many (most?) compilers have no way to do so.
        
        And your comment about “doing it wrong” is seriously insane. The carry bit is an architectural feature. There are tons of ways to use it, and not all are covered efficiently by the compiler.
        
        Example: you have 2 arrays of 8 bytes, and you want to store which bytes of one are bigger than the matching bytes of the other in a bit array for later use. Most architectures set carry on a compare, and have a rotate which pulls in carry. So you should be able to just compare and rotate, but C won’t let you do that and compilers aren’t usually smart enough.
        
      3. lamalas says:
        
        January 12, 2019 at 1:39 pm
        
        “The carry bit is an architectural feature.”
        “no portable way to access the carry flag in C”
        There is your contradiction.
        
  2. dru says:
    
    January 9, 2019 at 6:36 am
    
    That example depends heavily on the compiler and architecture in question. The C language has been ported to numerous computer architectures, many of which don’t have the standard 8/16/32/64-bit integer sizes commonly used nowadays, meaning that the bit lengths of the char/short/int/long types aren’t actually defined by the standard. (float and double are defined to be (at least) 32 and 64 bits respectively.)
    
    What if the computer you’re programming for has 12bit chars and 18bit ints? The language itself supports that, though porting code to such a machine will be more difficult natch.
    
    In the case of the 6502, as for the NES, I doubt that common C compilers use 32bit math by default.
    
2. jero32 says:
  
  January 9, 2019 at 5:46 am
  
  While that may be true for modern CPUs, the 6502 has pretty mediocre options for C. In fact, the architecture being memory-based rather than register-based doesn’t seem to be very well suited to C. So unless someone makes a much more efficient C compiler for the 6502, it’s better to code in assembler.
  
  1. Truth says:
    
    January 9, 2019 at 2:53 pm
    
    You could use BCPL that was available on the BBC micro, which had a 6502 processor.
    
3. some guy says:
  
  January 9, 2019 at 5:51 am
  
  Real geeks use assembly (or butterflies sometimes).
  
  1. ???? says:
    
    January 9, 2019 at 10:20 am
    
    Or rollercoaster game programming.
    
4. Stefan_Z says:
  
  January 9, 2019 at 5:59 am
  
  Sure, ASM is fun and games, but try to implement a whole WiFi stack, TCP stack, a simple HTTP server, UART routines, 16-bit TFT and touch libraries in Xtensa ASM first! Then debug it :-)
  Squeezing 30 kBytes of ASM into these tiny cartridges is the way to go. Filling the 4 MB of an ESP’s flash using ASM is pretty much a lifetime task…
  
  1. nbunnell776 says:
    
    January 9, 2019 at 7:10 am
    
    I feel like there is a story here… Details?
    
  2. Jan says:
    
    January 9, 2019 at 7:55 am
    
    Exactly! I could not agree more.
    In other words, use the right tool for the job. And if there is any need to do some very specific part in assembly, then the C compiler should allow you to do so within the C code itself by using a special directive, so that the compiler knows that that part needs to be treated as assembly code.
    
5. Ralph Doncaster (Nerd Ralph) says:
  
  January 9, 2019 at 10:17 am
  
  I beg to differ. On embedded platforms like AVR and ARM, GCC is pretty bad. On anything other than trivial code, I can usually cut the code size down by 20-30%.
  For x86 it’s a bit better, and it’s only getting close since LTO was introduced. I still don’t think you’ll see the day when you can write a 64K demo in C that competes with the hand-optimized asm.
  
  For small embedded work, I generally prefer asm. For bigger programs (over 1k lines), I’ll use more C/C++, but if I had a high-level assembler that took care of register allocation, I’d use it for even larger programs.
  
6. ???? says:
  
  January 9, 2019 at 10:28 am
  
  Could you even program something like this outside of assembly, even with modern software development platforms?
  
  They had to take so many shortcuts and “compress” almost everything to get it both to fit and to be functional. Not sure you can get this to work without that.
  
  It probably helps that the scale and scope of the entire project is digestible due to it being so small. Relatively speaking of course. Still took them several years though.
  
7. krater says:
  
  January 9, 2019 at 1:56 pm
  
  On x86 you are completely right. But this is 6502, do you know how good the optimizers are on this CPU?
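
For readers following the 24-bit argument earlier in this thread, here is a minimal sketch of the approach pelrun describes: multi-byte addition in portable C, with the carry tracked in an ordinary variable rather than read from a CPU flag. The `u24` type and the helper names are hypothetical. Nothing guarantees that a given 6502 compiler turns the loop into the add-with-carry chain an assembly programmer would write, which is the gap Bill and Pat point to.

```c
#include <stdint.h>

/* Hypothetical 24-bit unsigned value stored as three bytes, least
   significant byte first (the natural order on a little-endian 8-bit CPU). */
typedef struct { uint8_t b[3]; } u24;

/* Add two 24-bit values one byte at a time, propagating the carry by hand.
   The result wraps modulo 2^24, like a three-byte add-with-carry chain. */
u24 u24_add(u24 x, u24 y)
{
    u24 r;
    unsigned carry = 0;
    for (int i = 0; i < 3; i++) {
        unsigned sum = (unsigned)x.b[i] + (unsigned)y.b[i] + carry;
        r.b[i] = (uint8_t)sum;   /* keep the low 8 bits */
        carry = sum >> 8;        /* 0 or 1, carried into the next byte */
    }
    return r;
}

/* Conversion helpers, used only to make the sketch easy to exercise. */
u24 u24_from_u32(uint32_t v)
{
    u24 r = {{ (uint8_t)v, (uint8_t)(v >> 8), (uint8_t)(v >> 16) }};
    return r;
}

uint32_t u24_to_u32(u24 v)
{
    return (uint32_t)v.b[0] | ((uint32_t)v.b[1] << 8) | ((uint32_t)v.b[2] << 16);
}
```

The same byte-by-byte pattern extends to subtraction with a borrow. What it cannot express is a request that the compiler use the hardware carry flag specifically.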
  
Pat says:

January 9, 2019 at 5:50 am

I’ll give better examples then: in C, there’s no way to get the state of the zero and carry flags, so you can’t do tricks that the architecture supports, like comparing and then shifting the result of the compare into another register using the rotate instructions.

But register allocation is the bigger deal: compilers stick to rigid register calling conventions to allow for portability, and that’s where you can save a lot of time/space between the two languages.
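
As a point of comparison, the compare-and-rotate trick can be written in portable C, though only by spelling out the comparison result as a 0 or 1 and shifting it in by hand. This is a minimal sketch using the eight-byte example from earlier in the comments. `greater_mask` is a hypothetical name, and whether a compiler reduces the loop body to a compare plus a rotate-through-carry is not something the language promises.

```c
#include <stdint.h>

/* Build an 8-bit mask where each bit records whether a[i] > b[i].
   Element 0 ends up in bit 7 and element 7 in bit 0, because every new
   result is shifted in from the right. */
uint8_t greater_mask(const uint8_t a[8], const uint8_t b[8])
{
    uint8_t mask = 0;
    for (int i = 0; i < 8; i++) {
        /* In assembly this step could be a compare followed by a rotate
           that pulls the flag into the register. C only offers the
           comparison's value (0 or 1), which is OR-ed into the mask. */
        mask = (uint8_t)((mask << 1) | (a[i] > b[i]));
    }
    return mask;
}
```

The result is the same either way. The disagreement in the comments is about how many instructions the compiled version spends getting there.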

1. Fred says:
  
  January 9, 2019 at 6:35 pm
  
  Spot on. A lot of CPUs have an ‘alternate’ register set, which compilers use to handle interrupts and aren’t used during normal program flow. But when writing assembler, I can use some of those alternate registers because I know my interrupt routines don’t use them.
  
MK 2 says:

January 9, 2019 at 6:34 am

I’ve recreated a couple of ZX Spectrum classics and learned a lot about optimizing the limited memory. The 16 KB (!) game Tranz-Am, for instance, plays on a map of 6192 by 3112 pixels, i.e. almost 600 KB. I’ve written a program that reduces that huge tilemap to an array that takes up less than 2048 bytes.

1. Jii says:
  
  January 9, 2019 at 11:00 am
  
  That’s a nice reduction.
  
fields of gold says:

January 9, 2019 at 6:40 am

But the original Excitebike was less than 32k

1. Padrote says:
  
  January 9, 2019 at 3:18 pm
  
  And Galaxian is 8k. This game likely has a lot more packed into it than those two, though.
  
NESHomebrew says:

January 9, 2019 at 10:29 am

Since the Kickstarter is finished, if you want to buy the game it is available on Indiegogo InDemand: https://www.indiegogo.com/projects/micro-mages-a-new-game-for-the-nes#/

Sykobee says:

January 11, 2019 at 2:27 am

6502 assembler is very well known, and there are plenty of people out there who simply breathe it, what with the C64 demo scene. For an NES game it is simply the best option. You could probably get away with C for higher level game logic of course.

256 4-colour 8×8 bitmaps take 4 KB of memory; the big memory issue is level layouts. 60 bytes per game screen (as I interpret this article) is not too bad at all considering the NES has 960 tiles to display. I think the game uses 2×2 tiles for map elements, but that’s still ~240 bytes per screen (I guess there’s a HUD taking up some of that space). From the Indiegogo page I can see a screenshot that appears ‘mirrored’, so that halves the data again, and maybe there’s some form of compression in play too (RLE?). Or maybe there are ‘supertiles’ made up of these 2×2 blocks.

Secondly, I’m still sick of videos. What ever happened to readable articles with screenshots I can read in a couple of minutes?
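
The tile and screen budget estimated in this comment can be checked with a short sketch in C. The constants follow from the NES picture size of 256×240 pixels and 2 bits per pixel. The half-screen mirroring layout and all names are assumptions made for illustration, not the game's actual map format. The sketch mirrors only tile placement, so asymmetric map pieces would still need flipped variants.

```c
#include <stdint.h>

enum {
    BYTES_PER_TILE = 8 * 8 * 2 / 8,       /* 8x8 pixels, 2 bits each = 16 */
    SCREEN_TILES_W = 256 / 8,             /* 32 tiles across */
    SCREEN_TILES_H = 240 / 8,             /* 30 tiles down */
    META_W = SCREEN_TILES_W / 2,          /* 16 map elements of 2x2 tiles */
    META_H = SCREEN_TILES_H / 2,          /* 15 rows of map elements */
    HALF_W = META_W / 2                   /* 8 stored columns if mirrored */
};

/* Pattern data needed for a given number of 8x8 tiles. */
unsigned chr_bytes(unsigned tiles)
{
    return tiles * BYTES_PER_TILE;
}

/* Expand a stored left half (HALF_W x META_H indices, row-major) into a
   full META_W x META_H screen by mirroring each row about the centre. */
void mirror_screen(const uint8_t *half, uint8_t *full)
{
    for (int y = 0; y < META_H; y++) {
        for (int x = 0; x < HALF_W; x++) {
            uint8_t id = half[y * HALF_W + x];
            full[y * META_W + x] = id;                 /* left side */
            full[y * META_W + (META_W - 1 - x)] = id;  /* mirrored right */
        }
    }
}
```

With one byte per map element, this gives 240 bytes for a full screen and 120 bytes for a mirrored half. Reaching 60 bytes would take a further halving, for example through the compression or ‘supertiles’ guessed at above.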


Hackaday
