Teaching A Computer To Play Mario… Seemingly Through Voodoo

April 14, 2013

Some people know [Tom Murphy] as [Dr. Tom Murphy VII Ph.D.] and this hack makes it obvious that he earned those accolades. He decided to see if he could teach a computer to win at Super Mario Bros. But he went about it in a way that we’d bet is different from what 99.9% of readers would first think of. The program doesn’t care about Mario, power-ups, or really even about enemies. It’s simply looking at the metrics which indicate you’re doing well at the game, namely score and world/level.

The link above includes his whitepaper, but we think you’ll want to watch the 16-minute video (after the break) before trying to tackle that. In the clip he explains the process in layman’s terms, which so far is the only part we really understand (hence the reference to voodoo in the title). His program uses heuristics to assemble a set of evolving controller inputs to drive the scores ever higher. In other words, instead of following in the footsteps of Minesweeper solvers or Bejeweled Blitz bots, which play as a human would by observing the game space, his software plays the game over and over, learning which combinations of controller inputs result in success and which do not. The image to the right is a graph of its learning progress. Makes total sense, huh?
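
For readers who want something more concrete than voodoo, here is a minimal sketch of the "metrics" idea: scan memory snapshots from a recorded playthrough and keep the addresses whose values only ever go up. The 4-byte toy RAM, the function name `increasing_addresses`, and the "never drops, rises at least once" rule are our own simplifying assumptions for illustration, not the method from the whitepaper.

```python
from typing import List, Sequence


def increasing_addresses(snapshots: Sequence[bytes]) -> List[int]:
    """Return addresses whose byte never drops and rises at least once.

    Hypothetical helper: a crude stand-in for "find the memory
    locations that indicate progress", not the paper's algorithm.
    """
    if not snapshots:
        return []
    result = []
    for addr in range(len(snapshots[0])):
        values = [snap[addr] for snap in snapshots]
        never_drops = all(a <= b for a, b in zip(values, values[1:]))
        rises = values[-1] > values[0]
        if never_drops and rises:
            result.append(addr)
    return result


# Toy 4-byte "RAM" sampled at four moments of an imaginary playthrough.
snapshots = [
    bytes([0, 3, 7, 1]),
    bytes([1, 3, 2, 1]),
    bytes([1, 3, 9, 2]),
    bytes([2, 3, 4, 2]),
]
print(increasing_addresses(snapshots))  # [0, 3]
```

Address 1 never changes and address 2 bounces around, so only addresses 0 and 3 look like "progress" counters in this toy data.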

[via Reddit]

56 thoughts on “Teaching A Computer To Play Mario… Seemingly Through Voodoo”

dreamer says:

April 14, 2013 at 6:18 am

Either you use Dr. or you use Ph.D.
You can’t use both at the same time.

Nice research otherwise, wonder how these kinds of algorithms can/will be used in the future.

1. Anonymous says:
  
  April 14, 2013 at 7:02 am
  
  Depends on the country. In some Dr PhD is acceptable. In Germany you can even be a Dr. Dr. Dreamer
  
  1. dreamer says:
    
    April 15, 2013 at 2:20 am
    
    Yes, additional honorary Doctorates are allowed, but in that case you could also do Anonymous PhD PhD.
    You can’t do both.
    
2. Aikon says:
  
  April 14, 2013 at 7:20 am
  
  Not if you are also an M.D. =P
  
3. t-bone says:
  
  April 15, 2013 at 8:39 am
  
  Umm, heard of humor? The “Tom Murphy VII” should have given it away.
  
  Re: Dr. Dr.: Yep, and in Germany you would be addressed as Herr Doktor Doktor Dreamer.
  
Jim Gray (@grayj_) says:

April 14, 2013 at 6:19 am

This is a lot like other search problems in AI where we can’t exhaustively search the state tree, except the move results are handled by feeding the move to a ROM emulator. Cool to see it applied to emulated video games.

Tricky bits are the choice of ordering / scoring heuristic, and the fact that he’s using emulator state saving to let it compute several potential moves from each move state without having to repeatedly traverse all the states leading up to it.
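
A rough sketch of that save-state trick, under loud assumptions: `ToyEmulator`, its three buttons, the pit at `x == 4`, and the `score` function are all made up for illustration and stand in for a real ROM emulator and a real scoring heuristic. The point is only the shape of the loop: checkpoint, try each move from the same state, restore, then commit to the best one.

```python
from dataclasses import dataclass
from typing import List, Tuple

BUTTONS = ("wait", "right", "jump")
PITS = {4}  # toy level: landing on x == 4 kills the player


@dataclass
class ToyEmulator:
    """Hypothetical stand-in for an emulator with save states."""
    x: int = 0
    alive: bool = True

    def save_state(self) -> Tuple[int, bool]:
        return (self.x, self.alive)

    def load_state(self, state: Tuple[int, bool]) -> None:
        self.x, self.alive = state

    def step(self, button: str) -> None:
        if not self.alive:
            return
        self.x += {"wait": 0, "right": 1, "jump": 2}[button]
        if self.x in PITS:
            self.alive = False


def score(emu: ToyEmulator) -> int:
    """Toy heuristic: distance travelled, or -1 if dead."""
    return emu.x if emu.alive else -1


def greedy_play(emu: ToyEmulator, moves: int) -> List[str]:
    """Pick the best-scoring button at each step, one move of lookahead."""
    chosen = []
    for _ in range(moves):
        checkpoint = emu.save_state()
        best_button, best_score = None, None
        for button in BUTTONS:
            emu.load_state(checkpoint)  # rewind instead of replaying history
            emu.step(button)
            s = score(emu)
            if best_score is None or s > best_score:
                best_button, best_score = button, s
        emu.load_state(checkpoint)
        emu.step(best_button)  # commit to the winner
        chosen.append(best_button)
    return chosen


emu = ToyEmulator()
print(greedy_play(emu, 5))
```

Because every candidate starts from the restored checkpoint, the cost per step is proportional to the number of candidate moves, not to the length of the game so far.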

Cronix says:

April 14, 2013 at 6:20 am

this is sick lol

Phroon says:

April 14, 2013 at 6:28 am

Clever, and the paper has some funny bits:
“On a scale of OMFG to WTFLOL I was like whaaaaaaat?”

[Picture of computer playing Tetris, about to lose.]
“Figure 16: Would you hire this construction company? Death is imminent, so playfun pauses the game shortly after this and then doesn’t unpause it.”

1. Kemp says:
  
  April 14, 2013 at 2:52 pm
  
  The only way to win is not to play.
  
  1. Mike says:
    
    April 14, 2013 at 7:05 pm
    
    lmfao!
    
  2. DracoBengali says:
    
    April 22, 2013 at 7:01 am
    
    Hello Joshua.
    
Geebles says:

April 14, 2013 at 6:30 am

Found that really interesting :)

Daniel says:

April 14, 2013 at 6:50 am

Interesting! His approach of looking at the overall size of the memory every 1/6th of a second reminds me of how physicists use energy methods to make complex problems relatively simple. I find it interesting that he only programmed it for one “play gradient”, and that it doesn’t set a gradient to play by from the example game it’s given as the first input.

Cmh62 says:

April 14, 2013 at 7:08 am

The “Dr. vs PhD” comment is best left for the author of the paper instead of Hackaday as this is how “Tom” referred to himself … check the link for the 22 page paper.

Oguz286 says:

April 14, 2013 at 7:09 am

Now THAT’s how you write a paper! Everyone… take notes!

1. spike says:
  
  April 14, 2013 at 8:43 am
  
  It’s a REAL paper for a FAKE conference, hence all the jokes.
  
FrankenPC says:

April 14, 2013 at 7:18 am

Fascinating. Brings up the possibility of using VR with goal oriented machine learning. I’m starting to see that AI ultimately is not going to be a single program calling the shots. It’s going to be one of these programs in greedy mode, one in cautious mode. A danger interrupt program that does care about the aesthetics. Perhaps a cloud oriented language interpreter, etc. It will be more like a hive mind.

1. Tane says:
  
  April 17, 2013 at 5:54 am
  
  Marvin Minsky would agree with you: http://en.wikipedia.org/wiki/The_Society_of_Mind
  
BayWatch says:

April 14, 2013 at 7:35 am

OK, so from the perspective of a computer scientist, this was awesome to watch / read :) I congratulate you for the great method and the way you present it. The fact that this works so often by just analyzing RAM image sequences is remarkable. Imagine what would happen if you used some heuristics to basically validate your input before executing it. This would surely become unbeatable. A very rewarding experience, thanks to both Tom and HaD!

Jayduey says:

April 14, 2013 at 7:38 am

Anyone else thinking of Ford Prefect’s security robot buddy?
I’m surprised no one has tried that, though this is basically the same thing.

1. DainBramage1991 says:
  
  April 14, 2013 at 8:02 pm
  
  Awesome Hitchhiker’s reference!
  
  And I agree.
  
David says:

April 14, 2013 at 7:50 am

It would be fascinating to see similar mathematics applied to 2 player games.
Also I think your program plays better than I do.

1. Hirudinea says:
  
  April 14, 2013 at 1:35 pm
  
  It would be interesting to have two systems play each other at a relatively simple game (checkers?) and let them go at each other for a couple of months, at the end they should both be world class players.
  
  1. Nova says:
    
    April 15, 2013 at 2:21 am
    
    If they’re basically able to view raw RAM and make decisions based off that in a fraction of a second, it’s basically a learning aim-bot of sorts.
    
FourthDr says:

April 14, 2013 at 8:01 am

Yes, Professor Falken! Shall we play a game?………Strange game, the only winning move is not to play……LOL :-D

S says:

April 14, 2013 at 8:05 am

The interesting hack here is that he is scraping the emulator’s RAM contents to grab the data to generate a heuristic, which is where all the hard work is done. The greedy search used, even if it somehow works, is not very good though.

The CS188.1x course offered by BerkeleyX through edX, which is just ending now (it seems it will be offered again soon, along with the second part), teaches AI learning algorithms using a Pacman implementation in Python, with much more advanced AI methods like Q-Learning.

The course is really awesome and easy to follow btw, not obscure at all like the recommended book by S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach.

Arjan says:

April 14, 2013 at 8:14 am

Gotta love that reference to WarGames! :)

1. Scott Tuttle says:
  
  April 14, 2013 at 9:04 am
  
  That was great. The computer did learn.
  
2. misterstig says:
  
  April 14, 2013 at 6:09 pm
  
  HOW ABOUT A NICE GAME OF CHESS?
  
t&p says:

April 14, 2013 at 8:19 am

Keeping Tetris paused and refusing to lose points and get a game over is the best part. It’s like a kid that refuses defeat.

Stem says:

April 14, 2013 at 9:11 am

Cool bit of work for sure, but the way the guy addresses the audience really annoys me, could he not just cut the crap and get to the point rather than spending 6 minutes performing a routine?

sevs says:

April 14, 2013 at 10:06 am

Tom is a funny guy!

vonskippy says:

April 14, 2013 at 10:34 am

“I just like this graphic – it doesn’t matter what it means”.

All too frequent in modern papers. It’s called the Desktop Publishing effect. Back in the day (when people walked to school in the snow uphill both ways) graphics took real effort to make, and therefore were used sparingly in papers. Now that it’s just a few clicks from DB or SS to Graph, they’re everywhere.

Hard to say if that’s a good thing or not. It’s “a picture is worth a thousand words” versus hiding poor results in a pretty picture.

Alex Rossie says:

April 14, 2013 at 11:08 am

Relatively simple topic, awful explanation.

1. Alex Rossie says:
  
  April 14, 2013 at 11:43 am
  
  Explanation:
  He played Super Mario Bros., took 6 RAM snapshots per second, and captured his key presses.
  A program analysed the RAM to see which memory locations go up (and considered that making them go up is doing well).
  He then had a computer emulate the game along with key presses, simulating 10 frames ahead for each move considered and picking the best input.
  
  Is what I understood from the whitepaper.
  
  And this takes an hour to generate 16 seconds of retarded mario playtime.
  
  Really uninteresting….
  
  1. Mr Fish Master says:
    
    April 14, 2013 at 3:04 pm
    
    Wow! life must look so boring through your eyes. You ignore so much with your explanation.
    
static says:

April 14, 2013 at 12:35 pm

Paper dated 1 April 2013, OK… Anyway, a fun video to watch. I never read the paper through to the end, and it’s probably in the delete lineup when I want to recover every MB of storage possible. Dr John Doe Ph.D may earn demerits in some circles; I wouldn’t know because that’s not my world.

1. Rakyth says:
  
  April 15, 2013 at 8:31 pm
  
  It’s a joke conference and the paper is part of the joke, but as far as I can see the method and result are real.
  
Kev says:

April 14, 2013 at 2:40 pm

Brilliant end sequence in the video there. Hilarious!

oneguydid says:

April 14, 2013 at 3:34 pm

I figure this approach is sensible where obstacles and enemies are predictable, which would be the case where you can’t afford the memory nor processing time for sensible randomness. Tetris, however, can be random with little effort, making it impossible to repeat a game on demand.
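
One caveat worth sketching: software "randomness" usually comes from a pseudo-random generator, which repeats exactly when it starts from the same state. The snippet below uses Python’s `random.Random` as a stand-in; the piece letters, the seed value, and the `piece_sequence` helper are assumptions for illustration and say nothing about how the actual NES Tetris generator works.

```python
import random

PIECES = "IJLOSTZ"  # the seven tetromino names, used here as toy data


def piece_sequence(seed: int, count: int) -> str:
    """Draw `count` pieces from a generator started at `seed`."""
    rng = random.Random(seed)
    return "".join(rng.choice(PIECES) for _ in range(count))


first = piece_sequence(seed=1234, count=10)
replay = piece_sequence(seed=1234, count=10)
print(first == replay)  # True: same seed, same "random" pieces
```

If the generator’s state is part of what gets saved and restored, a search that rewinds the game would presumably see the same pieces each time it replays from that point.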

1. David says:
  
  April 14, 2013 at 9:11 pm
  
  I think the Tetris problem is that it optimizes towards score, not towards filling a row. As filling a row partially doesn’t increase score, the system doesn’t try to do that. That could be because of how the grid is stored in the RAM.
  
  1. rj says:
    
    April 15, 2013 at 12:53 am
    
    Yeah. I bet that without sufficient lookahead, and on the 1st level, the points gotten from clearing a single line (40 points) are insufficiently big in comparison to the points from soft drops (1 per line).
    
    It’d be idly interesting to hack Tetris’s scoring to make the search better. Maybe something like “you get N points for filling in the Nth column of a given row, plus something much larger for clearing each row”
    
    1. Greenaum says:
      
      April 17, 2013 at 8:10 am
      
      The way to get big points in Tetris is to form a pile, then wait for the 4×1 pieces, giving you 4 lines at once. You get a big bonus for that. The other objective is to stay in the game, which might require making smaller groups of lines, and clearing up any garbage pieces you couldn’t fit in.
      
      Tho organising your pile so you always have room for a new piece is important too. One bad-placed piece can send the whole thing spinning into oblivion as blockages pile up.
      
2. Alex Rossie says:
  
  April 15, 2013 at 3:08 am
  
  You could probably make the emulator provide the same seed to the PRNG.
  
Whale says:

April 14, 2013 at 7:05 pm

Sounds like a very interesting piece, but youtube is so laggy with loading nowadays, I can’t watch :(

kmdes says:

April 14, 2013 at 9:05 pm

To understand why Hudson puts their name in Adventure Island, look up a game called Wonder Boy, published by Sega. A thing to keep in mind is that Wonder Boy came first. Surprisingly, this was done legally and more than once by the developer of Wonder Boy.

PHO says:

April 15, 2013 at 12:57 am

Guess Wonderboy changed his name while skateboarding over to Nintendo from Sega ….
This is a great example of making interesting research that could be boring, fun…

Galane says:

April 15, 2013 at 1:31 am

But will Twin Galaxies accept AI high scores?

Speedy says:

April 15, 2013 at 2:20 am

Try this with “DIGGER” from Windmill Software. It will be good at this game i guess.

emsi says:

April 15, 2013 at 3:08 am

You guys noticed it was published on 1 April 2013, right?

emsi says:

April 15, 2013 at 4:33 am

I’m 100% sure no one read the paper ;) I was looking for something like reinforcement learning or something but instead I found footnotes like this: “possible additional simplification would be to just take lexicographic orderings over bits, which then generalizes to 8-bit bytes. This is probably too crazy, but just right now I am sort of feeling like maybe I should try it, though it may be the beer” :)
Nevertheless I’m having fun reading ;))))
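
For anyone wondering what a lexicographic ordering over bytes looks like in practice, here is a tiny sketch. Python tuples already compare lexicographically, so reading a few bytes in a fixed order gives a comparable "progress key". The addresses in `ORDERING` and their meanings (world, level, score byte) are hypothetical, chosen only to make the example readable.

```python
from typing import Sequence, Tuple


def progress_key(ram: bytes, ordering: Sequence[int]) -> Tuple[int, ...]:
    """Read the bytes at `ordering`, most significant address first."""
    return tuple(ram[addr] for addr in ordering)


# Hypothetical layout: address 0 = world, 1 = level, 2 = a score byte.
ORDERING = (0, 1, 2)

before = bytes([1, 2, 200])  # world 1, level 2, score byte 200
after = bytes([1, 3, 0])     # world 1, level 3, score byte reset to 0

# Level went up, so `after` counts as progress even though the score byte fell.
print(progress_key(after, ORDERING) > progress_key(before, ORDERING))  # True
```

Earlier positions dominate later ones, which is why a later level outranks a higher score on an earlier level in this toy ordering.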

minipimmer says:

April 15, 2013 at 5:29 am

Congratulations to the Dr. PhD, this one is a nice piece of work, both funny and interesting.

YS says:

April 15, 2013 at 11:53 am

Wow, really amazing approach!

Joe Moer says:

April 15, 2013 at 1:12 pm

Hey! Let’s apply this to warfare! I wonder how many nukes we have to drop until we hit the target :p

willrandship says:

April 16, 2013 at 8:30 am

I love this project. The video was a little bit off, but the work done is really impressive. Also source code is available (link in YT description to sourceforge) :D

Sixten says:

April 16, 2013 at 9:54 am

“The only winning move is not to play”

Pause
