Teaching a computer to play Mario… seemingly through voodoo

computer-learning-mario

Some people know [Tom Murphy] as [Dr. Tom Murphy VII Ph.D.] and this hack makes it obvious that he earned those accolades. He decided to see if he could teach a computer to win at Super Mario Bros. But he went about it in a way that we’d bet is different that 99.9% of readers would first think of. The game doesn’t care about Mario, power-ups, or really even about enemies. It’s simply looking at the metrics which indicate you’re doing well at the game, namely score and world/level.

The link above includes his whitepaper, but we think you’ll want to watch the 16-minute video (after the break) before trying to tackle that. In the clip he explains the process in laymen’s terms which so far is the only part we really understand (hence the reference to voodoo in the title). His program uses heuristics to assemble a set of evolving controller inputs to drive the scores ever higher. In other words, instead of following in the footstep of Minesweeper solvers or Bejeweled Blitz bots¬†which play as a human would by observing the game space, his software plays the game over and over, learning what combinations of controller inputs result in success and which do not. The image to the right is a graph of it’s learning progress. Makes total sense, huh?

[via Reddit]

56 thoughts on “Teaching a computer to play Mario… seemingly through voodoo

  1. Either you use Dr. or you use Ph.D.
    You can’t use both at the same time.

    NIce research otherwise, wonder how these kind of algorithms can/will be used in the future.

    1. Umm, heard of humor? The “Tom Murphy VII” should have given it away.

      Re: Dr. Dr.: Yep, and in Germany you would be addressed as Herr Doktor Doktor Dreamer.

  2. This is a lot like other search problems in AI where we can’t exhaustively search the state tree, except the move results are handled by feeding the move to a ROM emulator. Cool to see it applied to emulated video games.

    Tricky bits are the choice of ordering / scoring heuristic, and the fact that he’s using emulator state saving to let it compute several potential moves from each move state without having to repeatedly traverse all the states leading up to it.

  3. Cleaver and the paper has some funny bits”:
    “On a scale of OMFG to WTFLOL I was like whaaaaaaat?”

    [Picture of computer playing Tetris, about to lose.]
    “Figure 16: Would you hire this construction company? Death is imminent, so playfun pauses the game shortly after this and then doesn’t unpause it.”

  4. Interesting! His approach of looking at the overall size of the memory every 1/6th of a second reminds me of how physicists use energy methods to make complex problems relatively simple. I find it interesting that he only programmed it for one “play gradient”, and that it doesn’t set a gradient to play by from the example game it’s given as the first input.

  5. The “Dr. vs PhD” comment is best left for the author of the paper instead of Hackaday as this is how “Tom” referred to himself … check the link for the 22 page paper.

  6. Fascinating. Brings up the possibility of using VR with goal oriented machine learning. I’m starting to see that AI ultimately is not going to be a single program calling the shots. It’s going to be one of these programs in greedy mode, one in cautious mode. A danger interrupt program that does care about the aesthetics. Perhaps a cloud oriented language interpreter, etc. It will be more like a hive mind.

  7. OK, so from the perspective of a computer scientist, this was awesome to watch / read :) I congratulate you for the great method and the way you present it in. The fact that this works so often by using just analyzing RAM image sequences is remarkable. Imagine what would happen if you use some heuristics to basically validate your input before executing it. This will surely get unbeatable. A very rewarding experience, thanks to both Tom and HaD!

  8. Anyone else thinking of ford prefect’s security robot buddy?
    I’m surprised noone has tried that, though this is basically the same thing.

  9. It would be fascinating to see similar mathematics applied to 2 player games.
    Also I think your program plays better than I do.

    1. It would be interesting to have two systems play each other at a relatively simple game (checkers?) and let them go at each other for a couple of months, at the end they should both be world class players.

      1. If theyre basically able to view raw ram and make decisions based off that in a fraction of a second, its basically a learning aim-bot of sorts.

  10. Yes, Professor Falcon! Shall we play a game?………Strange game, the only winning move is not to play……LOL :-D

  11. The interesing hack here is that he is scrapping the emulator ram contents of the game to grab the data to generate an heuristic where all the hard work is done, the greedy search used even if somehow works is not very good though

    The CS188.1x course offered by BerkeleyX thru edX that is just ending now (seems that will be offered again soon along the second part) teach AI learning algorithms using a Pacman implementation in Python with much much more advanced AI methods like Q-Learning.

    The course is really awesome and easy to follow btw, not obscure at all as the recomended book from S.Russell and P.Norvig Artificial Intelligence-A Modern Approach.

  12. Keeping tetris paused and refusing to not loose points and get a game over is the best part. It’s like a kid that refuses defeat.

  13. Cool bit of work for sure, but the way the guy addresses the audience really annoys me, could he not just cut the crap and get to the point rather than spending 6 minutes performing a routine?

  14. “I just like this graphic – it doesn’t matter what it means”.

    All too frequent in modern papers. It’s called the Desktop Publishing effect. Back in the day (when people walked to school in the snow uphill both ways) graphics took real effort to make, and therefore were used sparingly in papers. Now that it’s just a few clicks from DB or SS to Graph, they’re everywhere.

    Hard to say if that’s a good thing or not. It’s a picture is worth a thousand words versus hiding poor results in a pretty picture.

    1. Explanation:
      He played super mario bros and took 6 RAM snapshots per second and captured his key presses.
      A program analysed the RAM to see which memory locations go up (and considered that making them go up is doing well).
      Had a computer emulate the game along with key presses. Simulating 10-frames ahead for each move considered then picked the best input.

      Is what I understood from the whitepaper.

      And this takes an hour to generate 16 seconds of retarded mario playtime.

      Really uninteresting….

  15. Paper dated 1 April 2013, OK… Anyway a fun video to watch, never read the paper through to the end, and it’s probably in the delete line up when I want to recover every MB of storage possible. Dr John Doe Ph.D my earn demerits in some circles, I wouldn’t know because that’s not my world.

    1. It’s a joke conference and the object of the paper of the joke, but as far as I can see the method and result is real.

  16. I figure this approach is sensible where obstacles and enemies are predictable, which would be the case where you can’t afford the memory nor processing time for sensible randomness. Tetris, however, can be random with little effort, making it impossible to repeat a game on demand.

    1. I think the tetras problem is optimizing towards score not toward filling a row. as filling a row partially doesn’t increase score the system doesn’t try to do that. That could be because of how the grid is stored in the RAM.

      1. Yeah. I bet that without sufficient lookahead, and on the 1st level, the points gotten from clearing a single line (40 points) are insufficiently big in comparison to the points from soft drops (1 per line).

        It’d be idly interesting to hack Tetris’s scoring to make the search better. Maybe something like “you get N points for filling in the Nth column of a given row, plus something much larger for clearing each row”

        1. The way to get big points in Tetris is to form a pile, then wait for the 4×1 pieces, giving you 4 lines at once. You get a big bonus for that. The other objective is to stay in the game, which might require making smaller groups of lines, and clearing up anry garbage pieces you couldn’t fit in.

          Tho organising your pile so you always have room for a new piece is important too. One bad-placed piece can send the whole thing spinning into oblivion as blockages pile up.

  17. To explain the reason why Hudson puts their name in Adventure Island is to look up a game called Wonder Boy published by Sega. A thing to keep in mind is Wonder Boy came first. Surprisingly, this was done legally and more than once by the developer of Wonderboy.

  18. Guess Wonderboy changed his name while skateboarding over to Nintendo from Sega ….
    This is a great example of making intresting research that could be boring fun…

  19. I’m 100% sure no one read the paper ;) I was looking for something like reinforcement learning or something but instead I found footnotes like this : “possible additional simpli cation would be to just take lex-icographic orderings over bits, which then generalizes to 8-bitbytes. This is probably too crazy, but just right now I am sort of
    feeling like maybe I should try it, though it may be the beer” :)
    Nevertheless I’m having fun reading ;))))

  20. I love this project. The video was a little bit off, but the work done is really impressive. Also source code is available (link in YT description to sourceforge) :D

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s