Teaching a computer to play Mario… seemingly through voodoo

computer-learning-mario

Some people know [Tom Murphy] as [Dr. Tom Murphy VII Ph.D.] and this hack makes it obvious that he earned those accolades. He decided to see if he could teach a computer to win at Super Mario Bros. But he went about it in a way that we’d bet is different that 99.9% of readers would first think of. The game doesn’t care about Mario, power-ups, or really even about enemies. It’s simply looking at the metrics which indicate you’re doing well at the game, namely score and world/level.

The link above includes his whitepaper, but we think you’ll want to watch the 16-minute video (after the break) before trying to tackle that. In the clip he explains the process in laymen’s terms which so far is the only part we really understand (hence the reference to voodoo in the title). His program uses heuristics to assemble a set of evolving controller inputs to drive the scores ever higher. In other words, instead of following in the footstep of Minesweeper solvers or Bejeweled Blitz bots¬†which play as a human would by observing the game space, his software plays the game over and over, learning what combinations of controller inputs result in success and which do not. The image to the right is a graph of it’s learning progress. Makes total sense, huh?

[via Reddit]

Comments

  1. dreamer says:

    Either you use Dr. or you use Ph.D.
    You can’t use both at the same time.

    NIce research otherwise, wonder how these kind of algorithms can/will be used in the future.

  2. This is a lot like other search problems in AI where we can’t exhaustively search the state tree, except the move results are handled by feeding the move to a ROM emulator. Cool to see it applied to emulated video games.

    Tricky bits are the choice of ordering / scoring heuristic, and the fact that he’s using emulator state saving to let it compute several potential moves from each move state without having to repeatedly traverse all the states leading up to it.

  3. Cronix says:

    this is sick lol

  4. Phroon says:

    Cleaver and the paper has some funny bits”:
    “On a scale of OMFG to WTFLOL I was like whaaaaaaat?”

    [Picture of computer playing Tetris, about to lose.]
    “Figure 16: Would you hire this construction company? Death is imminent, so playfun pauses the game shortly after this and then doesn’t unpause it.”

  5. Geebles says:

    Found that really interesting :)

  6. Daniel says:

    Interesting! His approach of looking at the overall size of the memory every 1/6th of a second reminds me of how physicists use energy methods to make complex problems relatively simple. I find it interesting that he only programmed it for one “play gradient”, and that it doesn’t set a gradient to play by from the example game it’s given as the first input.

  7. Cmh62 says:

    The “Dr. vs PhD” comment is best left for the author of the paper instead of Hackaday as this is how “Tom” referred to himself … check the link for the 22 page paper.

  8. Oguz286 says:

    Now THAT’s how you write a paper! Everyone… take notes!

  9. FrankenPC says:

    Fascinating. Brings up the possibility of using VR with goal oriented machine learning. I’m starting to see that AI ultimately is not going to be a single program calling the shots. It’s going to be one of these programs in greedy mode, one in cautious mode. A danger interrupt program that does care about the aesthetics. Perhaps a cloud oriented language interpreter, etc. It will be more like a hive mind.

  10. BayWatch says:

    OK, so from the perspective of a computer scientist, this was awesome to watch / read :) I congratulate you for the great method and the way you present it in. The fact that this works so often by using just analyzing RAM image sequences is remarkable. Imagine what would happen if you use some heuristics to basically validate your input before executing it. This will surely get unbeatable. A very rewarding experience, thanks to both Tom and HaD!

  11. Jayduey says:

    Anyone else thinking of ford prefect’s security robot buddy?
    I’m surprised noone has tried that, though this is basically the same thing.

  12. David says:

    It would be fascinating to see similar mathematics applied to 2 player games.
    Also I think your program plays better than I do.

    • Hirudinea says:

      It would be interesting to have two systems play each other at a relatively simple game (checkers?) and let them go at each other for a couple of months, at the end they should both be world class players.

  13. FourthDr says:

    Yes, Professor Falcon! Shall we play a game?………Strange game, the only winning move is not to play……LOL :-D

  14. S says:

    The interesing hack here is that he is scrapping the emulator ram contents of the game to grab the data to generate an heuristic where all the hard work is done, the greedy search used even if somehow works is not very good though

    The CS188.1x course offered by BerkeleyX thru edX that is just ending now (seems that will be offered again soon along the second part) teach AI learning algorithms using a Pacman implementation in Python with much much more advanced AI methods like Q-Learning.

    The course is really awesome and easy to follow btw, not obscure at all as the recomended book from S.Russell and P.Norvig Artificial Intelligence-A Modern Approach.

  15. Arjan says:

    Gotta love that reference to WarGames! :)

  16. t&p says:

    Keeping tetris paused and refusing to not loose points and get a game over is the best part. It’s like a kid that refuses defeat.

  17. Stem says:

    Cool bit of work for sure, but the way the guy addresses the audience really annoys me, could he not just cut the crap and get to the point rather than spending 6 minutes performing a routine?

  18. sevs says:

    Tom is a funny guy!

  19. vonskippy says:

    “I just like this graphic – it doesn’t matter what it means”.

    All too frequent in modern papers. It’s called the Desktop Publishing effect. Back in the day (when people walked to school in the snow uphill both ways) graphics took real effort to make, and therefore were used sparingly in papers. Now that it’s just a few clicks from DB or SS to Graph, they’re everywhere.

    Hard to say if that’s a good thing or not. It’s a picture is worth a thousand words versus hiding poor results in a pretty picture.

  20. Alex Rossie says:

    Relatively simple topic, awful explanation.

    • Alex Rossie says:

      Explanation:
      He played super mario bros and took 6 RAM snapshots per second and captured his key presses.
      A program analysed the RAM to see which memory locations go up (and considered that making them go up is doing well).
      Had a computer emulate the game along with key presses. Simulating 10-frames ahead for each move considered then picked the best input.

      Is what I understood from the whitepaper.

      And this takes an hour to generate 16 seconds of retarded mario playtime.

      Really uninteresting….

  21. static says:

    Paper dated 1 April 2013, OK… Anyway a fun video to watch, never read the paper through to the end, and it’s probably in the delete line up when I want to recover every MB of storage possible. Dr John Doe Ph.D my earn demerits in some circles, I wouldn’t know because that’s not my world.

  22. Kev says:

    Brilliant end sequence in the video there. Hilarious!

  23. oneguydid says:

    I figure this approach is sensible where obstacles and enemies are predictable, which would be the case where you can’t afford the memory nor processing time for sensible randomness. Tetris, however, can be random with little effort, making it impossible to repeat a game on demand.

    • David says:

      I think the tetras problem is optimizing towards score not toward filling a row. as filling a row partially doesn’t increase score the system doesn’t try to do that. That could be because of how the grid is stored in the RAM.

      • rj says:

        Yeah. I bet that without sufficient lookahead, and on the 1st level, the points gotten from clearing a single line (40 points) are insufficiently big in comparison to the points from soft drops (1 per line).

        It’d be idly interesting to hack Tetris’s scoring to make the search better. Maybe something like “you get N points for filling in the Nth column of a given row, plus something much larger for clearing each row”

        • Greenaum says:

          The way to get big points in Tetris is to form a pile, then wait for the 4×1 pieces, giving you 4 lines at once. You get a big bonus for that. The other objective is to stay in the game, which might require making smaller groups of lines, and clearing up anry garbage pieces you couldn’t fit in.

          Tho organising your pile so you always have room for a new piece is important too. One bad-placed piece can send the whole thing spinning into oblivion as blockages pile up.

    • Alex Rossie says:

      You could probably make the emulator provide the same seed to the PRNG.

  24. Whale says:

    Sounds like a very interesting piece, but youtube is so laggy with loading nowadays, I can’t watch :(

  25. kmdes says:

    To explain the reason why Hudson puts their name in Adventure Island is to look up a game called Wonder Boy published by Sega. A thing to keep in mind is Wonder Boy came first. Surprisingly, this was done legally and more than once by the developer of Wonderboy.

  26. PHO says:

    Guess Wonderboy changed his name while skateboarding over to Nintendo from Sega ….
    This is a great example of making intresting research that could be boring fun…

  27. Galane says:

    But will Twin Galaxies accept AI high scores?

  28. Speedy says:

    Try this with “DIGGER” from Windmill Software. It will be good at this game i guess.

  29. emsi says:

    You guys noticed it was published on 1 April 2013, right?

  30. emsi says:

    I’m 100% sure no one read the paper ;) I was looking for something like reinforcement learning or something but instead I found footnotes like this : “possible additional simpli cation would be to just take lex-icographic orderings over bits, which then generalizes to 8-bitbytes. This is probably too crazy, but just right now I am sort of
    feeling like maybe I should try it, though it may be the beer” :)
    Nevertheless I’m having fun reading ;))))

  31. minipimmer says:

    Congratulations to the Dr. PhD, this one is a nice piece of work, both funny and interesting.

  32. YS says:

    Wow, really amazing approach!

  33. Joe Moer says:

    Hey! Let’s apply this to warfare! I wonder how many nukes we have to drop until we hit the target :p

  34. willrandship says:

    I love this project. The video was a little bit off, but the work done is really impressive. Also source code is available (link in YT description to sourceforge) :D

  35. Sixten says:

    “The only winning move is not to play”

    Pause

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 93,647 other followers