Neural Networks And MarI/O

Minecraft wizard and record holder for the Super Mario World speedrun, [SethBling] is experimenting with machine learning. He built a program that gets Mario through an entire level of Super Mario World – Donut Plains 1 – using neural networks and genetic algorithms.

A neural network simply takes an input – in this case a small graphic representing the sprites around Mario – sends that input through a series of artificial neurons, and turns the result into commands for the controller. It’s an exceedingly simple neural network – the one that gets Mario through an entire level has fewer than a dozen neurons – but with enough training, even simple networks can accomplish very complex tasks.
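As a rough sketch of that idea – not [SethBling]’s actual code, which is a Lua script running inside the emulator – a single-layer network like this boils down to one weighted sum and one threshold per controller button. The grid size, value encoding, and button list below are assumptions for illustration only:

```python
import random

# Hypothetical 13x13 grid of tiles around Mario: 0 = empty,
# 1 = solid block, -1 = enemy. This encoding is illustrative,
# not the actual MarI/O input representation.
GRID = 13 * 13
BUTTONS = ["left", "right", "up", "down", "A", "B", "X", "Y"]

def random_network():
    """One weight per (input, button) pair -- a single-layer net."""
    return [[random.uniform(-1, 1) for _ in range(GRID)] for _ in BUTTONS]

def evaluate(net, inputs):
    """Map the sprite grid to button presses: a weighted sum per
    button, with a simple threshold as the neuron's activation."""
    presses = {}
    for button, weights in zip(BUTTONS, net):
        activation = sum(w * x for w, x in zip(weights, inputs))
        presses[button] = activation > 0
    return presses
```

In the real script, the inputs are read out of the emulator’s memory every frame and the resulting presses are written back to the virtual controller.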

To train the network – that is, to weight the connections between inputs, neurons, and outputs – [SethBling] is using an evolutionary algorithm. The algorithm first generates a few random neural networks, watches Mario’s progress across Donut Plains 1 under each one, and assigns a fitness value to each net. The best networks of each generation are combined, and the process repeats for the next generation. It took 34 generations before MarI/O could finish the level without dying.
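Continuing that sketch (and reusing `random_network` from above), the generational loop might look something like the following. The real MarI/O uses NEAT, which evolves network topology as well as weights; this fixed-topology stand-in, with its made-up elitism fraction and mutation rate, only shows the select-and-recombine cycle. `run_level` is a placeholder for playing the level in the emulator and returning a fitness score, e.g. Mario’s furthest x position:

```python
import random

POPULATION = 300      # per-generation count mentioned in the comments below
MUTATION_RATE = 0.05  # placeholder value

def crossover(a, b):
    """Combine two parent networks weight-by-weight, then mutate a few."""
    child = []
    for row_a, row_b in zip(a, b):
        # Each child weight comes from one parent or the other at random...
        row = [wa if random.random() < 0.5 else wb
               for wa, wb in zip(row_a, row_b)]
        # ...and a small fraction get nudged by Gaussian noise.
        row = [w + random.gauss(0, 0.1) if random.random() < MUTATION_RATE else w
               for w in row]
        child.append(row)
    return child

def evolve(run_level, generations):
    population = [random_network() for _ in range(POPULATION)]
    for _ in range(generations):
        # Rank each network by how far its Mario got, keep the top 10%,
        # and refill the population with children of the survivors.
        ranked = sorted(population, key=run_level, reverse=True)
        elite = ranked[:POPULATION // 10]
        population = elite + [crossover(random.choice(elite), random.choice(elite))
                              for _ in range(POPULATION - len(elite))]
    return max(population, key=run_level)
```

Everything hinges on the fitness function: as long as "how far right did Mario get, and how fast" can be measured, no example play is needed to improve the population.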

A few members of the Internet’s peanut gallery have pointed to a paper/YouTube video by [Tom Murphy] that generalized a completely different technique to play a whole bunch of different NES games. While both [SethBling]’s and [Tom Murphy]’s algorithms use certain variables to determine their own success, [Tom Murphy]’s technique works nearly automatically; it will play about as well as the training data it is given. [SethBling]’s algorithm requires no training data – that’s the entire point of using a genetic algorithm.

27 thoughts on “Neural Networks And MarI/O”

      1. I think they mean running native code faster than the real-time 50/60 frames per second? Most emulators are capable of doing this, even if the result is rather unplayable for humans.

    1. If I’m reading the code right, while the number of “generations” is low, it actually simulates 300 individuals per generation, so the total number of games simulated (34 generations × 300 individuals ≈ 10,200 runs) is a lot closer to what you were expecting.

  1. Anyone who has the error mentioned above: you need to load up a level first, go File > Save State, and call it DP1.State. Then make sure it is saved in the same location as the .lua file. It’s running fine for me; I’m going to leave it running overnight and see how far it gets.

  2. Hate to be “that guy”, but it’d be nice to test the evolved network on a few different levels. It seems like it’s just overfitting to this one particular level at the moment, and at this stage would probably crash and burn on any slight modification to the level, rather than having actually evolved any meta-abilities or abstractions. Those would allow it to quickly reinforce abstract “behaviours” (jumping, killing enemy type x, etc.) that would see it through different levels in a relatively small number of further generations.

    1. Can’t see the video right now, although I think I have the general idea.
      I haven’t played Mario for ages, but as far as I remember there are times when you need to be at a certain place at a certain moment – basically you need prior knowledge of some elements; at least, that’s how human players play.
      Assuming the machine learning program only takes what’s on screen, how can it mimic that prior-knowledge part?
      Also, is the machine playing the same game every time? (Rephrased: are there randomized elements in each run?)

  3. This phrase makes no sense: _“While both [SethBling]’s and [Tom Murphy]’s algorithms use certain variables to determine their own success, [Tom Murphy]’s technique works nearly automatically; it will play about as well as the training data it is given. [SethBling]’s algorithm requires no training data – that’s the entire point of using a genetic algorithm.”_
    Tom’s uses the cartridge’s raw RAM and optimizes for maximum values (for values that change). Pretty sure Seth’s neural net optimizes for specific values, and BOTH need training data – the RAM in Tom’s case and the specific values in Seth’s.

      1. Oh, OK. Anyway, I’m not sure, but I think Tom’s needs some human play first to locate the changing data locations in RAM (that’s why his can play any game), not as training data. Seth’s setup already provides the controls and fitness variables to the GA.
