[carykh] took a dive into neural networks, training a computer to replicate Baroque music. The results are as interesting as the process he used. Instead of feeding Shakespeare (for example) to a neural network and marveling at how Shakespeare-y the text output looks, the process converts Bach’s music into a text format and feeds that to the neural network. There is one character for each key on the piano, making for an 88 character alphabet used during the training. The neural net then runs wild and the results are turned back to audio to see (or hear as it were) how much the output sounds like Bach.
The video embedded below starts with a bit of a skit but hang in there because once you hit the 90 second mark things get interesting. Those lacking patience can just skip to the demo; hear original Bach followed by early results (4:14) and compare to the results of a full day of training (11:36) on Bach with some Mozart mixed in for variety. For a system completely ignorant of any bigger-picture concepts such as melody, the results are not only recognizable as music but can even be pleasant to listen to.
The core of things is this character-based Recurring Neural Network which is itself the work of Andrej Karpathy. In his words, “it takes one text file as input and trains a Recurrent Neural Network that learns to predict the next character in a sequence. The RNN can then be used to generate text character by character that will look like the original training data.” How did [carykh] actually use this for music? With the following process:
- Gather source material (lots and lots of MIDI files of Bach pieces for piano or harpsichord.)
- Convert those MIDI files to CSV format with a tool.
- Tokenize and reformat that CSV data with a custom Processing script: one ASCII character now equals one piano key.
- Feed the RNN with the resulting text.
- Take the ouput of the RNN and convert it back to MIDI with the reverse of the process.
[carykh] shares an important question that was raised during this whole process: what was he actually after? How did he define what he actually wanted? It’s a bit fuzzy: on one hand he wants the output of the RNN to replicate the input as closely as possible, but he also doesn’t actually want complete replication; he just wants the output to take on enough of the same patterns without actually copying the source material. The processing of the neural network never actually “ends”; [carykh] simply pulls the plug at some point to see what the results are like.
Thanks to [Keith Olson] for the tip!