Once upon a time, computers didn’t really have enough resources to play back high-quality audio. It took too much RAM and too many CPU cycles and it was just altogether too difficult. Instead, they relied upon synthesizing audio from basic instructions to make sounds and music. [caiannello] has taken advantage of this with the WAV2VGM project.
The basic concept is straightforward enough: you put a WAV audio file into the tool, and it spits out synthesis instructions for the OPL3 sound chip found on classic sound cards. The Python script only accepts 16-bit mono WAV files with a 44,100 Hz sample rate.
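Because the script is picky about its input, it can help to check a file before feeding it in. The sketch below uses Python's standard wave module to confirm the three requirements above; check_wav_format is a hypothetical helper, not part of WAV2VGM.

```python
import wave


def check_wav_format(path: str) -> None:
    """Raise ValueError unless `path` is a 16-bit, mono, 44,100 Hz PCM WAV file.

    Hypothetical pre-flight check; not part of WAV2VGM.
    """
    # wave.open raises wave.Error for formats it cannot parse (e.g. floating-point WAVs).
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()

    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if rate != 44100:
        problems.append(f"expected 44100 Hz, got {rate} Hz")
    if problems:
        raise ValueError(f"{path}: " + "; ".join(problems))
```

A file that fails the check can usually be converted with ffmpeg, for example: ffmpeg -i input.wav -ac 1 -ar 44100 -c:a pcm_s16le output.wav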
Check the samples and, amazingly, you'll find the output is quite recognizable. You can take a song with lyrics (like Still Alive from Portal), turn it into instructions for an OPL3, and the lyrics remain pretty intelligible. It sounds… glitchy and damaged, but it's absolutely understandable.
It's a fun little retro project that, admittedly, doesn't have a lot of real applications. Still, if you're making a Portal clone for an ancient machine with an OPL3-compatible sound chip, maybe this is the best way to do the theme song? If you're working on exactly that, by some strange coincidence, be sure to let us know when you're done!
If I'm reading this right, the current code is just doing additive synthesis (which explains why it's using an OPL3). I wonder how well it could do if it had a library of 2- or 4-operator options to fit to the Fourier series.
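Additive synthesis rebuilds a sound as a sum of sine waves. A minimal sketch of the general idea, assuming one short analysis frame, a naive DFT, and simple peak-picking (a generic illustration, not WAV2VGM's actual code; all function names are hypothetical): find the strongest frequency bins, then sum one sinusoid per bin. On an OPL3, each of those sinusoids would roughly correspond to one channel playing a sine-like tone.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT of a real frame, bins 0..N/2 (fine for short illustrative frames)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n // 2 + 1)]


def strongest_partials(frame, count):
    """Return (bin, amplitude, phase) for the `count` strongest bins, skipping DC and Nyquist."""
    n = len(frame)
    spectrum = dft(frame)
    bins = sorted(range(1, n // 2), key=lambda k: abs(spectrum[k]), reverse=True)[:count]
    # A cosine of amplitude A at bin k gives |X[k]| = A * n / 2, so scale back by 2 / n.
    return [(k, 2 * abs(spectrum[k]) / n, cmath.phase(spectrum[k])) for k in bins]


def resynthesize(partials, n):
    """Additive synthesis: sum one cosine per (bin, amplitude, phase) partial."""
    return [sum(a * math.cos(2 * math.pi * k * i / n + phase) for k, a, phase in partials)
            for i in range(n)]
```

Phase is kept here only so the round trip can be checked exactly; a chip-based resynthesis mostly has to match frequencies and levels rather than phases.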
Working on that now. I just pushed some PyTorch AI stuff that is training a convolutional neural network as we speak, using fully random OPL3 configurations. (It's not working nearly as well as the simple method yet, though.)
Fitting to a Fourier series can be done entirely computationally, even mechanically; there's no need for a neural network.
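To illustrate the analytical side of that point: when one sine wave phase-modulates another (Yamaha's "FM" is, strictly speaking, phase modulation), the Jacobi–Anger expansion gives the spectrum in closed form, sin(ωc·t + I·sin(ωm·t)) = Σk Jk(I)·sin((ωc + k·ωm)·t), where Jk is the Bessel function of the first kind and I is the modulation index. A 2-operator voice therefore contributes sidebands spaced by the modulator frequency, with amplitudes |Jk(I)|. The sketch below checks this numerically for an idealized voice; it assumes pure sine waveforms, ignores the OPL3's envelopes, feedback, alternative waveforms, and quantization, and the helper names are illustrative.

```python
import cmath
import math


def bessel_j(n, x, terms=30):
    """Bessel function of the first kind J_n(x), integer n >= 0, from its power series."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + n)) * (x / 2) ** (2 * m + n)
               for m in range(terms))


def pm_tone(carrier_bin, mod_bin, index, n):
    """Idealized 2-operator voice: a sine modulator phase-modulating a sine carrier."""
    return [math.sin(2 * math.pi * carrier_bin * i / n
                     + index * math.sin(2 * math.pi * mod_bin * i / n))
            for i in range(n)]


def bin_amplitude(frame, k):
    """Amplitude of the sinusoid at DFT bin k (single-bin DFT, scaled by 2 / N)."""
    n = len(frame)
    x = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(frame))
    return 2 * abs(x) / n


# Predicted sideband amplitudes: |J_k(I)| at carrier + k * modulator (J_-k = (-1)^k J_k).
N, CARRIER, MODULATOR, INDEX = 256, 64, 8, 1.5
tone = pm_tone(CARRIER, MODULATOR, INDEX, N)
for k in range(-3, 4):
    predicted = abs(bessel_j(abs(k), INDEX))
    measured = bin_amplitude(tone, CARRIER + k * MODULATOR)
    print(f"sideband {k:+d}: predicted {predicted:.4f}, measured {measured:.4f}")
```

With a closed-form spectrum per operator pairing, fitting a target spectrum becomes a search over a handful of parameters (frequencies and index) rather than over raw audio.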
Even just a 2-operator channel has a lot of parameters to dink with:
Channel-related: frequency (13 bits), feedback level (3 bits), AM/FM synthesis mode (1 bit)
Then, for each operator: phase multiple (4 bits), key scale level (2 bits), overall attenuation level (6 bits), waveform selection (3 bits)
Not to mention 4-op modes, and the fact that all of the above is for just one of the 12 to 18 channels.
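The 13-bit frequency field is really two parts: a 10-bit F-number and a 3-bit block (octave). Per Yamaha's YMF262 documentation, the note frequency is F-number × 49716 / 2^(20 − block), where 49,716 Hz is the chip's internal sample rate with the standard 14.318 MHz clock. A minimal sketch of the conversion, choosing the lowest block that fits so the F-number keeps as much precision as possible; the function names are hypothetical.

```python
OPL3_SAMPLE_RATE = 49716  # Hz, approximately 14.31818 MHz / 288


def encode_frequency(freq_hz):
    """Return (block, fnum) using the lowest block whose 10-bit F-number can hold freq_hz."""
    for block in range(8):
        fnum = round(freq_hz * 2 ** (20 - block) / OPL3_SAMPLE_RATE)
        if fnum < 1024:
            return block, fnum
    raise ValueError(f"{freq_hz} Hz is above the highest frequency block 7 can express")


def decode_frequency(block, fnum):
    """Frequency in Hz that the chip plays for a given block and F-number."""
    return fnum * OPL3_SAMPLE_RATE / 2 ** (20 - block)


def pack_frequency(block, fnum):
    """The 13-bit field: 3 block bits above 10 F-number bits.

    On the chip these bits are split across two registers per channel: A0h-A8h holds the
    low 8 F-number bits; B0h-B8h holds key-on, the block and the top 2 F-number bits.
    """
    return (block << 10) | fnum
```

For example, 440 Hz comes out as block 4, F-number 580, which plays at about 439.99 Hz. Anything above roughly 6.2 kHz does not fit even in block 7.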
You may be smart enough to come up with a formula for this, but I'm certainly not! I'm doing all I can just to do a little better than exhaustive search.
That is interesting! Would something like this also work for WAV-to-MIDI conversion?
https://chatgpt.com/share/6734706e-2f44-8008-a2a0-1106825dbf25
Cool, very cool indeed.
For those who are wondering if this is possible with the SID chip of the C64, the answer is “sort of”.
It can be done, and it works, but the quality suffers because the SID chip only has 3 channels. An example can be found here: https://www.youtube.com/watch?v=oY-78oQ8hrw
Perhaps more interesting is the YouTube channel of a coder called "the algorithm", who tries to achieve low-bandwidth audio through various methods. A multiple-sine-wave synthesis method was also implemented, but it did not compare well with the other methods. The whole idea is to play long audio files on a stock C64, and the results are impressive considering the tight constraints of that machine. For those interested, take a look here: https://www.youtube.com/@thealgorithm/videos
For those interested in high-quality sampled sound (48 kHz) on a C64 with plenty of cartridge-based storage (in other words, better results by "cheating"), see: https://www.youtube.com/watch?v=UYAf_awh5XA
That approach gets the SID to play samples, whereas WAV2VGM does not play samples at all; instead, it makes the sound chip play synthesized notes that correspond to the frequencies in the original recording.
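As the project name suggests, those notes end up in a VGM file. Per the VGM format specification, the body of such a file is a stream of commands: 0x5E and 0x5F write a register on YMF262 (OPL3) port 0 or 1, 0x61 waits a 16-bit little-endian number of 44,100 Hz samples, and 0x66 ends the data. The sketch below encodes a few such commands; it omits the header a complete VGM file needs, the helper names are hypothetical, and it is not how WAV2VGM itself is written.

```python
import struct


def vgm_opl3_write(port, register, value):
    """VGM command for a YMF262 register write: 0x5E targets port 0, 0x5F targets port 1."""
    if port not in (0, 1):
        raise ValueError("the OPL3 has two register ports, 0 and 1")
    return bytes([0x5E + port, register & 0xFF, value & 0xFF])


def vgm_wait(samples):
    """VGM 0x61 waits (16-bit little-endian sample counts at 44,100 Hz), split if > 65535."""
    out = bytearray()
    while samples > 0:
        chunk = min(samples, 0xFFFF)
        out += struct.pack("<BH", 0x61, chunk)
        samples -= chunk
    return bytes(out)


VGM_END = b"\x66"  # end of sound data

# Hypothetical example: enable OPL3 mode, then key on channel 0 at block 4, F-number 580
# (about 440 Hz) for 1/60 s. A real stream must also program the operator registers first.
FNUM, BLOCK = 580, 4
stream = (vgm_opl3_write(1, 0x05, 0x01)                                   # reg 105h bit 0: OPL3 mode
          + vgm_opl3_write(0, 0xA0, FNUM & 0xFF)                          # F-number low 8 bits
          + vgm_opl3_write(0, 0xB0, 0x20 | (BLOCK << 2) | (FNUM >> 8))  # key-on, block, F-number high bits
          + vgm_wait(735)
          + VGM_END)
```

VGM also defines one-byte shortcuts for common waits, 0x62 (735 samples, 1/60 s) and 0x63 (882 samples, 1/50 s), which a frame-based converter could use to save space.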
There’s “Hardware Accelerated Samples” for the C64, but I don’t think that format took off.
It's pretty much the same method Ghostbusters and Impossible Mission used, but instead of changing the parameters 60 times per second, it changes them much more often, yet still seldom enough to allow lots of crazy raster effects to run "at the same time".
Now all you need to do is get a 16MB REU and listen to 10-15 minutes of music in Mahoney format. The player is called “Breadamp” (like Winamp for Breadbox)
This looks like a semitone per pixel, with each column being FM’ed. Very cool design, I would love to make a Python score generator for such a thing.
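A semitone-per-row layout is simply a logarithmic frequency axis. Under the standard MIDI convention (note 69 is A4 at 440 Hz, 12 notes per octave), a frequency maps to a note number as 69 + 12·log2(f / 440). A small sketch, which also suggests one naive route toward the WAV-to-MIDI question raised earlier: round each detected frequency to the nearest note. The row-numbering helper is hypothetical.

```python
import math


def freq_to_midi(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number: 69 is A4, and each semitone adds 1."""
    return 69 + 12 * math.log2(freq_hz / a4_hz)


def midi_to_freq(note, a4_hz=440.0):
    """Inverse mapping: frequency in Hz of a (possibly fractional) MIDI note number."""
    return a4_hz * 2 ** ((note - 69) / 12)


def semitone_row(freq_hz, lowest_note=21):
    """Hypothetical piano-roll row: one row per semitone, row 0 at lowest_note (21 = A0)."""
    return round(freq_to_midi(freq_hz)) - lowest_note
```

Rounding like this throws away pitch bends and vibrato, so a real transcription tool would need more than nearest-note snapping.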