Audio Hacking With The ESP8266

If you study the specifications of the ESP8266 WiFi-enabled microcontroller, you will notice that it features an I2S audio interface. This is a high-speed serial port designed to deliver 16-bit audio data in a standard format, and has its origins in consumer audio products such as CD players. It would be usual to attach a dedicated DAC to an I2S interface to produce audio, but [Jan Ostman]’s synthesiser projects eschew that approach, and instead do the job in software. His I2S interface pushes out a pulse-density-modulated data stream in the same manner as a 1-bit DAC, meaning that the only external component required to produce audio is a simple low-pass filter. He’s posted a video of the synth in action, which we’ve placed below the break.
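To get a feel for what a pulse density modulated stream looks like, here is a minimal host-side sketch of a first-order sigma-delta encoder in Python. It is not [Jan]’s firmware: the function name `pdm_word` and the choice of 32 output bits per sample are assumptions made purely for illustration.

```python
def pdm_word(sample, acc, bits=32):
    """Encode one signed 16-bit sample as `bits` PDM bits.

    Hypothetical helper for illustration only. `acc` is the running
    accumulator carried from the previous sample; the updated value
    is returned so the caller can pass it to the next call.
    """
    level = sample + 0x8000          # shift -32768..32767 to 0..65535
    word = 0
    for _ in range(bits):
        acc += level                 # integrate the input level
        word <<= 1
        if acc >= 0x10000:           # overflow: emit a 1 and subtract full scale
            acc -= 0x10000
            word |= 1
    return word, acc


if __name__ == "__main__":
    acc = 0
    for s in (-32768, 0, 16384):
        word, acc = pdm_word(s, acc)
        print(f"{s:7d} -> {word:032b}")
```

The density of 1 bits tracks the sample value: silence (a sample of 0) comes out as an alternating pattern that is half ones, while louder positive samples produce proportionally more ones. A low-pass filter on the output pin averages those bits back into an analogue level.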

The example he gives us is a basic clone of a Roland 909 drum machine, and he takes us through the code with extensive examples including MIDI. He’s using the Wemos D1 Mini board, but the same could be replicated with many other ESP8266 platforms.

We’ve featured [Jan]’s work many times before, from his minimalist Atmel-based devices through to small but perfectly-formed complete instruments.

https://www.youtube.com/watch?v=lSmbs0qtvrMv

23 thoughts on “Audio Hacking With The ESP8266”

      1. Are you comparing 5-bit PWM to PDM?
        Don’t.

        5-bit PWM gives you a dynamic range of 30dB.
        16-bit PDM gives you 100dB range but harmonic distortion is dependent on the PDM bitrate.

        In most consumer-grade equipment, 1-bit audio is a 3.6MHz bit stream.
        Professional PDM microphones use a 3MHz bitrate.

        My solution uses a bitrate of just 1.7MHz, but you are free to bump the sample rate up to 96kHz and get a professional PDM rate.

        The biggest con is that it won’t do stereo.

        1. I’d be OK with mono only. I have a need to set up multiple speakers from a single mic, across a room. Walls are solid concrete/rebar. Looking at BT, there’s the quality issue, and looking for a commercial VLF is expensive as all get out.

          So if I could get the audio ‘digitized’ at the mic’s receiver, then stream that out to multiple speakers, I’d be in really good shape to set up a couple of small line arrays. The whole thing could be on battery, to boot, which would make it even better.

          If you work more on this I’d love to follow it. Thanks much!

    1. Audio decoding and MIDI aren’t really comparable at all functionally, despite having the capability to sound similar on the output side. Not sure the 8266 has the oomph or RAM required to be both reading/downloading data and decoding MP3 on the fly, but I know @supersat got streaming audio working (barely) on his ESP32-based Defcon 25 Crypto Privacy Village badge. Code is open source and built in esp-idf.

      1. That GitHub repo for the MP3 player was the source I found when I wanted to do exactly the same thing. I got the idea from an LED string driver that used the I2S to generate the pulses for driving the string. That got me thinking that it must be possible to turn it into a one-bit DAC. But when I got deeper into the source I found that the code in the LED driver originally came from the MP3 player code.

        I used it to create a ding-dong sound for a doorbell receiver with an old speaker, a MOSFET and an ESP8266. It calculates the ding-dong in memory and I can also play other ding notes by MQTT request via a JSON array of frequencies.

        Maybe I should do a writeup of that on hackaday.io when I make my new ESP32 doorbell buttons. I want to use the ULP in the ESP32 to have multiple doorbell buttons with different ringtones for every member of my house. The current one only has 1 button, which resets an ESP8266 that sends an MQTT message and goes to sleep again. It lasted 8 months and was pressed 645 times. It was running on 2 AAA batteries.

  1. This is awesome! I’m totally going to try this out and possibly build on it.

    Would the esp-32 offer more functionality/power? I feel that now it’s becoming more and more mainstream, it might be a better choice.

    Thoughts?

    1. In this case the ESP32 won’t offer much more than the ESP8266.

      It has dual CPU cores and more SRAM but still a single DMA.
      But if you find a use for the RAM the ESP32 is better.

      Regarding clock frequency, the ESP8266 can be overclocked to 320MHz without becoming unstable.

      What overclocking does to the lifespan and WiFi I have no idea.

  2. Yah I ran across this article a while ago…(definitely not something you’d want to listen to on a decent sound system/headphones, unless you’re going for chiptunes or something that sounds like an A][.)

    I tried to start a project to use an ESP8266 to do linear timecode for video cameras and given it’s a digital signal sent over a wire, I could bit bang. But life gets in the way. And a million other projects. And…

    If someone wants to motivate me to resume this project, I can see if I can find my dev board/setup. :D

    For a comparison, I could spend $290 for a “Tentacle Sync” device that does this…So $5 + some dev time seems more cost-effective.

  3. I have the problem that after soldering the RX pin to the output, I cannot flash the module anymore. When I disconnect it I can flash it. Is there a way to enable both?
