ESP32, We Have Ways To Make You Talk

One of our favorite scenes from the [James Bond] franchise is the classic exchange between [Goldfinger] and [Bond]. [Connery] (the One True Bond) says, “You expect me to talk?” And the reply is, “No Mr. Bond, I expect you to die!” When it comes to the ESP32, though, apparently [XTronical] expects it to talk. He posted a library to simplify playing WAV files on the ESP32. There is also a video worth watching, below.

Actually, you might want to back up to his previous post where he connects a speaker via one of the digital to analog converters on the board. In that post, he just pushes out a few simple waveforms, but the hardware is the same setup he uses for playing the WAV files.

By wrapping up the WAV code in a library, [XTronical] makes the actual playback simple. Here’s the core of his simple example:

void loop() {
    static uint32_t i = 0;               // simple counter to output
    if (ForceWithYou.Completed)          // if completed playing, play again
        DacAudio.PlayWav(&ForceWithYou); // play the wav (pass the wav class object created at top of code)
    Serial.println(i);                   // print out the value of i
    i++;                                 // increment the value of i
}

Not very hard, but, of course, the heavy lifting is hidden in the two objects DacAudio and ForceWithYou. The video explains how you can add more, but you can probably guess, too. The short version is he uses Audacity to prepare the WAV file and then a hex editor to get the bytes into an array. Since many of us use Linux or Cygwin, we might have been tempted to use od or hexdump, but however you do it, it has to wind up in an array.
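If you would rather script that last step than reach for a hex editor, here is a minimal sketch of the idea in desktop C++ (it is not ESP32 code and not part of [XTronical]'s library). It turns raw bytes into the text of a C array definition, roughly what a hex dump tool would give you. The helper name toCArray, the array name, and the `const unsigned char` element type are our assumptions; check the library's example for the type it actually expects.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: format raw bytes as the source text of a C array.
// Assumption: the sketch wants a plain `const unsigned char` array.
std::string toCArray(const std::vector<std::uint8_t>& bytes,
                     const std::string& name,
                     std::size_t perLine = 12) {
    std::string out = "const unsigned char " + name + "[] = {\n";
    char buf[8];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i % perLine == 0) out += "    ";      // indent the start of each row
        std::snprintf(buf, sizeof buf, "0x%02X",
                      static_cast<unsigned>(bytes[i]));
        out += buf;
        if (i + 1 != bytes.size()) out += ",";    // no comma after the last byte
        if ((i + 1) % perLine == 0 || i + 1 == bytes.size())
            out += "\n";                          // end of row or end of data
        else
            out += " ";
    }
    out += "};\n";
    return out;
}
```

Feed it the bytes of the file Audacity exported, write the returned string into a header, and include that header from the sketch.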

If you want to experiment more with waveform generation, [Elliot Williams] did a good piece on that. You might also get some ideas from our signal generator.

24 thoughts on “ESP32, We Have Ways To Make You Talk”

    1. These villains go through a lot of effort to organize some evil plot and without appreciation it’ll just feel incomplete.
      The whole world of internet social media is built on the concept, so it’s pretty universal.

    1. A talking wireless clock so you don’t need Nixie tubes. It’s the future! A reasonable library of numbers read by your favorite person might fit in internal flash, but I don’t know. Maybe you need to branch out to SPIFFS or something.

  1. Could the ESP32 be used to stream the array file and play it directly? I know this is wrapping it into a library, but that can’t possibly be the best implementation on a wireless platform.

  2. If you are going to use Audacity to “prepare” a WAV file, you might as well use it to prepare a raw file and play it with NodeMCU. I built a grandfather clock chime with an 8266-01. It gets its time sync from my router and plays the 1/4, 1/2, 3/4, and full Westminster chime on the quarter, half, three-quarter, and hour, and bongs out the hours just like a grandfather clock.

  3. There’s no need to convert the audio file into an array to be put in a .h file. The toolchain and makefiles do that automatically. Just put a reference to the file in the “” like this:

    COMPONENT_EMBED_FILES := sound.wav

    and then declare an array in the source like this:

    extern const uint8_t sound[] asm("_binary_sound_wav_start");


  4. Minor correction about the sample rate… it isn’t a measure of the number of “bytes”, it’s a measure of the number of samples… otherwise it’d be called a “byte rate”. We do sometimes measure quality in terms of data units per second; usually we call that the bit rate. Sample rate is just one variable that decides the bit rate for uncompressed audio.

    Bit rate = Sample Rate × bits per sample × channels

    So in the case of 8-bit mono audio, you indeed get sample rate = “byte rate”, but that won’t hold true if you, say, use 16-bit samples.

    As for minimum sample rate, a big factor will be the highest significant frequency present in the wave form.

    The bell sound effect there is one example where going to 4kHz sample rate really is not sufficient, as the highest frequency component is about 2.2kHz. If you sample that at 4kHz, you’ll get an alias of that component at 1.8kHz (200Hz below the 2kHz Nyquist limit). Audacity will likely try and filter out that alias, which may completely change the sound you were wanting to reproduce.

    It’s wise to look at the spectrum analysis of the waveform of interest before deciding on a sample rate, pick the highest significant frequency, double it, then add some more to give yourself some transition band.
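The arithmetic in this comment can be sketched in a few lines of plain C++. This is only an illustration under stated assumptions: the function names are hypothetical, the alias calculation assumes a single pure tone with no anti-alias filter in front of the sampler, and the 20% margin is an arbitrary stand-in for "add some more". By this folding rule, a 2.2kHz component sampled at 4kHz lands at 1.8kHz, which is 200Hz below the 2kHz Nyquist limit.

```cpp
#include <cmath>
#include <cstdint>

// Bit rate of uncompressed PCM audio, in bits per second:
// sample rate x bits per sample x channels.
constexpr std::uint64_t bitRate(std::uint32_t sampleRateHz,
                                std::uint32_t bitsPerSample,
                                std::uint32_t channels) {
    return static_cast<std::uint64_t>(sampleRateHz) * bitsPerSample * channels;
}

// Frequency at which a pure tone appears after sampling, found by
// folding the tone back around the Nyquist limit (sampleRateHz / 2).
double aliasFrequency(double toneHz, double sampleRateHz) {
    double r = std::fmod(toneHz, sampleRateHz);  // position within one sample-rate span
    return (r <= sampleRateHz / 2.0) ? r : sampleRateHz - r;
}

// Rule of thumb from the comment: double the highest significant
// frequency, then add some headroom for the filter's transition band.
// Assumption: marginFraction = 0.2 is illustrative, not a recommendation.
double suggestedSampleRate(double highestHz, double marginFraction = 0.2) {
    return 2.0 * highestHz * (1.0 + marginFraction);
}
```

For 8-bit mono at 8kHz, bitRate(8000, 8, 1) / 8 gives 8000 bytes per second, so sample rate and "byte rate" coincide. Switch to 16-bit samples and the same call with 16 yields 16000 bytes per second.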

  5. Some places where arrays and streaming can be used effectively, true…but good write up either way.

    Guess who’s going to implement an audio alarm into the refrigerator sensors in the home automation….HEHEHE The spousal unit will be quite upset. The food will be edible.

    1. Hopefully, if you look at the latest version and what I use it for, you’ll see it was never designed to be just a simple digital sound player; that was just the start. See the latest videos on this project:

      DacAudio V4

      Frogger on ESP32

      It was initially designed to help me produce the sounds for writing games on the ESP32 which would not work with an MP3 player, not even sure if DMA access is an easy option for my needs.
