Audio Out Over UART

There’s a reason that the bog-standard serial port will never die. It’s just so robust and simple. When you need a console that will absolutely work with minimal software and hardware, UART is the way to go. Because of this, UART hacks abound. Here’s a new one to us, and a challenge to our readers.

[Tiziano Bacocco] decided to use UART signals as a type of PWM to create audio. That’s right, he’s plugging the serial TX line straight into a speaker. This gives you eight possible PWM output voltage levels. The trick is some Python code (built on the awesome pyserial module) that down-quantizes the audio data to these eight possible values and then pushes them out at the correct sampling rate. ffmpeg is used to pre-process the files.
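Here is a minimal sketch of the quantization step. It assumes the audio has already been converted (for example with ffmpeg) to mono floats in [-1.0, 1.0] at the rate the serial link can carry. The byte table is our own assumption, not necessarily [Tiziano]’s mapping: pattern k simply has its k lowest data bits set, so the share of high time in each frame grows with the sample value. With 8N1 framing, each byte takes 10 bit times (start, 8 data, stop), so the sample rate is the baud rate divided by 10.

```python
# Hypothetical mapping (not from the article): pattern k has its k lowest
# data bits set (k = 0..8), so high time per frame grows with the sample.
PATTERNS = bytes((1 << k) - 1 for k in range(9))


def uart_sample_rate(baud, bits_per_frame=10):
    """Bytes (= audio samples) per second with 8N1 framing: start + 8 data + stop."""
    return baud / bits_per_frame


def quantize_to_uart(samples):
    """Map float samples in [-1.0, 1.0] to one UART byte per sample."""
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range input
        k = round((s + 1.0) / 2.0 * (len(PATTERNS) - 1))  # number of set bits, 0..8
        out.append(PATTERNS[k])
    return bytes(out)
```

For example, a 440 Hz tone generated at uart_sample_rate(576000), i.e. 57600 samples per second, becomes one byte per sample. With pyserial, the bytes could then be sent using serial.Serial(port, 576000).write(data), where port is a placeholder for your device name.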

While it sounds good enough for Mario (see video below), three-bit audio isn’t going to sound great for speech or complicated music. So here’s our challenge: redo this with DPCM. Instead of sending the voltage levels, you send the required voltage changes over the serial line. Since voice and music sound waves are continuous (in the mathematical sense), this can work out pretty well even with just a few cleverly selected bits of output.
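A minimal DPCM sketch, under assumptions of our own: each code selects one of four step sizes from a hypothetical STEPS table. The encoder tracks the decoder’s reconstructed value so quantization errors do not pile up, and the decoder is just a running sum, which is the job the integrator does in hardware. The step sizes are arbitrary placeholders. A real design would tune them to the material, or adapt them on the fly as ADPCM does.

```python
# Hypothetical 2-bit step table in units of full scale (not from the article).
# A 1-bit table such as (-0.05, 0.05) turns this into plain delta modulation.
STEPS = (-0.12, -0.02, 0.02, 0.12)


def dpcm_encode(samples, steps=STEPS):
    """Return one code (an index into `steps`) per input sample."""
    codes = []
    recon = 0.0  # mirrors the decoder's state, so errors do not accumulate
    for s in samples:
        # pick the step that lands the reconstruction closest to the target
        best = min(range(len(steps)), key=lambda i: abs(recon + steps[i] - s))
        codes.append(best)
        recon += steps[best]
    return codes


def dpcm_decode(codes, steps=STEPS):
    """Integrate the steps back into a waveform (the RC filter's job in hardware)."""
    out = []
    recon = 0.0
    for c in codes:
        recon += steps[c]
        out.append(recon)
    return out
```

As long as the signal never changes faster than the largest step per sample, the reconstruction stays within about one small step of the input, at 2 bits per sample.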

Here’s a tutorial to get you started. You’ll need to integrate the output (think RC filter), and you’ll want to run it into an amplifier, but we bet it can output decent speech. And we bet it can be done in a few lines of Python. Post your solution in the comments below.
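To see what the amplifier or speaker actually receives, you can model the line. First, expand each byte into its 8N1 frame: a low start bit, eight data bits sent least-significant first, and a high stop bit. Then run the bit stream through a first-order RC low-pass. This is a simulation sketch only. The resistor and capacitor values in the example are placeholders, not taken from the article.

```python
import math


def uart_frame_bits(data):
    """Expand bytes into 8N1 line levels: start (0), data bits LSB first, stop (1)."""
    bits = []
    for byte in data:
        bits.append(0)                                  # start bit
        bits.extend((byte >> i) & 1 for i in range(8))  # data, LSB first
        bits.append(1)                                  # stop bit
    return bits


def rc_filter(bits, baud, r_ohms, c_farads, v_high=1.0):
    """First-order RC low-pass, evaluated once per bit time.

    Uses the exact step response for an input held constant over each bit.
    """
    alpha = 1.0 - math.exp(-1.0 / (baud * r_ohms * c_farads))
    v = 0.0
    out = []
    for b in bits:
        v += alpha * (b * v_high - v)
        out.append(v)
    return out


def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB corner frequency of the RC filter: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)
```

With the placeholder values R = 1 kΩ and C = 22 nF, the corner sits near 7.2 kHz. A run of 0x0F bytes at 576000 baud (five high bit times out of ten, counting the stop bit) settles to an average of half of v_high, plus ripple at the frame rate.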

Still, [Tiziano]’s hack is fantastic. We love the simplicity of just plugging the serial out straight into an amplifier! That’s hardcore.

40 thoughts on “Audio Out Over UART”

  1. Quote: “This gives you eight possible PWM output voltage levels”

    I tend to disagree, as the target is analog and the input to the conversion stage can be considered a bit stream.

    Sure, with an R/2R ladder you can assume some absolute voltages, at least at the point of D to A, but this project relies on the input capacitor of the amp to integrate the voltage, so the voltage at any one time partly depends on the previous bit stream.

    This project is using 8 bits, no parity and 1 stop bit. If you were to remove the stop bit, you could use quite accurate Bit Angle Modulation (BAM); however, the chosen UART speed is a little low for audio that way. You can achieve quite high resolutions with BAM at very low CPU cycle costs.

    The other problem with this sort of design is that human hearing is logarithmic, so a linear output system is a poor match. BAM can easily be modified (timing-wise) to give a log output. You can also build a log resistor ladder like an R/2R, just not with a simple “2R” ratio, and this will give the impression of much higher resolution. The conversion from linear to log can be done digitally quite easily, but doing it “on the fly” might suggest that an FPGA would be useful.

    1. You could also use delta-sigma modulation, aka a “one-bit DAC”. Quality will still be limited, since a 1-bit DAC depends on oversampling to get a decent noise level, and even at 115 kbaud and an 11 kHz audio rate, you’re only getting about 8:1 oversampling due to the start and stop bits, which is still equivalent to a 3-bit DAC in signal/noise ratio. But while conventional PWM quantized to 3 bits sounds very distorted, and low-amplitude signals just sound like silence, delta-sigma modulation reproduces even low-amplitude signals, albeit with a lot of noise.

    2. Using an R/2R ladder requires 8 inputs to the ladder (or 8 outputs from the computer’s RS232 device; I guess you could string together 8 RS232 ports), and “this gives 256 possible output levels.”
      In this post, Tiziano is using PWM (Pulse Width Modulation) through the single TX line of the RS232.
      The 8 bits of each byte he sends represent the 8 possible voltage levels. So there are nine portions of time for each pulse, one for each of the bits plus one stop bit. He is also using a baud rate of 576000, so he should get an “ok” level of quality.
      I guess if you could kill the stop bit, you could string two bytes together and have 15 output levels (you would need one bit to replace the stop bit). This effectively halves the sample rate.
      (Can the stop bit be killed in serial transmission through the RS232?)

    1. That may well be a great project but I can’t find anything that offers any legitimate explanation.

      It looks like someone has put a high bit rate Mirrored Bit Angle Modulation into a basic micro-controller and called it MAGIC while offering no explanation.

      It leaves me thinking, “What possible use is this?”

      1. I don’t see anything MAGIC about this. We all know what a square wave is and how to subtract harmonics to get back to a fundamental. It’s just one extra step to use a predictable filter and mathematically adjust the bit stream to compensate for the characteristics of the filter stage.

        Well, unless I am missing something, there is no MAGIC there as far as I can tell.

        In fact, I don’t get the fuss at all.

        PCM was around in the 70s.

        In the eighties people discovered (with CD players) that a single-bit DAC oversampling 16 times was better than a 16-bit R/2R DAC, because the unwanted sample-rate clock was 16 times higher in frequency, and that made it much easier to filter out with simple 3dB or 6dB per octave filters.

        There is not much that is easier than audio for micro-controllers, as long as you can afford the interrupts.

        Human hearing is logarithmic, which is the inverse of an exponential, and computers work in exponents (of 2). DOH – no brainer there.

    1. I’ve never built one of these, but it really looks like it’s just 1-bit DPCM, except that the sample rate is slow enough that he needs to take the explicit capacitor charge/discharge time into account.

      (The methods with two resistors are a strange version of 2-bit DPCM as far as I can tell.)

      I’d really want to see it in a shoot-out with a more standard 1-bit DAC / n-bit DPCM setup before spending that much time with an oddball format.
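The delta-sigma suggestion in this thread can be sketched in a few lines. This is the textbook first-order error-feedback form, adapted with our own assumptions to the UART constraint: the modulator runs at the bit rate, and only the eight data bits per frame are free. The start and stop bits are forced to 0 and 1, but their error is still fed into the same accumulator so the data bits compensate. Upsampling the audio to one target per bit time is assumed to happen elsewhere.

```python
FRAME_BITS = 10  # 8N1: start + 8 data + stop


def delta_sigma_uart(targets):
    """First-order delta-sigma modulator constrained to 8N1 UART frames.

    `targets` are levels in [0.0, 1.0], one per bit time (already upsampled to
    the baud rate). The start (0) and stop (1) bits cannot be chosen, but their
    error still enters the accumulator so the data bits compensate for them.
    Returns one byte per complete frame.
    """
    out = bytearray()
    acc = 0.0  # running sum of (target - emitted bit)
    for f in range(len(targets) // FRAME_BITS):
        byte = 0
        for pos in range(FRAME_BITS):
            target = targets[f * FRAME_BITS + pos]
            if pos == 0:
                bit = 0  # start bit is always low
            elif pos == FRAME_BITS - 1:
                bit = 1  # stop bit is always high
            else:
                bit = 1 if acc + target >= 0.5 else 0
                byte |= bit << (pos - 1)  # data bits are sent LSB first
            acc += target - bit
        out.append(byte)
    return bytes(out)
```

Because one bit in ten is always low and one always high, the average output can only track targets between 0.1 and 0.9. Outside that range the accumulator grows without bound and the output saturates.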

  2. It seems like one could get better than 8 levels if you can set the baud rate high enough that there are multiple byte times per cycle of the highest frequency your system can produce. E.g., say the speaker’s inductance and inertia combine to produce a system that can do at best 15 kHz. At 115200 baud (one start, one stop, no parity), you have control over almost 8 bit times per 15 kHz cycle. But at 921600 baud, you have control over 49 data bits per cycle, giving you 5.6 bits of resolution. Etc.
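The arithmetic in this comment can be checked directly. A small sketch, assuming 8N1 framing (8 usable data bits out of every 10 bit times) and treating each usable bit slot per cycle as one level step, so resolution is log2 of the slot count:

```python
import math


def data_bits_per_cycle(baud, f_max_hz, data_bits=8, frame_bits=10):
    """Data bits the UART can place in one cycle of the highest useful frequency (8N1)."""
    return baud / f_max_hz * data_bits / frame_bits


def equivalent_resolution_bits(bits_per_cycle):
    """Treat each usable bit slot as one level step: resolution = log2(slots)."""
    return math.log2(bits_per_cycle)


if __name__ == "__main__":
    for baud in (115200, 921600):
        n = data_bits_per_cycle(baud, 15000)
        print(f"{baud} baud: {n:.1f} data bits/cycle, ~{equivalent_resolution_bits(n):.1f} bits")
```

At 15 kHz this gives about 6.1 data bits per cycle at 115200 baud (7.68 raw bit times, or about 2.6 bits of resolution) and about 49.2 data bits at 921600 baud (about 5.6 bits).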

  3. At first I thought, “why not just use a general-purpose I/O pin and toggle it directly?” But then I remembered that not all computers give you access to general-purpose I/O pins, and not all operating systems give you real-time capability. It IS intriguing that you could write a simple program in C or even Python, that can generate sound without any special hardware. As long as you can generate “characters” faster than the UART can spit them out, you get pseudo-realtime operation.

  4. He is using the serial port the wrong way: the result is roughly 3-bit-equivalent audio, and the start/stop (and parity) bits would pollute the output anyway. A better approach would be to use one of the output control lines (RTS, DTR, etc.) instead of the data line (TXD) as a PWM output. The speaker’s electrical resistance paired with its mechanical inertia will act as an RC filter, i.e. an integrator, transforming the pulse widths into sound pressure. You can output any resolution you like, with speed as the only limit, and the control lines can be driven directly once you get the appropriate permissions from the system (ioperm() under Linux, IIRC).

    1. Many small embedded devices do not export serial control lines. And the serial port can output 8 bits per i/o rather than just 1 that a control line would provide (if even available on a particular device). However, using the serial handshake lines (or any other GPIO output) could theoretically do much better IF you had realtime control and root access to such i/o pins. This serial hack is cool in that root access is not required, and serial i/o is pervasive in most devices (at least somewhere inside on the PCB)…

        1. Yup, true. Many old serial chips could also be programmed to act as USARTs; I’m not sure about the serial interfaces embedded in modern chips, though. Anyway, synchronous mode would require at least an external clock line, and we’d soon be reinventing I2C.
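The control-line idea from this thread can be sketched as software PWM. The sketch assumes you have some way to set the line from user space and passes it in as a caller-supplied set_line function, which is a hypothetical hook rather than any library’s API. With pyserial, serial.Serial objects expose rts and dtr as writable properties, so set_line could be lambda level: setattr(ser, "rts", level). Timing comes from busy-waiting on absolute deadlines. In Python, OS jitter (and, on USB adapters, the transfer needed for every line change) will likely limit how fast this can usefully run, which is the reply’s point about real-time control.

```python
import time


def pwm_on_control_line(set_line, duty, period_s, cycles):
    """Software PWM on a handshake line such as RTS or DTR.

    `set_line(level)` is a caller-supplied (hypothetical) function that drives
    the line high (True) or low (False). Deadlines are absolute, so a late
    wake-up shortens the next interval instead of drifting the whole waveform.
    """
    on_time = max(0.0, min(1.0, duty)) * period_s
    deadline = time.perf_counter()
    for _ in range(cycles):
        set_line(True)
        deadline += on_time
        while time.perf_counter() < deadline:  # busy-wait; sleep() is often too coarse
            pass
        set_line(False)
        deadline += period_s - on_time
        while time.perf_counter() < deadline:
            pass
```

Passing the line setter in as a function keeps the timing loop independent of the port library, so the same loop can be tried against a logging stub before touching real hardware.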

  5. Nah, not new at all. I’m quite sure I’ve seen something like it in the early 1990s. I got a disk full of shareware, and a programme for playing samples via a COM port was on it. The funny bit was that the circuit diagram was in a file for an Epson printer. It was somewhat more complicated than an R/2R (Covox), and the sound quality was lower.

    1. Sort of. There was a driver for Windows 3.1 that played samples through the speaker. You could optionally turn interrupts off, which meant literally everything froze while the sample played. Or you could leave them on, and hear them in the output, beeping and clicking away.

      The PC speaker, though, has a control port where you can toggle its bit as much as you like; it’s not limited to a certain bit rate like a serial port is.

        1. The win3.x sound driver also worked after a win95 upgrade, and I used it on a laptop back in the day, when computers did NOT have onboard audio (hence parallel port sound dongles).

        2. Nah, I had a Soundblaster. But the computers at college didn’t. Playing Beavis and Butt-Head samples off a floppy disk used to keep us entertained for hours. And if you told that to kids today, they wouldn’t believe you.

  6. So what exactly is off limits here?
    You could drop in a simple micro that can take the 1 line of UART, read the data and spit it out as 8bit PWM. Other improvements possible.

    Hell, if simple/low cost is the issue, I am sure a VUSB on a 2K micro could fit the UART-to-8-bit-PWM job as well.

  7. Back in the early 1980s, we had a very fine program for the 6502-based Compukit UK101: 8k Space Invaders. It required a machine that had been expanded from the standard 4k to 8k by adding eight 2114 static RAM chips. But it had the added features of proper timing and sound, on a machine with no timer chip and no sound chip. People often overclocked the UK101’s 6502 from 1MHz to 2MHz, making CPU-timed games run too fast (the origin of the Turbo button on later PC clones). 8k Space Invaders used the 6850 serial chip to do both timing and sound generation. The UK101’s fixed baud rate of 300 baud was used for timing, by sending characters and waiting for the chip to be ready for the next one (no interrupts either). But by connecting a high-impedance crystal earpiece to the serial port, you could get sound effects too! The program sent different bytes to the serial port to get different sounds — just video-game sound effects, of course. It was crude, but it worked!

  8. Yes, using a serial port (or SPI) is a way to approximate a delta-sigma converter, an alternative to simple timer-based PWM. Rather than having 128 consecutive on bits and 128 off bits for a 50% value (e.g. 63 kHz with a 16 MHz clock), the bits can be interleaved for an 8 MHz signal. So the RC filter can have a higher cutoff frequency and/or lower noise at audio frequencies. The RPi processor has a bit-interleaved PWM mode that does this automatically. In a microcomputer with DMA, it can be practical to send a bit stream this way, but it gets difficult to multi-task without DMA.

    Tiziano’s bit coding is suboptimal, though: it should be something like 0x00, 01, 11, 29, 55, …, 77, FF, spreading the distance between bits. To get more resolution than a single byte’s 8 steps, dither the lower or upper 1/8 value in proportion to the fractional part; e.g. to represent 3/16, alternate 1/8 and 2/8 values. (A true delta-sigma converter would do the same.) A 32-byte buffer would have the same 0-255 resolution as usual PWM, but the 64 kHz ripple only applies to the least significant bit (the MSB would see 8 MHz ripple). No need for DPCM.

    Using the FT232, the maximum baud rate is 3 Mbaud (not 16 as in my example above), so 1/255 is 11 kHz. From a PC, it’s practical to fill a USB buffer, though a compiled language (e.g. C) might be required. Since a start and a stop bit are added, the UART is effectively 10 bits with 8 bits of range, i.e. instead of 0-8 on bits it is 1-9 on bits out of 10, but it doesn’t matter if full-scale low/high can’t be represented.
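The spread-bit coding and fractional dithering described in this last comment can be sketched as follows. The pattern generator is our own construction: it places k set bits at positions round(j·8/k). It reproduces the first values listed (0x00, 0x01, 0x11, 0x29, 0x55) and 0xFF, but for six bits it picks a different, equally spread rotation (0xBB rather than 0x77). The dither carries each sample’s fractional part forward, so a level of 1.5 alternates between one and two set bits, matching the 3/16 example. Like the comment, it ignores the fixed start and stop bits.

```python
def spread_pattern(k):
    """Byte with k of its 8 data bits set, spread as evenly as possible (k = 0..8)."""
    byte = 0
    for j in range(k):
        pos = (16 * j + k) // (2 * k)  # round(j * 8 / k) in integer arithmetic
        byte |= 1 << pos
    return byte


SPREAD_PATTERNS = [spread_pattern(k) for k in range(9)]


def dithered_bytes(levels):
    """Map fractional levels in [0, 8] to spread-bit bytes, carrying the fraction.

    Each byte uses floor(level + carry) set bits; the leftover fraction is added
    to the next sample, so e.g. 1.5 alternates between 1 and 2 set bits.
    """
    out = bytearray()
    carry = 0.0
    for level in levels:
        x = level + carry
        k = max(0, min(8, int(x)))  # int() floors non-negative values
        carry = x - k
        out.append(SPREAD_PATTERNS[k])
    return bytes(out)
```

Over a 32-byte buffer, the average number of set bits can be adjusted in steps of 1/32 of a bit per byte, which is the 256-level resolution the comment mentions.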
