Transferring Audio To An AVR At 12kbps

Back in the bad ol’ days of computing, hard drives cost as much as a car, and floppy drives were incredibly expensive. The solution to this data storage problem offered by all the manufacturers was simple: an audio cassette. It’s an elegant solution to a storage problem, and one that still has applications today.

[Jari] was working on a wearable message badge with an 8-pin ATTiny. To get data onto this device, he looked at his options and couldn’t find anything good; USB needs two pins and the firmware takes up 1/4 of the Flash, UART isn’t available on every computer, and Bluetooth and WiFi are expensive and complicated. This left using audio to send digital data as the simplest solution.

[Jari] went through a ton of Wikipedia articles to figure out the best modulation scheme for transferring data with audio. What he came up with is very simple: just a square wave generated by turning a pin off and on. When the audio goes three samples without crossing zero, the bit is 0. When it goes five samples without crossing zero, the bit is 1. There’s a 17-sample long sync pulse, and with a small circuit that acts as a zero crossing detector, [Jari] had a way to transfer data easily and cheaply.
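The scheme described above can be sketched in a few lines of Python. This is only an illustration of the run-length idea, not [Jari]’s actual firmware: the function names, the decision thresholds (4 and 11 samples), and the choice to start each frame with a positive sync pulse are all assumptions made for the example. The article also doesn’t state the sample rate; with an even mix of ones and zeros a bit averages four samples, so a 48kHz sound card would give about 12kbps and a 44.1kHz one about 11kbps.

```python
# Run lengths taken from the article: 3 samples = 0, 5 samples = 1, 17 = sync.
ZERO_LEN = 3
ONE_LEN = 5
SYNC_LEN = 17

# Decision thresholds are assumptions: midpoints between the nominal lengths.
BIT_THRESHOLD = 4    # runs of 4 or more samples count as a 1
SYNC_THRESHOLD = 11  # runs of 11 or more samples count as a sync pulse


def encode(bits, amplitude=1.0):
    """Return square-wave samples: one sync run, then one run per bit.

    The level flips after every run, so each run ends in a zero crossing.
    """
    samples = []
    level = amplitude
    for run in [SYNC_LEN] + [ONE_LEN if b else ZERO_LEN for b in bits]:
        samples.extend([level] * run)
        level = -level
    return samples


def run_lengths(samples):
    """Count how many consecutive samples share the same sign."""
    runs = []
    count = 0
    prev_sign = None
    for s in samples:
        sign = s >= 0
        if prev_sign is None or sign == prev_sign:
            count += 1
        else:
            runs.append(count)
            count = 1
        prev_sign = sign
    if count:
        runs.append(count)
    return runs


def decode(samples):
    """Recover the bits that follow the first sync pulse."""
    bits = []
    synced = False
    for run in run_lengths(samples):
        if run >= SYNC_THRESHOLD:
            synced = True
            continue
        if synced:
            bits.append(1 if run >= BIT_THRESHOLD else 0)
    return bits


def average_bitrate(sample_rate):
    """Bits per second, assuming ones and zeros are equally likely."""
    return sample_rate / ((ZERO_LEN + ONE_LEN) / 2)


message = [1, 0, 1, 1, 0, 0, 1, 0]
samples = encode(message)
recovered = decode(samples)
```

On the microcontroller side the same decision would presumably be made by timing the gap between zero-crossing edges rather than by looking at sample values, which is why a zero crossing detector is all the hardware that’s needed.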

All the code for this extremely cheap modem is available on GitHub.

20 thoughts on “Transferring Audio To An AVR At 12kbps”

  1. I have done a rather interesting experiment of embedding a 22.05kHz (half of 44.1kHz sample rate) tone in game audio samples, then using a combination of a high pass filter and a 567 tone decoder to detect when that tone is being played. Then use the output of the 567 to drive a MOSFET and the possibilities are nearly endless. I tried having it fire a camera flash whenever I get shot at (cool effect, but the rearm time spoils it during intense action), as well as using a vibrator attached to the chair (also interesting, but then you might as well embed a low frequency rumble and use a subwoofer). Best of all, the 22.05kHz tone is inaudible so you’ll never know it’s there.

    I really do think ultrasonic communication using existing hardware is a good way to provide NFC-like functionality without the cost of actual NFC hardware.

    1. “Best of all, the 22.05kHz tone is inaudible so you’ll never know it’s there.”

      Not to everyone, and not in every situation because it interferes with other sounds.

      Many people can “feel” it rather than hear it. I can tell when monitor backlights are on/off because they emit this sort of noise wall that can be felt with your head in a specific direction. It’s like when you walk in front of an open empty cupboard, or put a seashell near your ear and the soundscape changes.

      1. When I injected too much of the tone, it did make the sound “crunchy” and distorted in a very weird manner as it caused clipping. But since normal audio content has little energy in that part of the spectrum, I can set the level quite low and still have it work. I have tried it as low as -30dBFS and it just barely works, so I use -20dBFS to get a little more margin. It could possibly work better if I used a DSP to process “bit perfect” S/PDIF (after accounting for the game applying volume and phase shifts to simulate directionality) and possibly go even lower.

        1. “Distortion” can be an effect of harmonics being created if you use a rectangle waveform, which is likely if your main sampling rate is 44.1kHz and your carrier tone is 22.05kHz, since you only get “on/off” values then anyway.
          Such harmonics can, depending on the noise-sampling of the playback system, spread over various frequencies and, again depending on what the playback does to compensate for bad high frequency audibility (i.e. mixing in sub-harmonics to high frequencies to make them stand out more clearly), result in quite some “hiss” and “blu” throughout the spectrum.

          1. I have experienced these issues. In my case I wasn’t transmitting data, but keeping an FM transmitter operating that cut off automatically after a few seconds if there was no audio; which annoyingly caused all receivers to blast static. So in a separate player, I played an inaudible tone continuously in a loop, with just enough amplitude to keep the transmitter active. But whether ultra- or infra-sonic, I never found one that didn’t produce some form of audible distortion, with at least some music. I eventually dissected the player, found a MCU with a pin responsible for audio detection, and cut the trace to isolate it from audio. I was prepared to attach it to a continuous signal source (like a 555), but jumpering it to VCC through a resistor was sufficient.

  2. Excellent work on design engineering for the OH.

    @ Mike Lu: NFC vs. Audio is indeed a worthy comparison/diversity. We often used the Ultrasonic Spectrum and even some Audibles for multiple “Control Hacks”. DTMF is an example of how ubiquitous tone=control can be. We can go back to the various Tone Squelch systems, “PL” or “Channel Guard”, for Voice Comms radio as precedents for at-a-distance work. There were hacks of using a rotary dial, as in telephones, to pulse the PL for remote controls such as gate opening and even crude ancestors of SCADA. We also had several SELCAL type systems that used dial pulsed tones. Grinning at the memories of just how far we’d pushed dizzying stacks of PL/DTMF/Pulsed Carrier etc. for amusing and serious work. Much descended from CTCSS.

    http://en.wikipedia.org/wiki/Continuous_Tone-Coded_Squelch_System

    As for NFC alternatives and/or replacements? I’d contemplated interleaving the realms… Initial daydream hack was a drop dead simple few inaudible digits “Beaconing” and getting an audio handshake response, thence moving to an RF realm. Among the list of important “reasons” for multiple realms?

    We can apply it- Audio+RF as raising the bars for exploit mitigation. Lest we forget Hacks of directional RF antennas etc as reasons FOR additional security. Requiring the multifactorial of Local Audio AND local RF inherently can thwart much malice:> And yeah- dear me. This post may have blocked Patent Trolling on an Open Source *Real Soon Now* of mine.

    Inaudibles as such also can be a contemporary I/O vector for various evil uses of their own. One being the still contentious BadBios… that in theory uses such methods. So being competent in this realm is now more relevant than ever.

    Never Forget, We Hackers can be the worst nightmare to wannabe exploiters.

      1. The UART protocol guarantees at most 9 identical bits in a row. Given a reasonable UART speed, and a 20-20000 Hz audio range, that should work just fine.

        I realize that the ATTiny doesn’t have a UART, but most other microcontrollers do, so it can still be a useful trick.
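The nine-bit figure in the comment above can be checked with a short brute-force sketch. It assumes 8N1 framing (one start bit, eight data bits sent LSB first, one stop bit) and frames sent back to back; the helper names are made up for the example. An idle line sits high between frames, so gaps in transmission would produce longer runs of ones than this.

```python
def frame_8n1(byte):
    """Line bits for one byte: start (0), 8 data bits LSB first, stop (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def longest_run(bits):
    """Length of the longest stretch of identical consecutive bits."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


# Try every pair of bytes sent back to back and keep the worst case.
worst = max(
    longest_run(frame_8n1(a) + frame_8n1(b))
    for a in range(256)
    for b in range(256)
)
```

The worst case comes from 0x00 (start bit plus eight zeros) or 0xFF (eight ones plus the stop bit). Every frame begins with a 0 and ends with a 1, so a run can never carry across a frame boundary.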

  3. A brief Google search suggests that the Dallas Semiconductor 1-Wire protocol can also run on ATtinys. (Technically, it’s three wires, but that’s signal plus power and ground; it’s like a slower I2C without the clock wire.)
    Or you can sometimes get another wire by using the “steal a pin that you don’t need during software uploads” trick.

    But 12 kbps is pretty good – there are obviously faster modem protocols, but you’re fitting this into such minimal resources that you wouldn’t have room for them, so you’ve got something pretty cool. (There are also slower modem protocols – I’m reminded of the record “300 8N1” that came out in the 80s and played audio into a 300-baud modem to show ASCII.)

    1. Yes, code size and simplicity were among the main goals, as the product I needed this for also uses 2kB of program flash for user data – I didn’t want to waste code space.

      My original goal was to get around 4kbps; I was a bit amazed myself that the same code ran flawlessly at 12kbps.
