Those of us who were around in the late 70s and into the 80s might remember the Speak & Spell, a children’s toy with a remarkable text-to-speech synthesizer. While it sounds dated by today’s standards, it was revolutionary for the time and was riding a wave of text-to-speech functionality that was starting to arrive to various computers of the era. While a lot of them used dedicated hardware to perform the speech synthesis, some computers were powerful enough to do this in software, but others were not quite able. The VIC-20 was one of the latter, but thanks to an ESP8266 it has been retroactively given this function.
This project comes to us from [Jan Derogee], a connoisseur of this retrocomputer, and builds on the work by [Earle F. Philhower] who ported the retro speech synthesis software known as SAM from assembly to C which made it possible to run on the ESP8266. Audio playback is handled on the I2S port, but some work needed to be done to get this to work smoothly since this port also handles the communication with the VIC-20. Once this was sorted out, a patch was made to be able to hear the computer’s audio as well as the speech synthesizer’s. Finally, a serial command interface was designed by [Jan] which allows for control of the module.
While not many of us have VIC-20s sitting at home, it’s still an interesting project that shows the broad scope of a small and inexpensive chip like the ESP8266 which would have had a hefty price tag back in the 1980s. If you have other 80s hardware laying around waiting to be put to work, though, take a look at this project which brings new vocabulary words to that old classic Speak & Spell.
Computer engineering student [sherwin-dc] had a rover project which required streaming video through an ESP32 to be accessed by a web server. He couldn’t find documentation for the standard camera interface of the ESP32, but even if he had it, that approach used too many I/O pins. Instead, [sherwin-dc] decided to shoe-horn a video into an I2S stream. It helped that he had access to an Altera MAX 10 FPGA to process the video signal from the camera. He did succeed, but it took a lot of experimenting to work around the limited resources of the ESP32. Ultimately [sherwin-dc] decided on QVGA resolution of 320×240 pixels, with 8 bits per pixel. This meant each frame uses just 77 KB of precious ESP32 RAM.
His design uses a 2.5 MHz SCK, which equates to about four frames per second. But he notes that with higher SCK rates in the tens of MHz, the frame rate could be significantly higher — in theory. But considering other system processing, the ESP32 can’t even keep up with four FPS. In the end, he was lucky to get 0.5 FPS throughput, but that was adequate for purposes of controlling the rover (see animated GIF below the break). That said, if you had a more powerful processor in your design, this technique might be of interest. [Sherwin-dc] notes that the standard camera drivers for the ESP32 use I2S under the hood, so the concept isn’t crazy.
We’ve covered several articles about generating video over I2S before, including this piece from back in 2019. Have you ever commandeered a protocol for “off-label” use?
Though kids today have an incredible knack for figuring out modern phones and tablets, there’s still something to be said for offering a simple physical user interface for little hands. To that end, [Martin Hierholzer] has put together a whimsical jukebox that his two year old daughter can use to listen to her favorite songs. With just a few simple buttons, no display to read, and the ability to stop and start songs using RFID tags embedded into 3D printed figures, it’s a perfect interface for tiny humans just getting the hang of interacting with technology.
While the Raspberry Pi might have been the more obvious choice to base this project around, [Martin] decided to go the ESP32 route for improved energy efficiency. The popular microcontroller is more than powerful enough to play MP3s, and its integrated WiFi connectivity allows the player to download new tracks from the network occasionally. He added a micro SD slot to provide some mass storage, a PCM5102 I2S DAC with a PAM8403 amplifier to handle the audio side of things, and a MFRC522 RFID receiver that can pick up tags placed on the top of the player. Power is provided by parts salvaged from a USB battery bank, and everything is housed on a custom PCB.
The relatively low power requirements of the ESP32 means the jukebox can keep the party going for many hours (perhaps even days) when in active use. When the RFID token is removed and there are no songs to play, some clever coding kicks the chip into low-power mode to greatly extend the player’s standby time. [Martin] says it can sleep for months without having to be recharged, and considering some of the impressive feats of battery-sipping we’ve previously seen from the ESP32, we don’t doubt it.
Even if you don’t have any young music lovers at home, the documentation [Martin] has put together for this project is absolutely worth a look. Whether its how he configures the server side to push songs and firmware updates to the player, how he wrangled the ESP32’s Ultra-Low Power coprocessor (ULP), or the woodworking tips used to produce the charming enclosure, you’re sure to pick up a trick or two.
Since we’ve started inviting them into our homes, many of us have began casting a wary eye at our smart speakers. What exactly are they doing with the constant stream of audio we generate, some of it coming from the most intimate and private of moments? Sure, the big companies behind these devices claim they’re being good, but do any of us actually buy that?
It seems like the most prudent path is to not have one of these devices, but they are pretty useful tools. So this hardware mute switch for an Amazon Echo represents a middle ground between digital Luddism and ignoring the possible privacy risks of smart speakers. Yes, these devices all have software options for disabling their microphone arrays, but as [Andrew Peters] relates it, his concern is mainly to thwart exotic attacks on smart speakers, some of which, like laser-induced photoacoustic attacks, we’ve previously discussed. And for that job, only a hardware-level disconnect of the microphones will do.
To achieve this, [Andrew] embedded a Seeeduino Xiao inside his Echo Dot Gen 2. The tiny microcontroller grounds the common I²S data line shared by the seven (!) microphones in the smart speaker, effective disabling them. Enabling and disabling the mics is done via the existing Dot keys, with feedback provided by tones sent through the Dot speaker. It’s a really slick mod, and the amount of documentation [Andrew] did while researching this is impressive. The video below and the accompanying GitHub repo should prove invaluable to other smart speaker hackers.
The demo code for [XTronical]’s ESP32-based SD card music player is not even 40 lines long, though it will also require a few economical parts before it all works. Nevertheless, making a microcontroller play MP3s (and other formats) from an SD card is considerably simpler today than it was years ago.
Part of what makes this all work is I2S (Inter-IC Sound), a format for communicating PCM audio data between devices. Besides the ESP32, at the heart of it all is an SD card reader breakout board and the MAX98357A, which can be thought of as a combination I2S decoder and Class D amplifier. The ESP32 reads audio files from the SD card and uses an I2S audio library to send the I2S data stream to the MAX98357A (or two of them for stereo.) From there it is decoded automatically and audio gets pumped though attached speakers.
It’s amazing how much easier audio is to work with when one can take advantage of shuffling audio data around digitally, and the decoder handles multiple formats with an amplifier built in. You can see [XTronical]’s ESP32 player in action in the video embedded below.
Of all the people I was looking forward to meeting at Supercon, aside from my Hackaday colleagues with whom I had worked for five years without ever meeting, was a fellow from Germany named Matthias Balwierz. The name might not ring a bell, but he’ll certainly be familiar to Hackaday readers as Bitluni, the sometimes goofy but always entertaining and enlightening face of “Bitluni’s Lab” on YouTube.
I’d been covering Bitluni’s many ESP32 hacks over the years, and had struck up a correspondence with him, swapping ideas and asking for advice on the many projects I start but somehow never finish. Luckily for us, Bitluni is far better on follow-through than I am, and he brought that breadth and depth of experience to the Design Lab stage for that venue’s last talk of the 2019 Superconference, before the party moved next door for the badge-hacking presentations.
Last month we marked the 40th birthday of the CD, and it was as much an obituary as a celebration because those polycarbonate discs are fast becoming a rarity. There is one piece of technology from the CD age that is very much still with us though, and it lives on in the standard for sending serial digital audio between chips. The protocol is called I2S and comes as a hardware peripheral on many microcontrollers. It’s a surprisingly simple interface that’s quite easy to work with and thus quite hackable, so it’s worth a bit of further investigation.
It’s A Simple Enough Interface
Don’t confuse this with the other Philips Semiconductor protocol: I2C. Inter-Integrated Circuit protocol has the initials IIC, and the double letter was shortened to come up with the “eye-squared-see” nomenclature we’ve come to love from I2C. Brought to life in 1982, this predated I2S by four years which explains the somewhat strange abbreviation for “Inter-Integrated Circuit Sound”.
The protocol has stuck around because it’s very handy for dealing with the firehose of serial data associated with high-quality digital audio. It’s so handy that you’ve likely heard of it being used for other purposes than audio, which I’ll get to in a little bit. But first, what does I2S actually do?