Breadboard containing speech synthesis chip

RPi Python Library Has Retro Chiptunes And Speech Covered

September 9, 2021 by Dave Rowntree 4 Comments

The classic SP0256-AL2 speech chip has featured a few times on these pages, and if you’ve not seen the actual part before, you almost certainly have heard the resulting audio output. The latest Python library from prolific retrocomputing enthusiast [Nick Bild] brings the joy of the old chip to the Raspberry Pi platform, with an extra trick: support for the venerable AY-3-8910 sound generator as well.

The SP0256-AL2 chip generates vaguely recognisable speech using the allophone system. Allophones are small chunks of speech audio which, when reproduced sequentially, result in intelligible phonemes that form the basis of speech. The chip requires an external device to feed it the allophones at a regular rate, which is the job of [Nick]’s Gi-Pi library.
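
As a rough illustration of the allophone approach, the sketch below spells out a word as a list of allophone names, maps them to numeric codes, and hands them over one at a time. The code values are quoted from memory of the SP0256-AL2 allophone table and should be checked against the datasheet; `encode`, `feed`, and the `write_allophone` callback are hypothetical stand-ins, not the Gi-Pi API.

```python
import time

# Small subset of the SP0256-AL2 allophone table (assumed values; verify
# against the datasheet before wiring up real hardware).
ALLOPHONES = {
    "PA1": 0x00,  # short pause
    "EH": 0x07,
    "AX": 0x0F,
    "HH1": 0x1B,
    "LL": 0x2D,
    "OW": 0x35,
}

# "hello" spelled as allophones, ending with a pause
HELLO = ["HH1", "EH", "LL", "AX", "OW", "PA1"]


def encode(names):
    """Translate allophone names into the numeric codes the chip expects."""
    return [ALLOPHONES[name] for name in names]


def feed(codes, write_allophone, delay_s=0.0):
    """Hand codes to the chip one at a time.

    write_allophone is a hypothetical callback that would load one code
    into the chip. On real hardware you would wait for the chip to signal
    that it is ready rather than sleep for a fixed delay.
    """
    for code in codes:
        write_allophone(code)
        time.sleep(delay_s)


if __name__ == "__main__":
    sent = []
    feed(encode(HELLO), sent.append)  # collect codes instead of driving GPIO
    print([hex(code) for code in sent])
```

Passing `sent.append` as the callback makes it easy to check the sequence on a desktop before any pins are involved.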

This speech synthesis technology is based on linear predictive coding (LPC), which is used to implement a human vocal tract model. This is the same coding method utilized by the first generation of GSM digital mobile phones, implementing a system known as Full-Rate. Both an LPC encoder and an LPC decoder are present on the handset. The LPC encoder takes audio in from the user, breaks it into the tiny constituent parts of speech, and then simply sends a code representing the audio block, but not the actual audio. Obviously there are a few more parameters sent as well to adjust the model at the receiving side. The actual decoding side is therefore not all that dissimilar to what the SP0256-AL2 and related devices are doing, except you the user have to create the list of audio blocks up-front and feed the chip at the rate it demands.
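
The decoder half of LPC can be sketched as an all-pole filter driven by a simple excitation signal: each output sample is the excitation plus a weighted sum of previous outputs. The coefficients and the function name below are made up for illustration, and real codecs such as GSM Full-Rate add considerably more on top.

```python
def lpc_synthesize(excitation, coeffs):
    """All-pole synthesis: y[n] = e[n] + sum(coeffs[k] * y[n - 1 - k])."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(coeffs):
            # Only use past outputs that actually exist
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    # A single impulse through a one-pole filter decays geometrically.
    print(lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))
```

Swapping the impulse for a periodic pulse train or noise, and updating the coefficients every few milliseconds, is the general shape of how such a model produces voiced and unvoiced sounds.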


Give Me A Minute, My Eyes Are Busy

September 11, 2020 by Brian McEvoy 3 Comments

Social cues are tricky, but humans are very good at detecting where someone is looking; that goes a long way toward figuring out where someone is placing their attention. All of this goes right out the window, though, when you’re talking with somebody who uses eye-tracking software to speak. [Matthew Oppenheim] with Lancaster University, UK wants to give listeners the message of Give Me a Minute with an easy-to-recognize indicator. His choice is a microBit, which displays a rotating arrow on the LED array while someone composes their speech. He chose the microBit because they are readily available, and you can get cases to fit people’s personalities. After the break, you can see a demonstration, but the graphic appears scrambled because of the screen flicker. The rotating arrow is a clear indicator that someone is writing, whereas a clock might suggest a frozen computer, and a progress bar could never be accurate.

[Matthew] wrote a program for the interpreting computer which recognizes when a message is forming by monitoring the number of black pixels in the composition field. If it changes, someone must be composing a sentence. Many people will try to peek over the speaker’s shoulder and see if they are working, but we’re sure that most readers would join the users of such tech in being unhappy if someone blatantly looked at their computer screen while they were typing.
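
The detection idea can be sketched in a few lines: count the dark pixels in the composition field on each poll and flag activity when the count changes. The frame representation (rows of 0-255 grayscale values), the threshold, and the function names are assumptions for illustration, not [Matthew]’s actual code.

```python
def count_dark_pixels(frame, threshold=64):
    """Count pixels darker than threshold in a grayscale frame (rows of 0-255 ints)."""
    return sum(1 for row in frame for pixel in row if pixel < threshold)


def is_composing(previous_count, frame, threshold=64):
    """Return (changed, new_count); changed is True when the dark-pixel count moved."""
    current = count_dark_pixels(frame, threshold)
    return current != previous_count, current


if __name__ == "__main__":
    empty = [[255, 255, 255], [255, 255, 255]]   # blank composition field
    typed = [[255, 0, 255], [0, 0, 255]]         # some text has appeared
    changed, count = is_composing(count_dark_pixels(empty), typed)
    print(changed, count)
```

A polling loop would call `is_composing` on each new screen grab and keep the arrow spinning for a short while after the last change.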

Wheelchairs don’t always have to come from a hospital or supply store, and they don’t have to stay on the ground.


38 Years Later, The Atari 2600 Learns To Speak

August 27, 2020 by Erin Pinheiro 11 Comments

Back in the early 1980s, there was a certain fad in making your computer produce something resembling human speech. There were several hardware solutions to this, adding voices to everything from automated telephone systems to video game consoles, all the way to Steve Jobs using the gimmick to introduce Macintosh to the world in 1984. In 1982, a software-based version of this synthesis was released for the Atari 8-bit line of computers, and ever since then [rossumur] has wondered whether or not it could run on the very constrained 2600.

Fast-forward 38 years and he found that the answer is yes: it is indeed possible to port a semblance of the original 1982 Software Automatic Mouth (or SAM) to run entirely on the Atari 2600, without any additional hardware. To be able to fit such a seemingly complicated piece of software into the paltry 128 bytes (yes, bytes) of RAM, [rossumur] actually uses an authoring tool to pre-calculate the allophones, and stores only those in the ROM. This way, the 2600 alone can’t convert text to phonemes, but there’s enough space left for the allophones, which are converted into sound, that about two minutes of speech can fit into one cartridge. As for why he went through the trouble, we quote the author himself: “Because creating digital swears with 1982 speech synthesis technology on a 1977 game console is exactly what we need right now.”

For this project, [rossumur] has written an incredibly interesting article on speech synthesis in order to explain the SAM engine used here. And this isn’t his first time on the website either, always cramming software where it shouldn’t fit, such as a “Netflix”-like streaming service, or 8-bit console emulators, both on nothing but an ESP32 microcontroller. Check this one out in action after the break.


Giving The Amstrad CPC A Voice And A Drum Kit

August 12, 2019 by Erin Pinheiro 7 Comments

Back in the ’80s, home computers weren’t capable of much in terms of audio or multimedia as a whole. Arguably, it wasn’t until the advent of 16-bit computers such as the Amiga that musicians could make soundtrack-quality music without having to hook actual studio gear up to their machines. [Michael Wessel] is trying to bring some of that and many more features to the Amstrad CPC with his ambitious LambdaSpeak 3 project, an expansion card built completely from scratch and jam-packed with features.

First, and likely giving it its name, is the speech synthesizer. [Michael] has made an emulation mode where his card can act just like the original SSA-1 expansion, so it can be controlled by the same software as back then. By default, the card offers this mode with an Epson S1V30120 daughterboard (which is based on DECTalk synthesis); for further authenticity, however, you also have the option of fitting it with an SP0256-AL2 chip, the same one used in the original Amstrad hardware in 1985.

As for the more musical part of the project, the board supports 4-channel PCM playback, much like the Amiga’s sound offering. This can be used for a drum machine sequencer program, and it has an Amdrum mode, emulating another expansion from the original Amstrad days. Sample playback can also be used alongside the speech synthesis as shown here, with random allophone beats that wouldn’t sound out of place in a Kraftwerk recording. Finally, by using the UART interface included on the LambdaSpeak, you can also turn the CPC itself into a synth by giving it MIDI in/out and interfacing a controller in real time with the computer’s AY-3-8912 sound chip.
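
Driving the AY chip from MIDI comes down to turning a note number into a tone period. Here is a minimal sketch, assuming the usual AY-3-891x relationship (tone frequency = clock / (16 × period), with a 12-bit period register) and a 1 MHz sound chip clock on the CPC. The function name is invented here, and none of this is taken from the LambdaSpeak firmware.

```python
def midi_note_to_ay_period(note, clock_hz=1_000_000):
    """Convert a MIDI note number to a 12-bit AY tone period (assumed 1 MHz clock)."""
    freq = 440.0 * 2 ** ((note - 69) / 12)   # equal temperament, A4 = note 69
    period = round(clock_hz / (16 * freq))
    return max(1, min(period, 0x0FFF))       # clamp to the register's range


if __name__ == "__main__":
    print(midi_note_to_ay_period(69))  # period for A4 (440 Hz)
```

The clamp matters at the low end: very low notes would need a period larger than 12 bits can hold, so they bottom out at the lowest pitch the register allows.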

If you like modern expansions giving old computers new life, did you know that you can get just about any retro computer online, perhaps a TRS-80, an Amiga and even a Psion Organizer? And if you’re interested in just using old systems’ sound chips with modern USB MIDI controllers, it’s easy to make a microcontroller do all the heavy lifting.


Vintage Speech Synthesizer Croons The Oldies

June 11, 2019 by Dan Maloney 33 Comments

If you listened to the National Weather Service Weather Radio in the US about 25 years ago, you’ll no doubt remember [Perfect Paul], one of the synthesized voices used to read current conditions and weather forecasts. The voice came from a DECtalk DTC01, a not inexpensive voice synthesizer first made in 1984 that also gave voice to [Stephen Hawking] for many years.

Long obsolete, the DECtalk boxes have a devoted following with hobbyists who like to stretch what the device can do. Some even like to make it sing, after a fashion, and [Michael] decided that making a DECtalk sing “Xanadu”, the theme song from the 1980 [Olivia Newton-John] musical extravaganza, was a good idea. Whether it actually was is debatable, and we’ll take exception to having that particular ditty stuck in our head as a result, but we don’t judge except on the merits of the hack.

It’s actually easy if you have a DECtalk; the song is a straight ASCII file with remarkably concise instructions on which phonemes the box needs to generate. Along with inflection, tone, and timing instructions, the text file looks almost completely unlike English while still somehow being readable. The DECtalk accepts the file over RS-232, which would be easy enough to do with a modern computer, but [Michael] upped his game a bit by using a TRS-80 Model 100 computer as a serial terminal. The synthesized song is in the video below, with the original included for reference by those who didn’t ~~experience~~ endure the late disco-era glory days.

DECtalks seem pretty rare in the wild, so we appreciate this glimpse at what they can do. There are other retro speech synthesizer hacks, though: the simulated walnut goodness of the Votrax and the MicroVox come to mind, as does the venerable TI Speak and Spell.


Scientists Create Speech From Brain Signals

April 25, 2019 by Al Williams 14 Comments

One of the things that makes us human is our ability to communicate. However, a stroke or other medical impairment can take that ability away without warning. Although Stephen Hawking managed to do great things with a computer-aided voice, it took a lot of patience and technology to get there. Composing an e-mail or an utterance for a speech synthesizer using a tongue stick or by blinking can be quite frustrating since most people can only manage about ten words a minute. Conventional speech averages about 150 words per minute. However, scientists recently reported in the journal Nature that they have successfully decoded brain signals into speech directly, which could open up an entirely new world for people who need assistance communicating.

The tech is still only lab-ready, but they claim to be able to produce mostly intelligible sentences using the technique. Previous efforts have only managed to produce single syllables, not entire sentences.


Google’s Duplex AI Has Conversation Indistinguishable From Human’s

May 10, 2018 by Steven Dufresne 106 Comments

First Google gradually improved its WaveNet text-to-speech neural network to the point where it sounds almost perfectly human. Then they introduced Smart Reply which suggests possible replies to your emails. So it’s no surprise that they’ve announced an enhancement for Google Assistant called Duplex which can have phone conversations for you.

What is surprising is how well it works, as you can hear below. The first is Duplex calling to book an appointment at a hair salon, and the second is it making reservations with a restaurant.

Note that this reverses the roles when talking to a computer on the phone. The computer is the customer who calls the business, and the human is on the business side. The goal of the computer is to book a hair appointment or reserve a table at a restaurant. The computer has to know how to carry out a conversation with the human without the human knowing that they’re talking to a computer. It’s for communicating with all those businesses which don’t have online booking systems but instead use human operators on the phone.

Not knowing that they’re talking to a computer, the human will therefore speak as they would with another human, with all the pauses, “hmm”s and “ah”s, speed, leaving words out, and even changing the context in mid-sentence. There’s also the problem of multiple meanings for a phrase. The “four” in “Ok for four” can mean 4 pm or four people.

The component which decides what to say is a recurrent neural network (RNN) trained on many anonymized phone calls. The input is: the audio, the output from Google’s automatic speech recognition (ASR) software, and context such as the conversation’s history and the parameters of the conversation (e.g. book places at a restaurant, for how many, when), and more.

Producing the speech is done using Google’s text-to-speech technologies, Wavenet and Tacotron. “Hmm”s and “ah”s are inserted for a more natural sound. Timing is also taken into account. “Hello?” gets an immediate response. But they introduce latency when responding to more complex questions since replying too soon would sound unnatural.

There are limitations though. If it decides it can’t complete a task then it hands the conversation over to a human operator. Also, Duplex can’t handle a general conversation. Instead, multiple instances are trained on different domains. So this isn’t the singularity which we’ve talked about before. But if you’re tired of talking to computers at businesses, maybe this will provide a little payback by having the computer talk to the business instead.

On a more serious note, would you want to know if the person you were speaking to was in fact a computer? Perhaps Google should preface each conversation with “Hi! This is Google Assistant calling.” And even knowing that, would you want to have a human conversation with a computer, knowing that its “um”s were artificial? This may save time for the person whom the call is on behalf of, but the person being called may wish the computer would be a little more computer-like and speak more efficiently. Let us know your thoughts in the comments below. Or just check out the following Google I/O ’18 keynote presentation video where all this was announced.


Hackaday

speech synthesis

28 Articles
