AI-powered chatbots are pretty cool, but most still require you to type your question on a keyboard and read an answer from a screen. It doesn’t have to be like that, of course: with a few standard tools, you can turn a chatbot into a machine that literally chats, as [Hoani Bryson] did. He decided to make a standalone voice-operated ChatGPT client that you can actually sit next to and have a conversation with.
The base of the project is a USB speaker, to which [Hoani] added a Raspberry Pi, a Teensy, a two-line LCD and a big red button. When you press the button, the Pi listens to your speech and converts it to text using the OpenAI voice transcription feature. It then sends the resulting text to ChatGPT through its API and waits for its response, which it turns into sound again through the eSpeak speech synthesizer. The LCD, driven by the Teensy, shows the current status of the machine and also provides live subtitles while the machine is talking.
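For readers who want to picture the control flow, here is a minimal sketch in Go of the button-press cycle described above: transcribe, ask, display, speak. It is not taken from [Hoani]'s code. The `Stages` type, `Converse`, `wrapLCD`, `espeakSay` and the 16-character display width are all assumptions made for illustration, and the transcription and chat steps are stubbed out rather than wired to the OpenAI API. The only external tool it names is the `espeak` command-line program, which is assumed to be on the PATH.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// lcdWidth is an assumed row width; the article only says "two-line LCD".
const lcdWidth = 16

// Stages bundles the steps as plain functions so each one can be swapped
// or stubbed. All names here are hypothetical, not the project's own.
type Stages struct {
	Transcribe func(audio []byte) (string, error)  // speech -> text
	Chat       func(prompt string) (string, error) // text -> reply
	Speak      func(text string) error             // reply -> sound
	Show       func(rows []string)                 // status or subtitles
}

// Converse runs one button-press cycle and returns the spoken reply.
func (s Stages) Converse(audio []byte) (string, error) {
	s.Show([]string{"Transcribing..."})
	prompt, err := s.Transcribe(audio)
	if err != nil {
		return "", fmt.Errorf("transcribe: %w", err)
	}
	s.Show([]string{"Thinking..."})
	reply, err := s.Chat(prompt)
	if err != nil {
		return "", fmt.Errorf("chat: %w", err)
	}
	s.Show(wrapLCD(reply, lcdWidth)) // the display driver pages these rows
	if err := s.Speak(reply); err != nil {
		return "", fmt.Errorf("speak: %w", err)
	}
	return reply, nil
}

// wrapLCD greedily wraps text into rows of at most width bytes
// (ASCII assumed), hard-splitting any word longer than one row.
func wrapLCD(text string, width int) []string {
	if width <= 0 {
		return nil
	}
	var rows []string
	row := ""
	for _, w := range strings.Fields(text) {
		for len(w) > width {
			if row != "" {
				rows = append(rows, row)
				row = ""
			}
			rows = append(rows, w[:width])
			w = w[width:]
		}
		switch {
		case row == "":
			row = w
		case len(row)+1+len(w) <= width:
			row += " " + w
		default:
			rows = append(rows, row)
			row = w
		}
	}
	if row != "" {
		rows = append(rows, row)
	}
	return rows
}

// espeakSay shells out to the espeak CLI (assumed to be installed).
func espeakSay(text string) error {
	return exec.Command("espeak", text).Run()
}

func main() {
	s := Stages{
		// Stubs stand in for the network calls in this sketch.
		Transcribe: func([]byte) (string, error) { return "What is my purpose?", nil },
		Chat:       func(p string) (string, error) { return "You asked: " + p, nil },
		Speak:      func(string) error { return nil }, // swap in espeakSay on real hardware
		Show:       func(rows []string) { fmt.Println(strings.Join(rows, " | ")) },
	}
	if _, err := s.Converse(nil); err != nil {
		fmt.Println("error:", err)
	}
}
```

Keeping the stages as function fields means the same loop can be exercised on a laptop with stubs and then pointed at real services on the Pi. This sketch sends the whole reply to the display at once, so syncing the subtitles to the speech would need extra work.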
To spice up the AI box’s appearance, [Hoani] also added an LED ring which shows a spectrogram of the audio being generated. This small addition really makes the thing come alive, turning it into what looks like a classic Sci-Fi movie prop. Except that this one’s real, of course – we are actually living in the future, with human-like AI all around us.
All code, mostly written in Go, is freely available on [Hoani]’s GitHub page. It also includes a separate audio processing library called toot that [Hoani] wrote to help him interface with the microphone and do spectral analysis. Anyone with basic electronics skills can now build their own AI companion and talk to it – something that ham radio operators have been doing for a while.
What is my purpose?
I’d make a hal9000 front end for that…
I’m afraid I can’t let you do that, Andy.
I just now checked – tell ChatGPT to respond in the manner of Talky Toaster, and it will do so.
I’d like to put a load of these speakers in a room, turn them all on and see what they talk about. I’d program each one with an intrinsic bias, for example speaker no.1 would be a white supremacist intent on destroying democracy, speaker no.2 would be a climate activist intent on saving the planet, speaker no.3 …… you get the picture!
Plot twist: they’re all no.1!
We’ve already got echo chambers on social media where extremists talk to each other and, largely, nobody else.
And not just extremists.
It’s already been going on for months now, in a somewhat narrowed context. See(/hear) https://infiniteconversation.com/ for a perpetual back-and-forth between an AI-generated Bavarian director Werner Herzog and an AI-generated Slovenian philosopher Slavoj Žižek.
Cool. But I want to see one between some more recognisable / prominent characters, e.g. Donald and Greta.
… talking about climate change.
Although an AI might never get to the level of Donald's NS (natural stupidity), the advantage being clearly much much less computing power needed for a realistic simulacrum.
And above, for your entertainment, is an example of ono’s NS on display…
I like your idea. I’m envisioning the communications equivalent of thermal runaway occurring here. I’m guessing someone has already pitted one AI versus another in some fashion.
wake word detection when ??
It is really cool, but I thought speech generation in 2023 was better than that…
Come on. R2D2 could only manage a series of bleeps.
Yes, but that was a long time ago…
Yes it is rubbish (I used that code a decade ago and I got used to it), however mbrola voices are much better, and if you have even more compute power you can use Larynx TTS
Personally I’d not let ChatGPT traffic onto my home LAN and would look at self hosted options, even if they were “less capable”.
It is. eSpeak has been going for almost a good 20 years now and is based on relatively old speech synth tech. I presume the choice was made because it’s an easy to run local TTS and the creator was specifically aiming for a retro stylisation.
better mycroft?