AI-Powered Speaker Is A Chatbot You Can Actually Chat With

A small speaker with an LCD showing chatbot responses

AI-powered chatbots are pretty cool, but most still require you to type your question on a keyboard and read an answer from a screen. It doesn’t have to be like that, of course: with a few standard tools, you can turn a chatbot into a machine that literally chats, as [Hoani Bryson] did. He decided to make a standalone voice-operated ChatGPT client that you can actually sit next to and have a conversation with.

The base of the project is a USB speaker, to which [Hoani] added a Raspberry Pi, a Teensy, a two-line LCD and a big red button. When you press the button, the Pi listens to your speech and converts it to text using the OpenAI voice transcription feature. It then sends the resulting text to ChatGPT through its API and waits for its response, which it turns into sound again through the eSpeak speech synthesizer. The LCD, driven by the Teensy, shows the current status of the machine and also provides live subtitles while the machine is talking.
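The listen, transcribe, ask, speak loop described above can be sketched in a few lines of Go. This is only an illustration of the flow, not [Hoani]'s actual code: the interface names, `RunTurn`, and the stub implementations are all hypothetical, and the stubs stand in for the OpenAI calls so the sketch runs without a network connection. The only real external tool referenced is the `espeak` command-line program, which speaks the text passed as its argument.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// Hypothetical interfaces for the three stages of one conversation turn.
type Transcriber interface {
	Transcribe(audio []byte) (string, error)
}
type Chatter interface {
	Reply(prompt string) (string, error)
}
type Speaker interface {
	Say(text string) error
}

// espeakSpeaker shells out to the espeak CLI (assumed to be installed).
type espeakSpeaker struct{}

func (espeakSpeaker) Say(text string) error {
	return exec.Command("espeak", text).Run()
}

// Stubs used so the sketch runs anywhere; real code would call the
// transcription and chat APIs here instead.
type fakeTranscriber struct{ text string }

func (f fakeTranscriber) Transcribe([]byte) (string, error) { return f.text, nil }

type echoChatter struct{}

func (echoChatter) Reply(p string) (string, error) { return "You said: " + p, nil }

type logSpeaker struct{ spoken []string }

func (l *logSpeaker) Say(t string) error { l.spoken = append(l.spoken, t); return nil }

// RunTurn handles one button press: audio in, spoken reply out.
// status is called with each state change, e.g. to update an LCD.
func RunTurn(audio []byte, t Transcriber, c Chatter, s Speaker, status func(string)) (string, error) {
	status("Transcribing")
	prompt, err := t.Transcribe(audio)
	if err != nil {
		return "", fmt.Errorf("transcribe: %w", err)
	}
	prompt = strings.TrimSpace(prompt)
	if prompt == "" {
		return "", errors.New("transcribe: no speech detected")
	}

	status("Thinking")
	reply, err := c.Reply(prompt)
	if err != nil {
		return "", fmt.Errorf("chat: %w", err)
	}

	status("Speaking")
	if err := s.Say(reply); err != nil {
		return "", fmt.Errorf("speak: %w", err)
	}
	status("Ready")
	return reply, nil
}

func main() {
	speaker := &logSpeaker{} // swap in espeakSpeaker{} to hear the reply
	reply, err := RunTurn([]byte("fake audio"), fakeTranscriber{"hello there"},
		echoChatter{}, speaker, func(s string) { fmt.Println("[LCD]", s) })
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply)
}
```

Keeping each stage behind a small interface means the cloud services could be swapped for self-hosted ones without touching the loop itself.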

To spice up the AI box’s appearance, [Hoani] also added an LED ring which shows a spectrogram of the audio being generated. This small addition really makes the thing come alive, turning it into what looks like a classic Sci-Fi movie prop. Except that this one’s real, of course – we are actually living in the future, with human-like AI all around us.

All code, mostly written in Go, is freely available on [Hoani]’s GitHub page. It also includes a separate audio processing library called toot that [Hoani] wrote to help him interface with the microphone and do spectral analysis. Anyone with basic electronics skills can now build their own AI companion and talk to it – something that ham radio operators have been doing for a while.

20 thoughts on “AI-Powered Speaker Is A Chatbot You Can Actually Chat With”

  1. I’d like to put a load of these speakers in a room, turn them all on and see what they talk about. I’d program each one with an intrinsic bias, for example speaker no.1 would be a white supremacist intent on destroying democracy, speaker no.2 would be a climate activist intent on saving the planet, speaker no.3 …… you get the picture!

        1. Although an AI might never get to the level of Donald’s NS (natural stupidity), the advantage being clearly much much less computing power needed for a realistic simulacrum.

    1. I like your idea. I’m envisioning the communications equivalent of thermal runaway occurring here. I’m guessing someone has already pitted one AI versus another in some fashion.

    1. Yes it is rubbish (I used that code a decade ago and I got used to it), however mbrola voices are much better, and if you have even more compute power you can use Larynx TTS

      Personally I’d not let ChatGPT traffic onto my home LAN and would look at self hosted options, even if they were “less capable”.

    2. It is. eSpeak has been going for almost a good 20 years now and is based on relatively old speech synth tech. I presume the choice was made because it’s an easy to run local TTS and the creator was specifically aiming for a retro stylisation.
