Christmas Comes Early With AI Santa Demo

With only two hundred-odd days ’til Christmas, you just know we’re already feeling the season’s magic. Well, maybe not, but [Sean Dubois] has decided to give us a head start with this WebRTC demo built into a Santa stuffie.

The details are a little sparse (hopefully he’ll finish the documentation on GitHub by the time this goes out), but the project is really neat. Hardware-wise, it’s an audio-enabled ESP32-S3 dev board living inside Santa, running OpenAI’s Realtime Embedded SDK (as implemented by Espressif), with some customization by [Sean]. It looks like the audio is going through the newest version of libpeer, and the heavy lifting is all happening in the cloud, as you’d expect with this SDK. (An OpenAI API key is required, but hey! It’s all open source; if you have a locally hosted AI that can do the job, you can probably figure out how to connect to it instead.)
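If you do go the local route, a server with an OpenAI-compatible endpoint (Ollama exposes one out of the box) makes the swap about as small as pointing the client somewhere else. Here’s a minimal Python sketch of the idea, nothing from the project itself, assuming Ollama is running on its default port; the model name is just a placeholder:

```python
# A minimal sketch, not the project's actual code: pointing the
# standard OpenAI Python client at a locally hosted, OpenAI-compatible
# server (Ollama's /v1 endpoint here) instead of the cloud.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="not-needed",                  # the client requires a key; Ollama ignores it
)

response = client.chat.completions.create(
    model="llama3",  # placeholder: use whatever model you have pulled
    messages=[
        {"role": "system", "content": "You are Santa Claus. Keep replies short and jolly."},
        {"role": "user", "content": "What are the reindeer doing right now?"},
    ],
)
print(response.choices[0].message.content)
```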

This speech-to-speech AI doesn’t need to emulate Santa Claus, of course; you can prime it with any instructions you’d like. If you want to delight children, though, it’s hard to beat the Jolly Old Elf, and you certainly have time to get it ready for Christmas. Thanks to [Sean] for sending in the tip.

If you like this project but want to avoid paying OpenAI API fees, here’s a speech-to-text model to get you started, and we covered this AI speech generator last year to handle the talky bit. If you put them together and make your own Santa Claus (or perhaps something more seasonal to this time of year), don’t forget to drop us a tip!
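If you’d rather keep the whole loop local, the shape of that mash-up is roughly transcribe, generate, speak. Here’s a rough Python sketch of the pipeline, with some loud assumptions: it stands in openai-whisper, the Ollama Python bindings, and pyttsx3 for the specific models linked above, and every model name is a placeholder:

```python
# A rough local speech-to-speech loop, assuming openai-whisper,
# ollama (with a model already pulled), and pyttsx3 are installed.
# These libraries are stand-ins, not the specific models linked above.
import whisper
import ollama
import pyttsx3

# 1. Speech to text: transcribe a recorded question.
stt = whisper.load_model("base")
question = stt.transcribe("question.wav")["text"]

# 2. Text to text: a local LLM plays the Jolly Old Elf.
reply = ollama.chat(
    model="llama3",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are Santa Claus. Be warm and brief."},
        {"role": "user", "content": question},
    ],
)["message"]["content"]

# 3. Text to speech: say the reply out loud.
tts = pyttsx3.init()
tts.say(reply)
tts.runAndWait()
```

Getting all three stages to answer “within milliseconds,” as one commenter below hopes, is the hard part; a sketch like this runs the stages back to back, so real responsiveness would need streaming between them.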

5 thoughts on “Christmas Comes Early With AI Santa Demo”

  1. Neat… but the turn-off for me is “and the heavy lifting is all happening in the cloud.” Back to the drawing board for me :). Got to be local or not at all at this house.

    1. You can likely connect it to things running ollama – you’ll just need a beefy GPU and a similar LLM running first. The “cloud” option skips those steps.

  2. I have been trying to buy enough GPUs to do this entire thing at home

    Sure, I may only be able to run a 13B model, but if it’s text-to-speech, speech-to-text, and a 13B LLM all responding within milliseconds… that’s amazing!
