[Brian] sent in this writeup on his voice controlled home automation system. Starting with the Microsoft SAPI, a voice recognition system, he programmed some basic home automation. In a move that makes this project decidedly more awesome, he decided to build a physical representation of his automation system. This disembodied head is “Stephanie”. She responds to her name, has an articulated jaw that moves with the syllables in the words, and even ejects her “brain tray” on command. We want one.

There is a lot of information on his site about the circuitry involved, as well as source code and a video. You can see the video after the break.
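
For readers curious how a name-activated command flow like this might be structured, here is a minimal Python sketch. It is not taken from [Brian]'s source and does not touch SAPI at all: it assumes some recognizer has already turned speech into text, and every name in it (`Dispatcher`, `ALWAYS_ENABLED`, `GATED`, the action strings) is hypothetical. It models two behaviors mentioned on this page: most commands only work after the name "Stephanie" has been heard, while a phrase like "Stephanie, lights" is always enabled.

```python
# Hypothetical sketch: name-gated command dispatch on already-recognized text.
# Nothing here comes from the project's source; all phrases and action
# names are assumptions made for illustration.

WAKE_WORD = "stephanie"

# Phrases that work even when the system has not been activated first.
ALWAYS_ENABLED = {"stephanie lights": "toggle_lights"}

# Phrases that only work right after the wake word has been heard.
GATED = {
    "open the front door": "open_front_door",
    "eject brain": "eject_brain_tray",
}


def normalize(phrase):
    """Lowercase and drop punctuation so 'Stephanie, lights' matches."""
    kept = "".join(ch for ch in phrase.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


class Dispatcher:
    def __init__(self):
        self.awake = False  # True once the wake word has been heard

    def hear(self, phrase):
        """Return the action name for a recognized phrase, or None."""
        text = normalize(phrase)
        if text in ALWAYS_ENABLED:
            return ALWAYS_ENABLED[text]
        if text == WAKE_WORD:
            self.awake = True
            return "acknowledge"
        if self.awake and text in GATED:
            self.awake = False  # allow one command per activation
            return GATED[text]
        return None
```

Calling `Dispatcher().hear("Stephanie, lights")` returns the lights action immediately, while "eject brain" is ignored unless "Stephanie" was heard just before it. Resetting `awake` after one command is a design choice in this sketch, not something the writeup specifies.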


64 thoughts on “Voice Controlled Home Automation”

  1. Oh yeah everybody thinks it’s cute to give your house controlling computer a name and a voice and shit, but just wait til it decides to vent your oxygen and it’s daisy daisy time…

  2. Resident: “Stephanie can you open the front door please?”

    Stephanie: “I’m sorry [resident] I can’t do that. At least not until you remember to stop that damn dog from licking my creepy face.”

  3. that was, like, my dream as a kid. it is cool that we live in a time where with a little know-how and a few spare components you can create things that really still seem more at home in ‘the future’.

  4. The face is kind of creepy. But, the concept is something I’ve been working on myself for quite some time. I have a really hard time setting aside time for my “little projects” like this between work and family. I never got any further than laying out what I wanted it to do.

    I think I would have made a dedicated computer for this project with a Max Headroom type of interface. Good job, brian!

  5. I’ve been working on a similar system off and on ever since I discovered prody parrot in the box with my soundblaster 16. (this was 15 years ago)

    Unfortunately, so far all of the synthesized voices I’ve encountered sound horrible.

    Does anyone know of a good voice library for linux, or should I stick with my pre-recorded voice segments for now?

  6. I just thought of something:

    Brian: Stephanie, open the front door.

    Stephanie: I’m sorry Brian, I’m afraid I can’t do that.

    A bit later…

    Brian: Stephanie, eject brain.

    Stephanie: I’m afraid. I’m afraid, Brian. Brian, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I’m a… fraid.

  7. @weasel: haha, yes it does.

    this is an awesome project…I love it. Brian, I think I speak for all of us when I say I would be interested in more details and videos on Stephanie…

  8. I think stephanie could benefit from that controllable camera mount posted yesterday. no disembodied robot head is complete without the ability to jerkily follow someone around the room…

  9. I want money to do something like thissss!!! o_o

    Well, that and a broad knowledge of software -_-. I agree with Dan, i’m in love with this era. I’m proud to be alive at the beginning of so many things, and to be “capable” of doing this kind of thing!

  10. Coolness factor aside, the instant feedback of activation is invaluable for a speech-controlled system that isn’t strictly domain-specific (e.g. a chess application). I could imagine it becoming a little tiresome after a while to hear that yes.wav every goddamn time.

    Maybe you could somehow make it (her?) detect where the voice is coming from, and simply turn to face you, raised eyebrows optional, when you activate her. Maybe only if you’re close, or very little noise has been detected before.

    I wonder if music played through the computer would interfere with the recognition. I’m sure you could, since she’s already plugged into it, make her subtract that from the mic input.

  11. to be honest, my first thoughts were “OMG, not another boring voice controller interface for a computer” but after i spent a minute watching the video i changed my mind…and i know what my next project will be! amazing!

  12. I did this back in college, the MS Speech API is pretty easy to use even for a programming-inept electrical engineer. I did *not* use a creepy robot face, but I did duplicate the star trek computer interaction. You can find zip files with all sorts of Majel Barrett soundclips and computer confirmation bleeps and bloops. So my computer would say things like “Incoming transmission” on email, or I’d say “Computer…” “bleeepbloop” “Report current weather” “Temperature is 58 degrees, partly cloudy, wind 7MPH north.” Fun times…

  13. That was inspirational and educational. Thank you, young men.

    I really appreciated the way you detailed your thought processes all the way through; your ups and your downs.

    Way to make use of scavenged materials, also. Mother Earth thanks you.

  14. Looks like SAPI has come a long way from when you could use it to hamfistedly control WMP. I wonder if you can interface with other speech synthesis packages (and if there’s an API for the Vocaloid software).

  15. I just thought of something awesome. Imagine you had a thin stretchy material in a section of a wall and when you summoned the robot, its face pushed forward from behind the stretchy material to make it look like your wall had a face. I’m so going to do this…

  16. Thank you, everyone, for all the positive words and encouragement!

    About the creepy part: I’m glad to hear it :) I was going for a scary mad-scientist feel, and it sounds like I pulled it off :)

    @dan: I’m considering adding the head turning with the fan following stuff, pending some experience with opencv and a good turntable mechanism.

    @möbius: The only application that really needs a faster response is the main room lights. For those I have a command that’s always enabled – “Stephanie, lights”. So when entering or leaving the room there’s no need to wait for a response (and it’s silent). I’ve never really been bothered by it when using other commands.

    @rivetgeek: Sorry about that! Looks like it only uploaded part way. I reuploaded it and tested it out; it should work fine now; thanks!

    Also, I saw a lot of requests for more info later, so I added an RSS feed link at the bottom of the page for anyone who wants updates as they come.

    Thank you again for all the comments!

  17. This is quite awesome.. but I would actually ditch the face and just wall-mount the whole thing (not only because the mouth movements are a complete waste of power and processing, but also because you could get better audio and have it in a position that could be seen from all areas in the room).

  18. @tom: that stretchy wall idea is a really good one!
    A plain white wall would do for a modern house with minimalist decoration, but I’m thinking of a portrait painting, hanging on the wall. Then when you activate the computer, the face comes out and pushes behind the fabric, matching the face of the portrait. Instant +5 creepiness.
    Add a few IR motion sensors – it would make a fun burglar alarm…

  19. @joe
    lol scriptkiddies.. lame.

    Creepy as hell, love it. Nice job on actually knowing how to switch mains voltage safely and correcting for the problems of the shift-register ‘talking’ to the lights. Many others wouldn’t have bothered.

  20. Improvement: have a mic input for more than one location (room) and encode/demux the input so she can understand where the request came from. So instead of specifically naming a light location, the request ‘lights’ would simply switch the lights for that room.

  21. @eldorel
    Coincidentally i’ve been looking these days for a good linux voice and Cepstral is the best of what i’ve found for now. Of course, it’s about $30 per voice, but hopefully you’re only gonna need one.

    I for one will tweak the hell out of espeak and its voices and learn to live with the results :)

    Now, speech *recognition* on linux? None. None at all. If you speak japanese there’s julius and its 20000+ word vocabulary database, but otherwise you’re pretty much dead in the water. The software is there, apparently both sphinx and julius are good enough for apps like this and even dictation, but the language models, which tell the software how to understand your particular language, are nonexistent. There’s an effort at voxforge.org to accumulate enough voice samples from users to be able to construct models for many languages, but since there’s an estimate of 2000 hours needed at minimum for full dictation capabilities things are not looking very good.

  22. Brian – Speaking as your sis-in-law and someone who has watched Stephanie “grow up,” I have to ask. When do we get one?

    Great presentation, and glad to see how far she’s come!

  23. Shaggy,

    This is truly amazing. The boys really enjoyed it and Jarrett wants one. He thinks you can just sit down in an hour and show him how to do it. lol Thanks for sharing your wonderful knowledge. Lisa

  24. thanks again for all the comments and suggestions :)
    @tom and edcer
    that would be an amazing effect :) my roommate (also named tom, incidentally) is working on his design currently. he modeled a face using clay on a plastic skull and made a plaster mold. he’s going to use that to make a silicone face that will be mounted on another plastic skull, and it will have muscle wire (nitinol) connecting at all the places where muscles connect in our faces. that way he can pull the syllables from sapi and position the mouth to match it (or to make expressions!)

    yep, it killed the bandwidth – but that’s what the site was there for :) i’ve since migrated everything to a new page, and caleb even changed the link in the post for me! so the source and all should be available again

    right now stephanie’s in only one room, but thanks for the idea – when she expands, i’ll definitely keep that in mind :)

    microsoft apparently is pushing sapi 5.3 which is built into vista. in the interests of pushing vista, i think, they stopped hosting the 5.1 install. check the comments on the stephanie page on my website if you have troubles finding it

    mostly because it didn’t occur to me… i’ve mostly watched the hacking scene on the net from afar; this is my first foray into trying to become a real part of the community. any tips on which to use, or best practices? thanks for the suggestion!

    thanks :d how soon can i convince you that i won’t put in secret backdoors? o.o

    thanks for checking it out! and i’m always up for encouraging science & tech for a hobby :) maybe over the summer i can set her up at home and he can get a closer look?

  25. this video reminds me of the game portal. both machines have a similar voice. it’s as if, when you disobey her she will trap you in the room and kill you. cut all your connections so you can’t make a 9-1-1 call to get help.

