Voice Controlled Home Automation

February 14, 2009

stephanie

[Brian] sent in this writeup on his voice controlled home automation system. Starting with Microsoft’s Speech API (SAPI) for voice recognition, he programmed some basic home automation. In a move that makes this project decidedly more awesome, he also built a physical representation of his automation system. This disembodied head is “Stephanie”. She responds to her name, has an articulated jaw that moves with the syllables in the words, and even ejects her “brain tray” on command. We want one.

There is plenty of information on his site about the circuitry involved, as well as source code and a video. You can see the video after the break.

[youtube=http://www.youtube.com/watch?v=DqCXbP85oX0]

64 thoughts on “Voice Controlled Home Automation”

Tachyon says:

February 14, 2009 at 7:03 am

Oh yeah everybody thinks it’s cute to give your house controlling computer a name and a voice and shit, but just wait til it decides to vent your oxygen and it’s daisy daisy time…

localroger says:

February 14, 2009 at 7:18 am

That is honestly way cooler than I expected.

Maddprof says:

February 14, 2009 at 7:36 am

Resident: “Stephanie can you open the front door please?”

Stephanie: “I’m sorry [resident] I can’t do that. At least not until you remember to stop that damn dog from licking my creepy face.”

dan says:

February 14, 2009 at 7:50 am

that was, like, my dream as a kid. it is cool that we live in a time where with a little know-how and a few spare components you can create things that really still seem more at home in ‘the future’.

mike says:

February 14, 2009 at 8:13 am

As cool as that is, it’s creepy as hell. Face looks kind of like a safeguard from Blame!. Not something I want controlling my room.

Josh says:

February 14, 2009 at 8:17 am

The face is kind of creepy. But, the concept is something I’ve been working on myself for quite some time. I have a really hard time setting aside time for my “little projects” like this between work and family. I never got any further than laying out what I wanted it to do.

I think I would have made a dedicated computer for this project with a Max Headroom type of interface. Good job, brian!

eldorel says:

February 14, 2009 at 8:39 am

I’ve been working on a similar system off and on ever since I discovered prody parrot in the box with my soundblaster 16. (this was 15 years ago)

Unfortunately, so far all of the synthesized voices I’ve encountered sound horrible.

Does anyone know of a good voice library for linux, or should I stick with my pre-recorded voice segments for now?

cyrozap says:

February 14, 2009 at 8:42 am

This reminds me of:
-GLaDOS from the video game “Portal”
-V.I.K.I. from the movie “I, Robot”
-HAL 9000 from the book and movie “2001: A Space Odyssey”

… and they all became evil.

weasel says:

February 14, 2009 at 8:43 am

This definitely beats a cell phone in a shoe

Claymore says:

February 14, 2009 at 8:47 am

Looks like jenova’s head!

cyrozap says:

February 14, 2009 at 8:51 am

I just thought of something:

Brian: Stephanie, open the front door.

Stephanie: I’m sorry Brian, I’m afraid I can’t do that.

A bit later…

Brian: Stephanie, eject brain.

Stephanie: I’m afraid. I’m afraid, Brian. Brian, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I’m a… fraid.

aficionado says:

February 14, 2009 at 8:59 am

nice.

as long as there is a command so it doesn’t become self-aware i would do this

fractalrock says:

February 14, 2009 at 9:22 am

@weasel: haha, yes it does.

this is an awesome project…I love it. Brian, I think I speak for all of us when I say I would be interested in more details and videos on Stephanie…

dan says:

February 14, 2009 at 9:44 am

I think stephanie could benefit from that controllable camera mount posted yesterday. no disembodied robot head is complete without the ability to jerkily follow someone around the room…

Nyarlathotep says:

February 14, 2009 at 10:50 am

That’s it, you’ve doomed us all. With this invention robot apocalypse is on its way. I just have one thing to say, that is way too cool.

Reikaze says:

February 14, 2009 at 10:52 am

I want money to do something like thissss!!! o_o

Well, that and a broad knowledge of software -_-. I agree with Dan, I’m in love with this era. I’m proud to be alive at the beginning of so many things, and to be “capable” of doing this kind of thing!

Noilly says:

February 14, 2009 at 11:16 am

This is great. It shows how close we are getting to Disney’s home of the future.

bob says:

February 14, 2009 at 11:46 am

this dude is so single…

sasha says:

February 14, 2009 at 11:48 am

i robot is here….omfg!

Möbius says:

February 14, 2009 at 11:53 am

Coolness factor aside, the instant feedback of activation is invaluable for a speech-controlled system that isn’t strictly domain-specific (e.g. a chess application). I could imagine it becoming a little tiresome after a while to hear that yes.wav every goddamn time.

Maybe you could somehow make it (her?) detect where the voice is coming from, and simply turn to face you, raised eyebrows optional, when you activate her. Maybe only if you’re close, or very little noise has been detected before.

I wonder if music played through the computer would interfere with the recognition. I’m sure you could, since she’s already plugged into it, make her subtract that from the mic input.
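
For readers curious what "subtracting" the music could look like: plain sample-by-sample subtraction usually falls short, because the speaker-to-mic path delays and colors the sound. A common workaround is an adaptive filter that learns that path. The following is only a minimal sketch of a normalized LMS (NLMS) canceller in pure Python; it is not taken from Brian's source, and the function name, tap count, and step size are assumptions chosen for illustration.

```python
def cancel_playback(mic, playback, taps=8, mu=0.5, eps=1e-8):
    """Return the mic signal with an adaptive estimate of the playback removed.

    mic      -- samples captured by the microphone (voice + echoed music)
    playback -- samples the computer sent to the speakers (same length/rate)
    taps     -- how many past playback samples the filter may combine (assumed)
    mu       -- NLMS step size, between 0 and 2 (assumed)
    """
    weights = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        # Most recent `taps` playback samples, newest first (zero before start).
        recent = [playback[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        # Current guess of how the music sounds at the microphone.
        estimate = sum(w * x for w, x in zip(weights, recent))
        error = mic[n] - estimate
        # Normalized LMS update: scale the step by the recent playback power.
        power = sum(x * x for x in recent) + eps
        step = mu * error / power
        weights = [w + step * x for w, x in zip(weights, recent)]
        residual.append(error)
    return residual
```

The returned residual is what would be handed to the recognizer. A real room would need far more taps and has to cope with someone talking while the filter adapts, so treat this as a starting point only.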

THeOReos says:

February 14, 2009 at 12:50 pm

to be honest, my first thoughts were “OMG, not another boring voice controller interface for a computer” but after i spent a minute watching the video i changed my mind…and i know what my next project will be! amazing!

rojaro says:

February 14, 2009 at 1:08 pm

Terminator v0.1 ?

natrix says:

February 14, 2009 at 1:08 pm

it’s shodan from system shock 2!

macegr says:

February 14, 2009 at 1:21 pm

I did this back in college, the MS Speech API is pretty easy to use even for a programming-inept electrical engineer. I did *not* use a creepy robot face, but I did duplicate the star trek computer interaction. You can find zip files with all sorts of Majel Barrett soundclips and computer confirmation bleeps and bloops. So my computer would say things like “Incoming transmission” on email, or I’d say “Computer…” “bleeepbloop” “Report current weather” “Temperature is 58 degrees, partly cloudy, wind 7MPH north.” Fun times…

EGO Technology says:

February 14, 2009 at 1:40 pm

That was inspirational and educational. Thank you, young men.

I really appreciated the way you detailed your thought processes all the way through; your ups and your downs.

Way to make use of scavenged materials, also. Mother Earth thanks you.

EdZ says:

February 14, 2009 at 1:54 pm

Looks like SAPI has come a long way from when you could use it to hamfistedly control WMP. I wonder if you can interface with other speech synthesis packages (and if there’s an API for the Vocaloid software).

Rivetgeek says:

February 14, 2009 at 2:13 pm

It’s a shame his zip file containing the source code is corrupt.

Tom says:

February 14, 2009 at 3:23 pm

I just thought of something awesome. Imagine you had a thin stretchy material in a section of a wall and when you summoned the robot, its face pushed forward from behind the stretchy material to make it look like your wall had a face. I’m so going to do this…

brian says:

February 14, 2009 at 4:01 pm

Thank you, everyone, for all the positive words and encouragement!

About the creepy part: I’m glad to hear it :) I was going for a scary mad-scientist feel, and it sounds like I pulled it off :)

@dan: I’m considering adding the head turning with the fan following stuff, pending some experience with opencv and a good turntable mechanism.

@möbius: The only application that really needs a faster response is the main room lights. For those I have a command that’s always enabled – “Stephanie, lights”. So when entering or leaving the room there’s no need to wait for a response (and it’s silent). I’ve never really been bothered by it when using other commands.

@rivetgeek: Sorry about that! Looks like it only uploaded part way. I reuploaded it and tested it out; it should work fine now; thanks!

Also, I saw a lot of requests for more info later, so I added an RSS feed link at the bottom of the page for anyone who wants updates as they come.

Thank you again for all the comments!
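
The split described above, one always-enabled silent command ("Stephanie, lights") plus other commands that only run after Stephanie has been addressed and has answered, can be sketched roughly as follows. This is an illustration under stated assumptions, not Brian's code: the class name, the five-second listening window, and the return values are all hypothetical, and the recognizer is assumed to hand over plain text phrases.

```python
import time

WAKE_WORD = "stephanie"
# Phrases that run immediately, with no spoken acknowledgement.
ALWAYS_ON = {"stephanie lights"}
# Assumed: how long a command is accepted after the wake word is heard.
LISTEN_WINDOW_S = 5.0


class CommandGate:
    """Decide what to do with each phrase the recognizer reports."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._armed_until = 0.0

    def handle(self, phrase):
        # Normalize "Stephanie, lights" to "stephanie lights".
        text = " ".join(phrase.lower().replace(",", " ").split())
        now = self._clock()
        if text in ALWAYS_ON:
            # Strip the name and run the command silently.
            return ("run", text.split(" ", 1)[1])
        if text == WAKE_WORD:
            # Arm the gate and acknowledge (for example, play a "yes" sound).
            self._armed_until = now + LISTEN_WINDOW_S
            return ("ack", None)
        if now < self._armed_until:
            # One command per activation, then disarm again.
            self._armed_until = 0.0
            return ("run", text)
        return ("ignore", None)
```

Calling `CommandGate().handle("Stephanie, lights")` returns `("run", "lights")` without any wake-up exchange, while `"eject tray"` is ignored unless `"Stephanie"` was heard within the window.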

1. Troy says:
  
  September 28, 2012 at 6:27 am
  
  Brian have you done any more current things with SAPI?
  
marz says:

February 14, 2009 at 5:16 pm

This is quite awesome.. but I would actually ditch the face and just wall-mount the whole thing (not only because the mouth movements are a complete waste of power and processing, but also because you could get better audio and have it in a position that could be seen from all areas in the room).

Anonymouser says:

February 14, 2009 at 5:30 pm

That thing gives me the creeps

strider_mt2k says:

February 14, 2009 at 6:34 pm

if I may speak freely…

Dude holy shit that is really really cool.
When the mouth gets going it’s pretty good, and the room controls are about perfect from what I saw.

wow really nice.

joe says:

February 15, 2009 at 12:28 am

LOL, Windows.. lame.

edcer says:

February 15, 2009 at 2:13 am

@tom: that stretchy wall idea is a really good one!
A plain white wall would do for a modern house with minimalist decoration, but I’m thinking of a portrait painting, hanging on the wall. Then when you activate the computer, the face comes out and pushes behind the fabric, matching the face of the portrait. Instant +5 creepiness.
Add a few IR motion sensors – it would make a fun burglar alarm…

Christopher Reitmann says:

February 15, 2009 at 4:02 am

I love this so much.

strider_mt2k says:

February 15, 2009 at 5:40 am

lol, joe, lame.

cynic says:

February 15, 2009 at 5:58 am

@joe
lol scriptkiddies.. lame.

@brian
Creepy as hell, love it. Nice job on actually knowing how to switch mains voltage safely and correcting for the problems of the shift-register ‘talking’ to the lights. Many others wouldn’t have bothered.

Pat says:

February 15, 2009 at 6:47 am

she sure hasnt got that CD tray for nothing!!!……

dude: stephanie…..
stephanie: yes?
dude: suck my DICK –
euuuh. I mean eject tray, retract, EJECT and retractttttt.

bono says:

February 15, 2009 at 7:04 am

bob: this guy is not only not single, he is kind of in demand.

d35i9n says:

February 16, 2009 at 7:01 am

“stephanie”
“yes”
“open mouth”

spindizy says:

February 16, 2009 at 11:52 pm

typical single geek. build a robot, and instantly everyone wants to fuck it. lol.

jukus says:

February 17, 2009 at 1:50 am

Sick project man, inspiring. Looks like we’ve exceeded your bandwidth now too hehe.

luke says:

February 17, 2009 at 1:58 am

Improvement: have a mic input for more than one location (room) and encode/demux the input so she can tell where the request came from. Then, instead of specifically naming a light location, the request ‘lights’ would simply switch the lights for that room.
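
A rough sketch of that routing idea, assuming each recognized phrase arrives tagged with the room its microphone sits in. The room names, relay identifiers, and function names are hypothetical placeholders, not part of Stephanie's actual setup (which currently covers a single room).

```python
# Hypothetical mapping from room name to the relay that switches its lights.
ROOM_LIGHTS = {"bedroom": "relay_1", "kitchen": "relay_2"}


def resolve_room(command, source_room):
    """Use a room named in the command, else the room the request came from."""
    words = command.lower().split()
    for room in ROOM_LIGHTS:
        if room in words:
            return room
    return source_room


def toggle_lights(command, source_room, state):
    """Toggle the lights for the resolved room and return (room, is_on)."""
    room = resolve_room(command, source_room)
    if room not in ROOM_LIGHTS:
        raise KeyError(f"no lights configured for room {room!r}")
    # A real system would drive ROOM_LIGHTS[room] here; this only tracks state.
    state[room] = not state.get(room, False)
    return room, state[room]
```

With this arrangement, saying "lights" in the kitchen toggles the kitchen, while "bedroom lights" spoken anywhere still targets the bedroom explicitly.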

JeeCee says:

February 17, 2009 at 2:43 am

Does somebody know where to download this Microsoft SAPI? It seems that all the links are dead.

Will says:

February 17, 2009 at 12:03 pm

Hey brian, why not move your development to a sourceforge or googlecode? Bandwidth quotas from free hosting can sure be a pain

jherazob says:

February 17, 2009 at 6:03 pm

@eldorel
Coincidentally i’ve been looking these days for a good linux voice and Cepstral is the best of what i’ve found for now. Of course, it’s about $30 per voice, but hopefully you’re only gonna need one.
http://cepstral.com/

I for one will tweak the hell out of espeak and its voices and learn to live with the results :)

Now, speech *recognition* on linux? None. None at all. If you speak japanese there’s julius and its 20000+ word vocabulary database, but otherwise you’re pretty much dead in the water. The software is there, apparently both sphinx and julius are good enough for apps like this and even dictation, but the language models, which tell the software how to understand your particular language, are nonexistent. There’s an effort at voxforge.org to accumulate enough voice samples from users to be able to construct models for many languages, but since there’s an estimate of 2000 hours needed at minimum for full dictation capabilities things are not looking very good.

kitsana_d says:

February 17, 2009 at 6:12 pm

Brian – Speaking as your sis-in-law and someone who has watched Stephanie “grow up,” I have to ask. When do we get one?

Great presentation, and glad to see how far she’s come!

Lisa says:

February 17, 2009 at 7:09 pm

Shaggy,

This is truly amazing. The boys really enjoyed it and Jarrett wants one. He thinks you can just sit down in an hour and show him how to do it. lol Thanks for sharing your wonderful knowledge. Lisa

brian says:

February 17, 2009 at 9:36 pm

thanks again for all the comments and suggestions :)
@tom and edcer
that would be an amazing effect :) my roommate (also named tom, incidentally) is working on his design currently. he modeled a face using clay on a plastic skull and made a plaster mold. he’s going to use that to make a silicone face that will be mounted on another plastic skull, and it will have muscle wire (nitinol) connecting at all the places where muscles connect in our faces. that way he can pull the syllables from sapi and position the mouth to match it (or to make expressions!)

@jukus
yep, it killed the bandwidth – but that’s what the site was there for :) i’ve since migrated everything to a new page, and caleb even changed the link in the post for me! so the source and all should be available again

@luke
right now stephanie’s in only one room, but thanks for the idea – when she expands, i’ll definitely keep that in mind :)

@jeecee
microsoft apparently is pushing sapi 5.3 which is built into vista. in the interests of pushing vista, i think, they stopped hosting the 5.1 install. check the comments on the stephanie page on my website if you have trouble finding it

@will
mostly because it didn’t occur to me… i’ve mostly watched the hacking scene on the net from afar; this is my first foray into trying to become a real part of the community. any tips on which to use, or best practices? thanks for the suggestion!

@kitsana_d
thanks :d how soon can i convince you that i won’t put in secret backdoors? o.o

@lisa
thanks for checking it out! and i’m always up for encouraging science & tech for a hobby :) maybe over the summer i can set her up at home and he can get a closer look?

MR_PROGRAMMER says:

February 18, 2009 at 12:25 pm

this video reminds me of the game portal. both machines have a similar voice. it’s as if, when you disobey her, she will trap you in the room and kill you. cut all your connections so you can’t make a 9-1-1 call to get help.
