[Brian] sent in this writeup on his voice-controlled home automation system. Starting with Microsoft’s SAPI (Speech API), which handles speech recognition and synthesis, he programmed some basic home automation. In a move that makes this project decidedly more awesome, he decided to build a physical representation of his automation system. This disembodied head is “Stephanie”. She responds to her name, has an articulated jaw that moves with the syllables in the words, and even ejects her “brain tray” on command. We want one.
There is lots of information on his site about the circuitry involved, as well as source code and a video. You can see the video after the break.
[youtube=http://www.youtube.com/watch?v=DqCXbP85oX0]
Oh yeah everybody thinks it’s cute to give your house controlling computer a name and a voice and shit, but just wait til it decides to vent your oxygen and it’s daisy daisy time…
That is honestly way cooler than I expected.
Resident: “Stephanie can you open the front door please?”
Stephanie: “I’m sorry [resident] I can’t do that. At least not until you remember to stop that damn dog from licking my creepy face.”
that was, like, my dream as a kid. it is cool that we live in a time where with a little know-how and a few spare components you can create things that really still seem more at home in ‘the future’.
As cool as that is, it’s creepy as hell. Face looks kind of like a safeguard from Blame!. Not something I want controlling my room.
The face is kind of creepy. But, the concept is something I’ve been working on myself for quite some time. I have a really hard time setting aside time for my “little projects” like this between work and family. I never got any further than laying out what I wanted it to do.
I think I would have made a dedicated computer for this project with a Max Headroom type of interface. Good job, brian!
I’ve been working on a similar system off and on ever since I discovered prody parrot in the box with my soundblaster 16. (this was 15 years ago)
Unfortunately, so far all of the synthesized voices I’ve encountered sound horrible.
Does anyone know of a good voice library for linux, or should I stick with my pre-recorded voice segments for now?
This reminds me of:
-GLaDOS from the video game “Portal”
-V.I.K.I. from the movie “I, Robot”
-HAL 9000 from the book and movie “2001 A Space Odyssey”
… and they all became evil.
This definitely beats a cell phone in a shoe
Looks like jenova’s head!
I just thought of something:
Brian: Stephanie, open the front door.
Stephanie: I’m sorry Brian, I’m afraid I can’t do that.
A bit later…
Brian: Stephanie, eject brain.
Stephanie: I’m afraid. I’m afraid, Brian. Brian, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I’m a… fraid.
nice.
as long as there is a command so it doesn’t become self-aware i would do this
@weasel: haha, yes it does.
this is an awesome project…I love it. Brian, I think I speak for all of us when I say I would be interested in more details and videos on Stephanie…
I think stephanie could benefit from that controllable camera mount posted yesterday. no disembodied robot head is complete without the ability to jerkily follow someone around the room…
That’s it, you’ve doomed us all. With this invention robot apocalypse is on its way. I just have one thing to say, that is way too cool.
I want money to do something like thissss!!! o_o
Well, that and a broad knowledge of software -_-. I agree with Dan, i’m in love with this era. I’m proud to be alive at the beginning of so many things, and to be “capable” of doing these kinds of things!
This is great. It shows how close we are getting to Disney’s home of the future.
this dude is so single…
i robot is here….omfg!
Coolness factor aside, the instant feedback of activation is invaluable for a speech-controlled system that isn’t strictly domain-specific (e.g. a chess application). I could imagine it becoming a little tiresome after a while to hear that yes.wav every goddamn time.
Maybe you could somehow make it (her?) detect where the voice is coming from, and simply turn to face you, raised eyebrows optional, when you activate her. Maybe only if you’re close, or very little noise has been detected before.
I wonder if music played through the computer would interfere with the recognition. I’m sure you could, since she’s already plugged into it, make her subtract that from the mic input.
to be honest, my first thoughts were “OMG, not another boring voice controller interface for a computer” but after i spent a minute seeing the video i changed my mind…and i know what my next project will be! amazing!
Terminator v0.1 ?
it’s shodan from system shock 2!
I did this back in college, the MS Speech API is pretty easy to use even for a programming-inept electrical engineer. I did *not* use a creepy robot face, but I did duplicate the star trek computer interaction. You can find zip files with all sorts of Majel Barrett soundclips and computer confirmation bleeps and bloops. So my computer would say things like “Incoming transmission” on email, or I’d say “Computer…” “bleeepbloop” “Report current weather” “Temperature is 58 degrees, partly cloudy, wind 7MPH north.” Fun times…
That was inspirational and educational. Thank you, young men.
I really appreciated the way you detailed your thought processes all the way through; your ups and your downs.
Way to make use of scavenged materials, also. Mother Earth thanks you.
Looks like SAPI has come a long way from when you could use it to hamfistedly control WMP. I wonder if you can interface with other speech synthesis packages (and if there’s an API for the Vocaloid software).
It’s a shame his zip file containing the source code is corrupt.
I just thought of something awesome. Imagine you had a thin stretchy material in a section of a wall and when you summoned the robot, its face pushed forward from behind the stretchy material to make it look like your wall had a face. I’m so going to do this…
Thank you, everyone, for all the positive words and encouragement!
About the creepy part: I’m glad to hear it :) I was going for a scary mad-scientist feel, and it sounds like I pulled it off :)
@dan: I’m considering adding the head turning with the fan following stuff, pending some experience with opencv and a good turntable mechanism.
@möbius: The only application that really needs a faster response is the main room lights. For those I have a command that’s always enabled – “Stephanie, lights”. So when entering or leaving the room there’s no need to wait for a response (and it’s silent). I’ve never really been bothered by it when using other commands.
@rivetgeek: Sorry about that! Looks like it only uploaded part way. I reuploaded it and tested it out; it should work fine now; thanks!
Also, I saw a lot of requests for more info later, so I added an RSS feed link at the bottom of the page for anyone who wants updates as they come.
Thank you again for all the comments!
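For anyone curious how the always-enabled “Stephanie, lights” command described above could sit alongside commands that wait for a spoken acknowledgement, here is a minimal sketch. It is not Brian’s actual code and does not call SAPI; it assumes the recognizer hands over plain text phrases, and the names `CommandRouter`, `normalize`, and the action strings are hypothetical.

```python
from dataclasses import dataclass, field


def normalize(phrase: str) -> str:
    """Lowercase the phrase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in phrase.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


@dataclass
class CommandRouter:
    """Routes recognized phrases (hypothetical design, not the project's code)."""
    wake_word: str = "stephanie"
    # Full phrases that fire immediately and silently, with no acknowledgement.
    always_on: dict = field(default_factory=dict)
    # Phrases that only work right after the wake word has been acknowledged.
    gated: dict = field(default_factory=dict)
    awake: bool = False

    def hear(self, phrase: str) -> str:
        text = normalize(phrase)
        if text in self.always_on:
            return self.always_on[text]
        if text == self.wake_word:
            self.awake = True
            return "acknowledge"  # e.g. play the "yes" sound
        if self.awake and text in self.gated:
            self.awake = False  # one command per activation
            return self.gated[text]
        return "ignore"


router = CommandRouter(
    always_on={"stephanie lights": "toggle_lights"},
    gated={"eject brain": "eject_tray"},
)
```

With this setup, `router.hear("Stephanie, lights")` returns the lights action right away, while `router.hear("eject brain")` is ignored unless `router.hear("Stephanie")` came first.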
Brian have you done any more current things with SAPI?
This is quite awesome.. but I would actually ditch the face and just wall-mount the whole thing (not only because the mouth movements are a complete waste of power and processing, but also because you could get better audio and have it in a position that could be seen from all areas in the room).
That thing gives me the creeps
if I may speak freely…
Dude holy shit that is really really cool.
When the mouth gets going it’s pretty good, and the room controls are about perfect from what I saw.
wow really nice.
LOL, Windows.. lame.
@tom: that stretchy wall idea is a really good one!
A plain white wall would do for a modern house with minimalist decoration, but I’m thinking of a portrait painting, hanging on the wall. Then when you activate the computer, the face comes out and pushes behind the fabric, matching the face of the portrait. Instant +5 creepiness.
Add a few IR motion sensors – it would make a fun burglar alarm…
I love this so much.
lol, joe, lame.
@joe
lol scriptkiddies.. lame.
@brian
Creepy as hell, love it. Nice job on actually knowing how to switch mains voltage safely and correcting for the problems of the shift-register ‘talking’ to the lights. Many others wouldn’t have bothered.
she sure hasnt got that CD tray for nothing!!!……
dude: stephanie…..
stephanie: yes?
dude: suck my DICK –
euuuh. I mean eject tray, retract, EJECT and retractttttt.
bob: this guy is not only not single, he is kind of in demand.
“stephanie”
“yes”
“open mouth”
typical single geek. build a robot, and instantly everyone wants to fuck it. lol.
Sick Project man, inspiring. Looks like we’ve exceeded your bandwidth now too hehe.
Improvement: have a mic input for more than one location (room) and encode/demux the input so she can understand where the request came from. So instead of specifically naming a light location, the request ‘lights’ would simply switch the lights for that room.
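A rough sketch of that per-room idea, assuming each microphone has a numeric ID and the recognizer reports which one heard the phrase. The mic IDs, room names, and the `resolve_target` helper are all made up for illustration.

```python
# Hypothetical wiring: which microphone sits in which room.
MIC_TO_ROOM = {0: "bedroom", 1: "kitchen"}


def resolve_target(mic_id, command, mic_to_room=MIC_TO_ROOM):
    """Return (room, action) for a recognized command.

    A room named explicitly in the command wins; otherwise the room is
    inferred from the microphone that picked up the request.
    """
    words = command.lower().split()
    for room in sorted(set(mic_to_room.values())):
        if room in words:
            action = " ".join(w for w in words if w != room)
            return room, action
    # No room named: fall back to where the request came from.
    # Unknown microphones yield None so the caller can ask for clarification.
    return mic_to_room.get(mic_id), " ".join(words)
```

So `resolve_target(1, "lights")` would pick the kitchen, while `resolve_target(1, "bedroom lights")` still lets you control another room by name.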
Does somebody know where to download this Microsoft SAPI? It seems that all the links are dead.
Hey brian, why not move your development to a sourceforge or googlecode? Bandwidth quotas from free hosting sure can be a pain
@eldorel
Coincidentally i’ve been looking these days for a good linux voice and Cepstral is the best of what i’ve found for now. Of course, it’s about $30 per voice, but hopefully you’re only gonna need one.
http://cepstral.com/
I for one will tweak the hell out of espeak and its voices and learn to live with the results :)
Now, speech *recognition* on linux? None. None at all. If you speak japanese there’s julius and its 20000+ word vocabulary database, but otherwise you’re pretty much dead in the water. The software is there, apparently both sphinx and julius are good enough for apps like this and even dictation, but the language models, which tell the software how to understand your particular language, are nonexistent. There’s an effort at voxforge.org to accumulate enough voice samples from users to be able to construct models for many languages, but since there’s an estimate of 2000 hours needed at minimum for full dictation capabilities things are not looking very good.
Brian – Speaking as your sis-in-law and someone who has watched Stephanie “grow up,” I have to ask. When do we get one?
Great presentation, and glad to see how far she’s come!
Shaggy,
This is truly amazing. The boys really enjoyed it and Jarrett wants one. He thinks you can just sit down in an hour and show him how to do it. lol Thanks for sharing your wonderful knowledge. Lisa
thanks again for all the comments and suggestions :)
@tom and edcer
that would be an amazing effect :) my roommate (also named tom, incidentally) is working on his design currently. he modeled a face using clay on a plastic skull and made a plaster mold. he’s going to use that to make a silicone face that will be mounted on another plastic skull, and it will have muscle wire (nitinol) connecting at all the places where muscles connect in our faces. that way he can pull the syllables from sapi and position the mouth to match it (or to make expressions!)
@jukus
yep, it killed the bandwidth – but that’s what the site was there for :) i’ve since migrated everything to a new page, and caleb even changed the link in the post for me! so the source and all should be available again
@luke
right now stephanie’s in only one room, but thanks for the idea – when she expands, i’ll definitely keep that in mind :)
@jeecee
microsoft apparently is pushing sapi 5.3 which is built into vista. in the interests of pushing vista, i think, they stopped hosting the 5.1 install. check the comments on the stephanie page on my website if you have troubles finding it
@will
mostly because it didn’t occur to me… i’ve mostly watched the hacking scene on the net from afar; this is my first foray into trying to become a real part of the community. any tips on which to use, or best practices? thanks for the suggestion!
@kitsana_d
thanks :d how soon can i convince you that i won’t put in secret backdoors? o.o
@lisa
thanks for checking it out! and i’m always up for encouraging science & tech for a hobby :) maybe over the summer i can set her up at home and he can get a closer look?
this video reminds me of the game portal. both machines have a similar voice. it’s as if, when you disobey her, she will trap you in the room and kill you. cut all your connections so you can’t make a 9-1-1 call to get help.