Roll Your Own Amazon Echo On A Raspberry Pi

March 31, 2016

Speech recognition coupled with AI is the new hotness. Amazon’s Echo is a pretty compelling device, for a largish chunk of change. But if you’re interested in building something similar yourself, it’s just gotten a lot easier. Amazon has opened up a GitHub repository with instructions and code that will get you up and running with their Alexa Voice Service in short order.

If you read Hackaday as avidly as we do, you’ve already read that Amazon opened up their SDK (confusingly called a “Skills Kit”) and that folks have started working with it already. This newest development is Amazon’s “official” hello-world demo, for what that’s worth.

There are also open source alternatives, so if you just want to get something up and running without jumping through registration and licensing hoops, you’ve got that option as well.

Whichever way you slice it, there seems to be a real interest in having our machines listen to us. It’s probably time for an in-depth comparison of the various options. If you know of a voice recognition system that runs on something embeddable — a single-board computer or even a microcontroller — and you’d like to see us look into it, post up in the comments. We’ll see what we can do.

Thanks to [vvenesect] for the tip!

40 thoughts on “Roll Your Own Amazon Echo On A Raspberry Pi”

Tiadrin says:

March 31, 2016 at 8:15 am

something like http://www.veear.eu/ ?

Bruno says:

March 31, 2016 at 9:16 am

I prefer this one. https://jasperproject.github.io/

Chris C. says:

March 31, 2016 at 9:31 am

It’s not your own Amazon Echo, or even a half-decent substitute, unless it supports a wake word. Which I haven’t seen any evidence the Alexa Voice Service provides.

I haven’t found ANY decent way to implement a wake word on a DIY project. Folks keep telling me to use general-purpose speech recognition engines, like Sphinx. I’m guessing none of the people making this suggestion have ever tried it. The number of false positive/negative detections is ridiculous, even with lots of tweaks and training.

Proper wake word support seems not to use general-purpose speech recognition at all. I know Google uses a set of filters and neural nets, highly trained to recognize one word only. That approach yields excellent results, with minimal CPU and memory usage, and without constantly streaming every sound to the cloud. I suspect the other big players use a similar, if not identical approach.

That’s what we need MOST at this point. Without this piece of the puzzle, DIY speech-enabled projects are never really going to be very useful.
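
A rough way to picture the dedicated-detector approach described in the comment above: a small classifier, trained on one word only, emits a score for each audio frame, and a cheap second step smooths those scores and fires only when the average stays high. The Python sketch below covers just that smoothing step. The classifier itself is not shown, and `detect_wake_frames`, `window`, `threshold`, and `refractory` are made-up names and values for illustration, not anything taken from Google's or Amazon's implementation.

```python
from collections import deque
from typing import Iterable, List


def detect_wake_frames(
    frame_scores: Iterable[float],
    window: int = 5,
    threshold: float = 0.8,
    refractory: int = 10,
) -> List[int]:
    """Return frame indices where the smoothed score crosses the threshold.

    frame_scores: per-frame wake-word scores (0..1) from some hypothetical
                  single-word classifier that is not shown here.
    window:       number of recent frames averaged together.
    threshold:    smoothed score needed to trigger.
    refractory:   frames to ignore after a trigger, so one utterance
                  does not fire repeatedly.
    """
    recent = deque(maxlen=window)
    triggers = []
    cooldown = 0
    for i, score in enumerate(frame_scores):
        recent.append(score)
        if cooldown > 0:
            cooldown -= 1
            continue
        # Only evaluate once a full window of frames has been seen.
        if len(recent) == window and sum(recent) / window >= threshold:
            triggers.append(i)
            cooldown = refractory
    return triggers


if __name__ == "__main__":
    # One isolated high frame (a click) versus a sustained run (a word).
    click = [0.1] * 10 + [0.99] + [0.1] * 10
    word = [0.1] * 10 + [0.95] * 8 + [0.1] * 10
    print("click triggers:", detect_wake_frames(click))
    print("word triggers:", detect_wake_frames(word))
```

A single noisy frame barely moves the average, while a score that stays high across several frames trips the trigger once and then stays quiet for the refractory period. How good the result is depends entirely on the classifier feeding it, which is the hard part.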

1. Gary says:
  
  March 31, 2016 at 10:14 am
  
  I would also add we need a good far-field microphone solution like the one in the Echo. Otherwise, we’re all stuck within a few feet of a regular mic, carrying around a mic (or cellphone), or wearing some sort of [bluetooth] headset. That’s why I doubt the Amazon Tap will ever take off. The Tap is an Echo with a normal mic and no wake word – the two best selling points of the original.
  
  1. Jerry says:
    
    March 31, 2016 at 11:06 am
    
    I’m sure all the tin foil hat people (seems like half of hackaday) will appreciate the button…but of course they’ll think it’s all just a ruse and it’s listening all the time and sending the voice data to every three letter agency, plus Amazon, Apple, Microsoft and Google, even in the face of router logs that show otherwise.
    
    I suspect the lack of a good microphone may be part of the OP’s problem…
    
    1. Chris C. says:
      
      March 31, 2016 at 1:47 pm
      
      It’s not an issue of the microphone quality, but of background noise. I usually have music playing.
      
      I want to speak the wake word/phrase, from anywhere in the room and at a louder volume than the music, and have it recognized. It can then automatically turn down the music if needed, for good recognition of the commands to follow. Older versions of desktop Chrome would detect the “Ok Google” wake word. It recognized it successfully in my scenario, almost without any false positive/negative detections at all. And it did so even with an awful mic. There’s just no easy way to integrate this functionality into a project of your own. (Chrome only listens for the wake word when it’s the active window, with Google in the current tab. I set that up in a VM which I never touched, so that it would always be listening even when I’m using my computer for other stuff. And scanned the screen to determine when Chrome heard the wake word. But that’s horribly inelegant, and wasteful of resources.)
      
      Contrast that to a general-purpose speech recognition engine which assumes if it’s active, then someone must be speaking; and attempts to fit *everything* it hears to speech. Try this. Play some music with no lyrics, and watch it spit out words continuously, even though no speech is present whatsoever. Cough or make your chair creak in a silent room, more words. Give it a dictionary consisting only of the wake word, and it will hear that word in every noise. Give it a large dictionary, and it frequently fails to detect the wake word in the presence of background noise. A better microphone often makes this worse instead of better, by giving the engine more non-speech sound to mistranslate.
      
      Seriously, we just need a good hacker-friendly wake word detector.
      
      1. Greenaum says:
        
        March 31, 2016 at 5:33 pm
        
        I know it’s not the same thing, but could you whistle, or clap, to wake it up? Easy to detect those with simple circuits.
        
      2. Chris C. says:
        
        March 31, 2016 at 6:15 pm
        
        That’s the second suggestion for “wake on whistle”, first made by Elliot. It’s so retro, I honestly didn’t consider it. But it would be easy to implement. I just wonder how immune it could be made against false positives. What could one easily whistle, that is unique enough that it wouldn’t be found in music?
        
        Maybe the “officer on deck” whistle from old Star Trek?
        
      3. Tore Lund says:
        
        April 1, 2016 at 12:29 am
        
        In the mid-80s there was a little toy VW van with voice control. It took four commands: forward, right, left, back. It calculated ratios between different audio frequencies in a specific order, which let it recognize these different words. It was not speaker-specific and needed no training, of course. I have not read the terms on this forum, so I won’t tell you my favorites, but the fun part was the words you could substitute for the commands and still make it move.
        
        Something similar is doable in software with no need for server voice recognition in the traditional way.
        Here’s a video on the toy controller chip:
        
        Another way is with the Philips IC: http://www.futurlec.com/News/Philips/SpeechChip.html
        It can learn 100 words and understand simple sentences. I remember there being an Arduino shield with one. Lots of hits on “Arduino robot voice recognition”.
        
      4. Tore Lund says:
        
        April 1, 2016 at 12:30 am
        
        Sorry, the video link: https://www.youtube.com/watch?v=kFth9K_IvwA
        
      5. Tore Lund says:
        
        April 1, 2016 at 12:35 am
        
        Another edit: The IC in the video is more advanced than the one I describe, and I think I first saw this car in the late 70s. So this kind of frequency lookup state machine is really old tech.
        
      6. AltMarcxs says:
        
        April 2, 2016 at 5:30 am
        
        @Tore Lund: Nowadays we’ve got an ARM Cortex-M4 module with 200-word ASR: http://www.mikroe.com/click/speakup/
        
      7. amenzeleev says:
        
        April 4, 2016 at 2:38 pm
        
        I wonder if a two-stage two-word filter approach would work…
        
        1. A “dirty” wake word detector running on the user side (on the Pi). The detector skews toward false positives (but isn’t /too/ bad). Recognizing the second word activates Alexa and starts streaming.
        2. You then pass the second word to Alexa in the cloud. I’m assuming Alexa’s speech or command recognition is better/easier than rolling your own (maybe not… I played with the Echo once or twice). The cloud’s job at this point is to recognize the second word.
        a. Second word is recognized: you get an “OK” signal back. The user side then starts streaming to Alexa.
        b. Second word is /not/ recognized: nothing happens (or rather a “NOT OK, FOO” is sent back).
        
        or you can save the last 0.5 seconds and re-send the first word to the cloud….
        
        Of course this does not solve the sound problems that are addressed by the far-field microphone array / tech…
        
  2. AltMarcxs says:
    
    April 2, 2016 at 5:48 am
    
    Building the hardware for a far-field microphone would be really easy with the TI chips (the same as in the Echo); the trouble lies in setting up the parameters/firmware for these (it uses the proprietary TI PurePath software).
    
2. Elliot Williams says:
  
  March 31, 2016 at 10:32 am
  
  That makes a ton of sense. The wake word needs to be unambiguously detected and is independent of everything else, so you’d gain a lot by designing a special-purpose detector for it.
  
  Wake word is finding a needle in a haystack — after that it’s just identifying which needle you’ve just been handed.
  
  It must help to pick something totally improbable as your wake word. Makes me think “OK Google” is a horrid choice. “Alexa” or “Cortana” don’t seem all that much better. They must have _strong_ algorithms. I’m gonna call mine “Bandersnatch”.
  
  Or wake on whistle. http://www.limpkin.fr/index.php?post/2013/04/26/The-whistled%3A-how-to-remake-a-dozen-years-old-project-the-right-way
  
  1. Chris says:
    
    March 31, 2016 at 11:00 am
    
    I’d call it “Centurion” and give it the voice of a Cylon Centurion from the old 1978 series. “By your command” would be the standard reply.
    
    1. Chris says:
      
      March 31, 2016 at 11:01 am
      
      …the old 1978 Battlestar Galactica series…
      
    2. Elliot Williams says:
      
      March 31, 2016 at 1:18 pm
      
      That’s brilliant.
      
      1. Ren says:
        
        March 31, 2016 at 2:01 pm
        
        “So say we all!”
        
      2. Sheff says:
        
        March 31, 2016 at 8:37 pm
        
        one billion +1’s
        and may the Lords of Kobol bless you.
        
  2. Bill says:
    
    March 31, 2016 at 1:16 pm
    
    When Ericsson first introduced voice dialing on their phones around 2000, they suggested Abracadabra as the wake word (magic word was their term for it), but it could be anything.
    
3. Tripp Williamson says:
  
  March 31, 2016 at 11:20 am
  
  I think an article I read mentioned that, per the terms of use, the voice service has to be called via a button press.
  
4. AltMarcxs says:
  
  April 2, 2016 at 6:43 am
  
  I would (or rather will, because I have a parrot toy robot with limited speech recognition that needs a Pi Zero or micro Odroid C2) use the SpeakUp Cortex-M4 for wake words and limited other actions, to save energy.
  
  Twelve years ago I tried out Sphinx and other ASR under Linux. The best result I got was with the IBM ViaVoice Linux SDK used in conjunction with Mr.House (free home automation software). With a headset it worked great (e.g. “Computer Music Louder” with music in the background), all on a single-core 1.6GHz Intel that was in use as a desktop at the same time.
  Later (8 years ago), I got good results with the Simon-listen project: http://www.simon-listens.org/index.php?id=122&L=1 . It used the Julius ASR: https://en.wikipedia.org/wiki/Julius_(software) and the HTK (ASR): http://htk.eng.cam.ac.uk
  
  Sound input quality IS paramount: without a good far-field microphone AND a bit of automated sound engineering you won’t get anywhere. This is where it gets tricky with Linux and USB mics, but reverse engineering the Echo’s far-field mics could be done (see other comment).
  
5. Artur M. says:
  
  April 4, 2016 at 2:15 pm
  
  What about doing a two-layer, two-word setup?
  
  1. A “dirty” wake word recognizer that runs on your hardware (in my case, Pi) that skews toward false positives (but isn’t /too/ terrible). This starts streaming to Alexa.
  
  2. Alexa then recognizes the second word. I’m gambling that Alexa’s speech recognition is more impressive than rolling your own. I haven’t tried it other than playing with an Echo once or twice…
  a. If this word is /also/ recognized, an “OK ” signal is sent back to device, and you are now directly talking to Alexa.
  b. If the word is not recognized, nothing happens (or rather, an “erm…wut?” is sent back to the device).
  
  …Or you can just use the same word twice, by continuously recording the last 0.5 seconds of sound.
  
  Of course, this won’t solve any of the sound quality problems supposedly addressed by the microphone array / tech in the Echo….
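
The control flow of the two-stage idea proposed in this thread can be sketched in a few lines of Python. Everything here is hypothetical: `local_score` stands in for the "dirty" on-device recognizer, `remote_confirm` stands in for whatever a cloud-side check would do, and the chunk count and threshold are arbitrary. It makes no claim about what the Alexa Voice Service API actually offers.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Sequence

Chunk = Sequence[float]


def two_stage_gate(
    chunks: Iterable[Chunk],
    local_score: Callable[[Chunk], float],
    remote_confirm: Callable[[List[Chunk]], bool],
    local_threshold: float = 0.3,
    history_chunks: int = 5,
) -> List[int]:
    """Return indices of chunks where both stages agreed on a wake word.

    local_score:    hypothetical cheap on-device detector returning 0..1.
                    A low threshold makes it lean toward false positives.
    remote_confirm: hypothetical stand-in for the second-stage check; it
                    receives the buffered audio and returns True or False.
    history_chunks: how many recent chunks to keep, e.g. 5 chunks of
                    100 ms each for roughly the last 0.5 seconds.
    """
    history: Deque[Chunk] = deque(maxlen=history_chunks)
    confirmed = []
    for i, chunk in enumerate(chunks):
        history.append(chunk)
        # Stage 1: cheap local check on every chunk.
        if local_score(chunk) < local_threshold:
            continue
        # Stage 2: only now is any buffered audio handed onward.
        if remote_confirm(list(history)):
            confirmed.append(i)
            history.clear()
    return confirmed


if __name__ == "__main__":
    # Toy stand-ins: peak level locally, mean level of the newest chunk remotely.
    def peak(chunk: Chunk) -> float:
        return max(abs(x) for x in chunk)

    def loud_enough(buffered: List[Chunk]) -> bool:
        newest = buffered[-1]
        return sum(abs(x) for x in newest) / len(newest) > 0.5

    audio = [[0.0, 0.0], [0.9, 0.0], [0.0, 0.0], [0.9, 0.9], [0.0, 0.0]]
    print(two_stage_gate(audio, peak, loud_enough))
```

The ring buffer is what makes the "same word twice" variant possible: by the time the local stage fires, the last few chunks are still on hand to pass along. Nothing leaves the device until the first stage has triggered.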
  
Reactive Light says:

March 31, 2016 at 10:32 am

I suspect Amazon published this to make people think the new Echo Dot ($90) is a good deal in comparison to hacking one together. I just received two Dots today, and they’re (almost) everything I wanted the original Echo to be (USB power so it can run on batteries, and a stereo output jack for connecting to a real audio system).

1. Elliot Williams says:
  
  March 31, 2016 at 10:33 am
  
  :)
  
2. Gary says:
  
  March 31, 2016 at 10:43 am
  
  Totally agree. In contrast to my comment above about Amazon removing the two best features of the Echo for the Tap, they kept them in the Dot and only removed the part that we can most easily provide ourselves – the speakers. What I really expected when the Dot was released was for the Dot to have an inductive charger, and for Amazon to sell additional pads you could place around your house. Like keep one pad in the kitchen, one on the end table beside your couch, one in the garage, etc. Just pick up the Dot and move it to the nearest pad and forget about it till you need it.
  
  1. Rodney McKay says:
    
    March 31, 2016 at 5:23 pm
    
    But the Dot doesn’t have batteries (because batteries & always-on speech recognition aren’t a good fit for more than brief use). My comment about running the Dot on batteries meant that a USB battery pack could be used short-term or, better, it can be used in a vehicle without the power inverter that would be needed by the original Echo.
    
    1. Jerry says:
      
      March 31, 2016 at 10:39 pm
      
      The original Echo power supply is 15V – I suspect that the 14.5 volts or so when the car is running would suffice without an inverter.
      
Patrocles says:

March 31, 2016 at 12:21 pm

A couple of libraries/APIs:

https://pypi.python.org/pypi/SpeechRecognition/ (Python)
https://wit.ai/getting-started (free, a whole bunch of language frameworks supported)
http://ability.nyu.edu/p5.js-speech/ (P5, JavaScript)

Koen says:

March 31, 2016 at 12:53 pm

Soon I can tell my smart house what to do instead of pushing buttons on my smart phone. I will however pass on the Microsoft bot libraries.

stephenjosephbishop says:

April 1, 2016 at 4:13 am

Bought an Echo, used it for a few days. Forgot about it. I keep trying to get myself to use it, but as I don’t care about sports and have a real sound system (old 5.1 Pro Logic II hooked up to an Ouya connecting through a USB Sound Blaster) that’s a no go. Of course my biggest issue is connectivity: it really hates my 63-char password for the WiFi, that or it just doesn’t play well with my routers. Can anyone tell me what these things are actually useful for?

Also, once I found signs of an SSH server running on it, I called, emailed, and submitted web forms to Amazon support trying to find out the password/credentials, not surprisingly to no avail. All I bought the darn thing for was to get root access and use its AMAZING mic array for a real VI. Sorry Amazon, but your crappy assistant and I just don’t work well together. It’s not the cloud connectivity issue alone, but it really annoys me knowing I can’t risk using it as an alarm for fear it might have another (dreaded) connectivity-related failure and I sleep into work time.

Anyway, if anyone knows of a sub-$200 mic array similar to the Echo’s, please let me know; the closest I’ve found was $999+. (I’d throw $100 hard-earned bucks at the kind fella who figures out a way in to our Echos, rooted access that is.)

1. Mr Stephen Morrell says:
  
  May 12, 2016 at 6:21 am
  
  Add another $1000+ from me! Hearing-impaired people really need access to that mic array for speech-to-text!!! Amazon, please expose the API!!!
  
  These ones aren’t as good:
  $100 for the Samson: http://www.juno.co.uk/products/samson-go-mic-connect-portable-stereo-usb/561561-01//?currency=GBP&flt=1&gclid=Cj0KEQjw5ti3BRD89aDFnb3SxPcBEiQAssnp0k-6DNwBYZwcVsiaAGbtwDtddsEl1rR2g95T9rvDk2YaAuBA8P8HAQ
  
  $400 for Acoustic Magic
  
  Please please please PM me if you have a solution!!
  
  Thanks
  Stephen Morrell
  
Johanc says:

April 1, 2016 at 1:19 pm

Computer… Computer….
play Mambo!

1. Sheff says:
  
  April 1, 2016 at 9:43 pm
  
  I’ll just leave this here
  https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=0ahUKEwju3838ke_LAhWFax4KHYeKByUQjRwIBw&url=http%3A%2F%2Fyesterdayscheese.blogspot.com%2F2011%2F10%2Fhello-computer-scotty-uses-apple-iphone.html&psig=AFQjCNE-5wG4ew0W726vRMcaPAzDm5v-Ow&ust=1459658549248587
  
vvenesect says:

April 4, 2016 at 8:38 am

A search for “beam forming microphone array” turns up links like these:

http://research.microsoft.com/en-us/projects/microphone_array/

https://www.researchgate.net/publication/220735082_A_portable_USB-based_microphone_array_device_for_robust_speech_recognition

http://www.ecs.umass.edu/ece/sdp/sdp14/…/Team15FinalMDRReport.pdf

Some of these dig heavily into math, and unfortunately that’s not a strong suit of mine. The data is out there…it just needs to be parsed into an easier-to-digest form for everyone to understand and use.
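
For anyone put off by the math in those links, the simplest form of beamforming, delay-and-sum, fits in a short Python sketch. It assumes a straight-line array, a distant source, and delays rounded to whole samples. The function names and the sign convention for the angle are choices made for this example, and this is not a claim about how the Echo or any of the linked projects do it.

```python
import math
from typing import List, Sequence


def steering_delays(
    mic_x_m: Sequence[float],
    angle_deg: float,
    sample_rate_hz: float,
    speed_of_sound: float = 343.0,
) -> List[int]:
    """Per-microphone arrival delay, in whole samples, for one direction.

    mic_x_m:   microphone positions along a line, in meters.
    angle_deg: source direction measured from straight ahead (broadside).
               Positive angles lean toward larger x, so microphones with
               larger x hear the sound earlier.
    """
    sin_a = math.sin(math.radians(angle_deg))
    arrival_s = [-x * sin_a / speed_of_sound for x in mic_x_m]
    earliest = min(arrival_s)
    return [round((t - earliest) * sample_rate_hz) for t in arrival_s]


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Undo each channel's delay, then average the aligned channels."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


if __name__ == "__main__":
    fs = 16000
    # Two microphones 10 cm apart, steering toward the +x end of the array.
    delays = steering_delays([0.0, 0.1], 90.0, fs)
    tone = [math.sin(2 * math.pi * t / 10) for t in range(200)]
    far_mic = [0.0] * delays[0] + tone[: len(tone) - delays[0]]
    near_mic = tone
    steered = delay_and_sum([far_mic, near_mic], delays)
    print("delays in samples:", delays)
    print("first steered samples:", steered[:3])
```

Sound arriving from the steered direction lines up after the shift and adds in phase, while sound from other directions ends up misaligned and partly cancels. Real arrays add more microphones, fractional delays, and adaptive weighting on top of this, which is where the heavier math comes in.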

George says:

June 14, 2016 at 8:00 am

So is this project working without pressing the Start Listening button? I have tried the project on GitHub which was released by Amazon, and I had to press this button before starting the voice recognition. I am thinking of pairing the Raspberry Pi with a Bluetooth speaker with an integrated microphone and controlling it from anywhere just with my voice, for example on the balcony or while taking a shower. Not sure how sensitive the built-in microphone will be, but I think it is worth trying.

Derek B (@KrunchMuffin) says:

October 31, 2016 at 12:01 pm

alexapi now has an “Alexa” wake command. And I stumbled upon this: http://www.cnx-software.com/2016/07/06/99-matrix-creator-raspberry-pi-add-on-board-features-plenty-of-sensors-a-2-4-ghz-radio-and-more/

1. nicolas B says:
  
  November 27, 2016 at 7:27 am
  
  Similar project: https://www.kickstarter.com/projects/seeed/respeaker-an-open-modular-voice-interface-to-hack
  And they sell the mic array separately as a USB mic.
  
  1. Peter Lane-Collett says:
    
    December 12, 2016 at 5:01 am
    
    Just the ReSpeaker mic array board is $79 without the core board. Might as well buy the Matrix developer board for $99 and get so much more.
    
