Let Alexa Control Your Life; Guide to Voice-Enable Everything

Let’s face it, automation doesn’t feel quite as futuristic unless you can just say what you want out loud and have the machines flawlessly obey. That is totally possible now, and on the cheap. Well, cheap as far as money goes; the learning curve to get it all working can be expensive. This will help. [Lindo St. Angel] has put together a guide to voice-controlling hardware using Amazon’s Alexa SDK.

We previously reported that Amazon’s AI had escaped its hardware prison in the form of the Alexa Skills Kit. Yes, calling it the Alexa SDK above is technically wrong; it’s actually the ASK, but nobody knows what that acronym means, while most people recognize the gist of an SDK. It gives you the hooks and the documentation necessary to use Alexa’s functionality in your own applications. The core of that functionality is voice recognition. Even so, it’s still a tall hill to climb.

[Lindo] has broken the problem down into a very manageable example. Amazon’s Alexa Voice Service handles voice recognition and control. Amazon’s Lambda service connects the ASK to your piece of hardware; in this case he’s using a Raspberry Pi as the server. The final step is to connect your hardware to the Pi. [Lindo] is interfacing a keypad-based home automation system with the Pi, but the sky’s the limit at this point.
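To make the Lambda hop a little more concrete, here is a minimal Python sketch of a custom-skill handler that turns a recognized intent into a keypad command for the Pi. The request and response JSON shapes follow the Alexa Skills Kit custom-skill format, but the intent names, key sequences, and the `send_to_pi` forwarder are hypothetical placeholders, not taken from [Lindo]’s guide.

```python
import json

# Hypothetical mapping from custom-skill intent names to key sequences for a
# keypad-based home automation panel. Both sides are placeholders; define your
# own intents in the skill's interaction model and your panel's real sequences.
INTENT_TO_KEYS = {
    "ArmAwayIntent": "*2",  # hypothetical
    "DisarmIntent": "*1",   # hypothetical
}


def intent_to_command(event):
    """Map an Alexa IntentRequest to a command dict for the Pi, or None."""
    request = event.get("request", {})
    if request.get("type") != "IntentRequest":
        return None
    name = request.get("intent", {}).get("name")
    keys = INTENT_TO_KEYS.get(name)
    if keys is None:
        return None
    return {"intent": name, "keys": keys}


def speak(text, end_session=True):
    """Build a minimal custom-skill response containing plain-text speech."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def send_to_pi(command):
    """Hypothetical forwarder. A real deployment would send the command to a
    server running on the Pi (for example over HTTPS); this stub only logs it
    so the sketch runs without a network."""
    print("would send to Pi:", json.dumps(command))
    return True


def lambda_handler(event, context):
    """Entry point that Lambda calls with the skill request as `event`."""
    request_type = event.get("request", {}).get("type")
    if request_type == "LaunchRequest":
        # Keep the session open so the user can follow up with a command.
        return speak("What would you like me to do?", end_session=False)
    if request_type == "SessionEndedRequest":
        # No speech is expected for this request type; return an empty response.
        return {"version": "1.0", "response": {}}
    command = intent_to_command(event)
    if command is None:
        return speak("Sorry, I don't know how to do that.")
    ok = send_to_pi(command)
    return speak("Done." if ok else "I couldn't reach the house.")
```

Keeping the intent-to-keys table separate from the handler means adding a new voice command is a one-line change on the Lambda side, while the Pi only ever has to understand "press these keys".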

With all the authentication and connectivity laid bare, this is a lot more approachable. The question is no longer whether you can connect everything to voice control. The question becomes: should you hand control of everything over to a single online service?

20 thoughts on “Let Alexa Control Your Life; Guide to Voice-Enable Everything”

  1. Has anyone come across anything standalone (that doesn’t require an Internet connection) that gives reasonable reliability when translating speech into toggling pins, while still coping with a fairly large number of different commands and parameters?

    1. I think the reason for cloud-based voice recognition is the sheer amount of data that your voice is cross-matched against (i.e. the whole English database, plus other languages and accents). If you have a fairly strong computer, you can probably implement your own voice recognition, or find a way to download the database.

      1. Unfortunately the datasets are closely guarded. These days machine learning is easy, so most of the competitive advantage comes from having large, high-quality datasets. Strangely enough, nobody wants to share. Perhaps someone can find a large collection of transcribed text, or a large collection of people reading books?

    2. You will need to do a bit of research to confirm this, but anything running Android Jelly Bean (or above) should be able to use offline voice recognition. So that means the Raspberry Pi 2.

    3. Mac OS X dictation has an option to do offline dictation – you have to download a fairly large dataset, but once you do, you can dictate without any internet access.

      It is most definitely not open, and I’ve never used the speech recognition APIs, so I’m not sure exactly what is feasible in terms of programmability, but it might be worth a look if you use Macs.

    4. Here’s a good place to start:
      https://en.wikipedia.org/wiki/Speech_recognition_software_for_Linux

      I tried getting Palaver to work; it’s based on Google’s voice recognition API, but I could never get it working. I was mainly interested in voice dictation for notes, email, etc. Maybe it was a hardware issue, but still, it never worked on either of the two computers I tried it on.
      https://github.com/markmandel/Palaver
      http://www.linux.com/news/embedded-mobile/mobile-linux/711479-palaver-taps-googles-voice-technology-for-linux-speech-recognition/

    5. CMU pocketsphinx can run on a Raspberry Pi; in fact, it’s the offline speech recognition backend for Android. I wrote a Python script ages ago to control music playback and it was pretty good (on my desktop; I didn’t have a Pi then). The trick is that it’s context-based, so it needs to know which words in its vocabulary go together in order to reduce errors. I had to make a bash script to give it every possible combination of “[wake word] play [song] by [artist]”.

  2. It’s encouraging to see the progress of this project go from:
    “use only amazon servers and only on amazon devices to only buy amazon stuff” to
    “use only amazon servers and only on amazon devices to do anything” to
    “use only amazon servers on any devices to do anything”

    Almost like a believable plan an AI would cook up to convince humanity to become interested in it and install it on all devices…

    BRB, calling Spielberg.

    1. The cheapest voice controller option is a cheap Android phone, one that can be rooted and has Wi-Fi and Bluetooth so it can talk to all your IoT modules. All that for $50; nothing else comes close in terms of value for money. Look around and you may even find one that fits on your wrist. How can even the smartest hack beat that? It is a classic example of economies of scale.
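The template-expansion trick from the pocketsphinx comment above (giving the recognizer every concrete “[wake word] play [song] by [artist]” sentence so it knows which words go together) can be sketched in Python instead of bash. The wake words and the (song, artist) library below are hypothetical placeholders, and the one-sentence-per-line output is an assumed corpus format for whatever language-model builder you feed it to.

```python
from itertools import product

# Hypothetical inputs; replace with your own wake words and music library.
WAKE_WORDS = ["computer", "jarvis"]
LIBRARY = [  # (song, artist) pairs
    ("yellow submarine", "the beatles"),
    ("help", "the beatles"),
    ("heroes", "david bowie"),
]


def command_sentences(wake_words, library):
    """Expand '[wake word] play [song] by [artist]' into every concrete
    sentence. Each song is paired only with its own artist, so impossible
    combinations never reach the recognizer's corpus."""
    return [
        f"{wake} play {song} by {artist}"
        for wake, (song, artist) in product(wake_words, library)
    ]


def write_corpus(path, sentences):
    """Write one sentence per line (assumed plain-text corpus format)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(sentences) + "\n")


if __name__ == "__main__":
    for line in command_sentences(WAKE_WORDS, LIBRARY):
        print(line)
```

Pairing songs with their own artists keeps the corpus small (wake words × songs rather than wake words × songs × artists), which is the point of constraining the vocabulary in the first place: fewer valid word sequences means fewer ways for the recognizer to guess wrong.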
