How to Upgrade Jasper’s Voice Recognition with AT&T’s Speech-to-Text API


Jasper is an open-source platform for developing always-on voice-controlled applications — you talk and your electronics listen! It’s designed to run on a Raspberry Pi. [Zach] has been playing around with it and wasn’t satisfied with Jasper’s built-in speech-to-text recognition system. He decided to take the advice of the Jasper development team and modify the system to use AT&T’s speech-to-text engine.

The built-in system works, but it has limitations. Mainly, you have to specify exactly which keywords you want Jasper to listen for. This is a problem if you aren’t sure what the user is going to say, or if there are many things the user might say. For example, if the user is going to say a number between one and one hundred, you don’t want to have to type all one hundred numbers into the voice recognition system just to make it work.
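To give a feel for how large such a keyword list gets, here is a minimal Python sketch that generates the spoken form of every number from one to one hundred. The helper name `number_to_words` and the list `phrases` are hypothetical and are not part of Jasper. The sketch only illustrates the size of the vocabulary a keyword-based recognizer would need to be told about.

```python
# Illustrative only: builds the phrases a keyword-based recognizer would
# need to know in order to hear any number from one to one hundred.
# Nothing here is part of Jasper; all names are made up for this sketch.

ONES = [
    "", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = [
    "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
    "eighty", "ninety",
]


def number_to_words(n):
    """Return the English words for an integer between 1 and 100."""
    if not 1 <= n <= 100:
        raise ValueError("n must be between 1 and 100")
    if n == 100:
        return "one hundred"
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return TENS[tens]
    return TENS[tens] + " " + ONES[ones]


# One phrase per number the user might say.
phrases = [number_to_words(n) for n in range(1, 101)]

# The distinct words the recognizer would have to be configured with.
vocabulary = sorted({word for phrase in phrases for word in phrase.split()})
```

Even for this small range, every phrase has to be anticipated ahead of time, which is the limitation a general speech-to-text engine avoids.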

The Jasper FAQ does recommend using AT&T’s speech-to-text engine in this situation, but it has its own downsides. You are limited to one request per second, and it is also slower to recognize speech. [Zach] was fine with these restrictions, but he couldn’t find much information online about how to modify Jasper to make the AT&T engine work. Now that he has it working, he has shared his work to make it easier for others.
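If you want to stay under the one-request-per-second limit in your own code, one simple approach is to pause before each request until enough time has passed since the previous one. The sketch below is an assumption about how you might do that and is not taken from [Zach]’s scripts or from Jasper. The class name `RequestThrottle` and its parameters are hypothetical.

```python
import time


class RequestThrottle:
    """Enforce a minimum interval between consecutive requests.

    Hypothetical helper for illustration; not part of Jasper or any
    AT&T library. The clock and sleep functions can be swapped out,
    which makes the class easy to test without real delays.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous request was allowed through

    def wait(self):
        """Block until a request is allowed; return the seconds slept."""
        now = self._clock()
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last = now
        return waited
```

You would call `wait()` on a shared `RequestThrottle` instance immediately before each speech-to-text request. The first call returns right away, and later calls sleep only for whatever part of the second has not yet elapsed.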

The modification first requires that you have an AT&T developer account. Once that’s set up, you need to make some changes to Jasper’s mic.py module. That’s the only part of Jasper’s core that must be changed, and it’s only a few lines of code. Beyond that, a couple of other Python scripts need to be added. We won’t go into the finer details here since [Zach] covers them thoroughly on his own page, including the complete scripts. If you are interested in using the AT&T module with your Jasper installation, be sure to check out [Zach]’s work. It will likely save you a lot of time.

 

Comments

  1. rasz_pl says:

    next step is using this:
    http://honnibal.wordpress.com/2013/12/18/a-simple-fast-algorithm-for-natural-language-dependency-parsing/
    to expand capabilities even further, this python syntactic parser should be able to help with figuring out meaning of whole sentences without hardcoding everything.

  2. notabena4us says:

    +1 ~ Speechless what more is there to say ;^)

  3. Eirinn says:

    I tried Jasper and… it took me hours to get up and running. The guide is not up to date and some packages are different. When I finally had it up and running the recognition was exceptionally poor :( Experiences may vary though!

  4. Mac Cartier says:

    I feel like there is a way to use Google voice recognition using the site interface to get the live word display from the search bar online. You could also try to port something from the android app… anyone know how feasible this is?

    • Kerimil says:

      This is a slightly different app as its main purpose is to translate speech from one language to another but it relies on Google speech to text and Microsoft text to speech APIs. Getting this to work with Google’s API is just waaaaay too easy

  5. Ben says:

    I keep receiving a 500 error from this Speech to text API :

    http://stackoverflow.com/questions/24159867/500-error-with-att-att-speech-to-text-api-python

    The information returned by the API is very vague as to the problem,
    if anyone knows the answer – Thanks! :)
