Jasper is an open-source platform for developing always-on voice-controlled applications — you talk and your electronics listen! It’s designed to run on a Raspberry Pi. [Zach] has been playing around with it and wasn’t satisfied with Jasper’s built-in speech-to-text recognition system. He decided to take the advice of the Jasper development team and modify the system to use AT&T’s speech-to-text engine.
The built-in system works, but it has limitations. Mainly, you have to specify exactly which keywords you want Jasper to listen for, which becomes a problem when you aren't sure what the user will say or when there are simply too many possibilities. For example, if the user might say any number between one and one hundred, you don't want to type all one hundred numbers into the keyword list just to make it work.
The Jasper FAQ does recommend using AT&T’s speech-to-text engine in this situation, but that has its own downsides. You are limited to one request per second, and recognition is slower. [Zach] was fine with these restrictions, but he couldn’t find much information online about how to modify Jasper to make the AT&T engine work. Now that he’s gotten it functional, he has shared his work to make it easier for others.
The modification first requires that you have an AT&T developer account. Once that’s set up, you need to make some changes to Jasper’s mic.py module. That’s the only part of Jasper’s core that must be changed, and it’s only a few lines of code. Outside of that, there are a couple of other Python scripts that need to be added. We won’t go into the finer details here since [Zach] goes into great detail on his own page, including the complete scripts. If you are interested in using the AT&T module with your Jasper installation, be sure to check out [Zach’s] work. He will likely save you a lot of time.
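To give a rough idea of the shape of those extra scripts, here is a minimal sketch of an AT&T speech-to-text helper. The token endpoint matches the one discussed in the comments below; the speech endpoint, parameter names, and function names are assumptions on our part, so treat [Zach’s] actual STT.py as the authoritative version.

import requests

TOKEN_URL = 'https://api.att.com/oauth/v4/token'           # token endpoint (see the fix in the comments)
SPEECH_URL = 'https://api.att.com/speech/v3/speechToText'  # assumed speech-to-text endpoint

def get_access_token(app_key, app_secret):
    # Standard OAuth client-credentials request; parameter names are assumptions
    payload = {
        'client_id': app_key,
        'client_secret': app_secret,
        'grant_type': 'client_credentials',
        'scope': 'SPEECH',
    }
    r = requests.post(TOKEN_URL, data=payload)
    r.raise_for_status()
    return r.json()['access_token']

def transcribe(wav_path, access_token):
    # Post the recorded WAV file and hand back the raw JSON response;
    # pulling the transcription text out of it is left to the caller,
    # since the exact response layout isn't reproduced here
    headers = {
        'Authorization': 'Bearer ' + access_token,
        'Content-Type': 'audio/wav',
        'Accept': 'application/json',
    }
    with open(wav_path, 'rb') as f:
        r = requests.post(SPEECH_URL, headers=headers, data=f.read())
    r.raise_for_status()
    return r.json()

In mic.py, the idea is then simply to call a helper like this instead of the local recognizer whenever a recording has been captured.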
The next step would be using this:
http://honnibal.wordpress.com/2013/12/18/a-simple-fast-algorithm-for-natural-language-dependency-parsing/
to expand capabilities even further. This Python syntactic parser should be able to help with figuring out the meaning of whole sentences without hardcoding everything.
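As a rough illustration of the idea, here’s a minimal sketch using spaCy’s dependency parser (the command mapping below is purely hypothetical): the root verb and its object are read out of the parse instead of being matched against a fixed keyword list.

import spacy

nlp = spacy.load('en_core_web_sm')  # small English model that includes a dependency parser

def extract_command(utterance):
    # Instead of hardcoding keywords, read the parse: take the root verb
    # and its direct object as a crude (verb, target) command pair
    doc = nlp(utterance)
    verb, target = None, None
    for token in doc:
        if token.dep_ == 'ROOT':
            verb = token.lemma_
        elif token.dep_ == 'dobj':
            target = token.lemma_
    return verb, target

print(extract_command('turn off the living room lights'))  # e.g. ('turn', 'light')
print(extract_command('play some jazz in the kitchen'))    # e.g. ('play', 'jazz')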
+1… It just works better!
It’d be another fun thing to try; I simply went with the Speech API because it’s backed by AT&T.
+1 ~ Speechless what more is there to say ;^)
I tried Jasper and… it took me hours to get it up and running. The guide is not up to date and some of the packages are different. When I finally had it working, the recognition was exceptionally poor :( Experiences may vary, though!
I feel like there is a way to use Google voice recognition through the site interface to get the live word display from the search bar online. You could also try to port something from the Android app… anyone know how feasible this is?
https://www.youtube.com/watch?v=FtWM7M-rfus
This is a slightly different app, as its main purpose is to translate speech from one language to another, but it relies on Google’s speech-to-text and Microsoft’s text-to-speech APIs. Getting this to work with Google’s API is just waaaaay too easy.
I keep receiving a 500 error from this speech-to-text API:
http://stackoverflow.com/questions/24159867/500-error-with-att-att-speech-to-text-api-python
The information returned by the API is very vague about the problem –
if anyone knows the answer, thanks! :)
Fixed by changing:
r = requests.post('https://api.att.com/oauth/token'
to
r = requests.post('https://api.att.com/oauth/v4/token'
in STT.py in the jasper/client folder.
I’ve submitted a pull request to include this fix in the latest build of Jasper.
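For anyone else chasing opaque errors like the 500 above, a small (hypothetical) wrapper around requests.post that prints whatever the API sends back before failing makes the vague responses much easier to diagnose:

import requests

def post_with_debug(url, **kwargs):
    # Print the raw response body when the request fails, since the
    # interesting error details are often buried in it
    r = requests.post(url, **kwargs)
    if not r.ok:
        print('HTTP', r.status_code, 'from', url)
        print(r.text)
    r.raise_for_status()
    return r

# e.g. post_with_debug('https://api.att.com/oauth/v4/token', data=payload)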