Adding speech recognition to your embedded platform.

Last week, we posted a story about how to configure speech recognition at a beginner level. Several of the commenters expressed an interest in doing speech recognition for embedded devices. [Nickolay Shmyrev] volunteered to write some directions for those people. In this article, [Nickolay] will take you through the basics of setting up your embedded device with CMUSphinx, an open source toolkit for speech recognition. He gives programming examples in both C and Python. Though we are hosting this, we haven’t set it up and tried it, so please direct any questions you have at [Nickolay] in the comments.

Here we will look at how to implement speech recognition on your device using the
Pocketsphinx library from the CMUSphinx project (http://cmusphinx.sourceforge.net).

The advantages of using Pocketsphinx are:

  • Pocketsphinx is resource-efficient. It runs well on embedded platforms, though it’s not limited to them; you can also use Pocketsphinx on your desktop or server. Pocketsphinx supports fixed-point-only arithmetic, so it can run without an FPU. It is also optimized for some popular platforms: Blackfin, Maemo, iPhone.
  • Pocketsphinx supports many languages out of the box. It supports US English, Chinese, French, Russian, German, Dutch and more, without the need to train anything.
  • Pocketsphinx is completely free software.
  • Bindings are available for several programming languages.

So Pocketsphinx is really the best choice for your speech recognition library.

Before you start programming speech interfaces, there are several things you need to know:

  • Speech recognizers require you to specify the words they will understand (a so-called grammar); they will not understand anything outside the specified language.
  • Speech is by nature inaccurate; you need to make this a cornerstone of your speech interface design. The recognizer returns a confidence value along with the recognized text. Make sure you use this confidence value to reject unreliable results. If the recognizer is not confident, ask the user to repeat the input, ask for additional information, or confirm the user’s intentions.
  • It’s not the task of the speech recognition library to do sound input. Audio interfaces are often device-specific. You need to record audio in your application and put it in a specific format – PCM, mono, 8kHz, 16-bit. Double-check that. If you have MP3, convert it. If you have audio at 44.1kHz, downsample it.
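As a quick illustration of that last point, the format of a WAV recording can be verified up front with Python’s standard wave module before you hand it to the recognizer. This is a hypothetical helper, not part of Pocketsphinx:

```python
import wave

def check_wav_format(path, rate=8000, channels=1, sample_width=2):
    """Return a list of problems with the WAV file; an empty list means
    it is already PCM mono 8kHz 16-bit and ready for the decoder."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append("sample rate is %d Hz, expected %d Hz"
                            % (w.getframerate(), rate))
        if w.getnchannels() != channels:
            problems.append("expected mono audio, got %d channels"
                            % w.getnchannels())
        if w.getsampwidth() != sample_width:
            problems.append("expected 16-bit samples")
    return problems
```

Run it on your recording and fix anything it reports (e.g. resample with sox or ffmpeg) before decoding.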

Let’s start with a simple test. Once you have installed Pocketsphinx, just run pocketsphinx_continuous
without any arguments. Wait until

READY…

appears on the terminal, then say something. Pocketsphinx will record audio from your microphone and output recognition results.

000000001: hello (-11998485)

Did it fail to recognize “hello”? Don’t worry – for some people the recognition results seem to come down to pure luck. Consider yourself lucky if it worked.

Now let’s learn how to specify the grammar, the language that Pocketsphinx will recognize.
This is done with grammar files written in the JSGF format (http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/).

This is a rather simple human-readable text format; it’s probably best to start with an example:

#JSGF V1.0;
grammar goforward;
public <move> = go <direction> <distance> [meter | meters];
<direction> = forward | backward;
<distance>= (one | two | three | four | five | six | seven | eight | nine | ten | twenty)+;
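To build intuition for what this grammar accepts, the same language can be approximated by a regular expression over whitespace-separated words. This is a throwaway Python illustration, not something Pocketsphinx needs:

```python
import re

# Word class from the <distance> rule of the goforward grammar above.
NUMBER = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|twenty)"

# public <move> = go <direction> <distance> [meter | meters];
MOVE = re.compile(
    r"^go (?:forward|backward) "          # <direction>
    + NUMBER + r"(?: " + NUMBER + r")*"   # <distance>: one or more number words
    + r"(?: meters?)?$"                   # optional [meter | meters]
)

print(bool(MOVE.match("go forward two meters")))  # in the language
print(bool(MOVE.match("pizza with pepperoni")))   # not in the language
```

Anything the expression rejects is something the recognizer will never output with this grammar loaded.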

As you can see, the format can specify alternatives, repetitions and optional parts. Basically, JSGF describes
a finite-state automaton for the recognizer. The more restrictive your grammar is, the
better the recognition accuracy will be. But don’t forget to include fillers
and false starts in a real grammar. A user will not say to the device

“Pizza with pepperoni”

They will say instead

“I want, let me think… three pizzas with pepperoni no… with onions”

And your grammar should cover that. Once you’ve created your grammar, store it as
grammar.jsgf. Also, record an audio file at 8kHz mono and name it “myrecording.wav”.

Now, let’s do some programming. To demonstrate how a speech recognition application is created, let’s first try Pocketsphinx with Python. The Python API is really simple; the example is just six lines of code. To recognize speech you need to accomplish three steps, and here they are:

#!/usr/bin/python

# Step 1: initialization
import pocketsphinx as ps
decoder = ps.Decoder(jsgf='/path/to/your/jsgf/grammar.jsgf', samprate='8000')
# Step 2: open the audio file and decode it
fh = open("myrecording.wav", "rb")
nsamp = decoder.decode_raw(fh)
# Step 3: get the result
hyp, uttid, score = decoder.get_hyp()
print "Got result %s %d" % (hyp, score)
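The score that get_hyp returns is a raw log-scale value, so a practical application would gate results against a threshold before acting on them, as discussed above. A minimal sketch of that logic follows; the threshold value is hypothetical and must be tuned against real recordings for your grammar and acoustic conditions:

```python
REJECT_THRESHOLD = -10000  # hypothetical value; tune per application

def handle_result(hyp, score, threshold=REJECT_THRESHOLD):
    """Accept the hypothesis only when the decoder's score clears the
    threshold; otherwise return None so the caller can re-prompt the user."""
    if hyp is None or score < threshold:
        return None  # unreliable result: ask the user to repeat
    return hyp
```

For example, handle_result("go forward two meters", -3000) passes the text through, while a very low score makes the function return None so your interface can ask again or request confirmation.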

Now, let’s do the same in C. It’s not really different from the Python version, just more suitable
for your device.

#include <stdio.h>
#include <pocketsphinx.h>

int
main(int argc, char *argv[])
{
    ps_decoder_t *ps;
    cmd_ln_t *config;
    FILE *fh;
    char const *hyp, *uttid;
    int16 buf[512];
    size_t nsamp;
    int rv;
    int32 score;

    /* Step 1: initialize the configuration and the decoder */
    config = cmd_ln_init(NULL, ps_args(), TRUE,
                         "-samprate", "8000",
                         "-jsgf", "test.jsgf",
                         NULL);
    ps = ps_init(config);

    /* Step 2: open the audio file and feed it into the decoder */
    fh = fopen("myrecording.wav", "rb");
    if (fh == NULL) {
        perror("myrecording.wav");
        return 1;
    }
    rv = ps_start_utt(ps, "goforward");
    while ((nsamp = fread(buf, 2, 512, fh)) > 0)
        rv = ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
    rv = ps_end_utt(ps);

    /* Step 3: get the result and print it */
    hyp = ps_get_hyp(ps, &score, &uttid);
    if (hyp == NULL)
        return 1;
    printf("Recognized: %s with prob %d\n", hyp, ps_get_prob(ps, NULL));

    /* Clean up */
    fclose(fh);
    ps_free(ps);
    return 0;
}

On Linux, compile the demo with a simple command line:

gcc demo.c -o demo `pkg-config --cflags --libs pocketsphinx`

Note that --cflags and --libs each start with two ASCII dashes; if you copy the command from a web page, make sure they haven’t been silently replaced by a single long dash, or the linker will fail with undefined references.

and run

./demo

If it works, it’s ready to be included in your device. Read more about the Pocketsphinx functions
in the API guide:

http://cmusphinx.sourceforge.net/api/pocketsphinx/

Once you are done with the basic examples, it’s time to build your application using Pocketsphinx.
Free your mind when you design it; don’t just focus on simple commands like “turn on lights”. Modern applications include intelligent logic analysis, continuous dictation support and much more.
Be reasonable, design your interface and grammars thoughtfully, think about the user, and your speech
application will be successful.

Still don’t believe it will work? Check the video at the top of the post (http://www.youtube.com/watch?v=OEUeJb6Pwt4) demonstrating Pocketsphinx running on a Nokia N800. For more details on Pocketsphinx, the CMUSphinx project, and speech recognition, visit http://cmusphinx.sourceforge.net


Comments

  1. nebulous says:

    The word in the title should be ‘speech’. Just thought I’d mention it. Looks like good info, will read later (after Holland wins the cup)

  2. mostlymac says:

    Hate to be that guy… but speech is misspelled in the article title.

    As for the article itself, I’m stunned. That’s an amazing piece of software they’ve got going. I’d love to see somebody develop a third-party app for the iPhone that doesn’t “play songs by Beck” when I’m trying to “dial home”.

    I remember when Dragon NaturallySpeaking came out for Windows 95 ages ago. It had terrible accuracy, but with clear articulation and some training (on both ends), it would spit out a decent output. It’s amazing to see how far technology has improved. Now I’m just waiting to step on an elevator and say “Ten Forward”…

  3. Hackaaaaaaaaaaaa says:

    mostlymac .. that requires a very general grammar and is very hard to train.

    It is best if you train on a small grammar likes letters, numbers, and directions.

  4. nave.notnilc says:

    nice post, sphinx is some neat stuff; now I just need to find something to do with it :/

  5. turn.self.off says:

    nice to see the nokia N800 still getting some screen time :)

  6. Mattj says:

    Yeah, it was ahead of it’s time.

  7. normaldotcom says:

    Pocketsphinx is pretty awesome, I’m working on integrating it with my Asterisk install (maybe with some voice-controlled zork).

  8. nsh says:

    > I’m working on integrating it with my Asterisk
    > install (maybe with some voice-controlled zork).

    Hello normaldotcom

    For asterisk integration, please check
    http://scribblej.com/svn/

  9. Taylor Cox says:

    So we could write code say in C code and be able to control our windows or linux desktop or laptop by voice?

  10. Casey O'Donnell says:

    hey i got two n800s except one has a broken screen :( :( :(. they are pretty neat i get a week and a half on battery with ebook reading.

  11. nsh says:

    > So we could write code say in C code and be able to control our windows or linux desktop or laptop by voice?

    Absolutely

  12. strider_mt2k says:

    That’s happening pretty fast for that tablet.
    Nice.

  13. Gottabethatguy says:

    Would it be possible to use this with one of the more powerful microcontrollers? I only need to be able to recognize at most 10 words and I can easily cut that back to 4 words without losing the intended functionality of what I’m trying to develop.

  14. nsh says:

    Gottabethatguy, what kind of microcontroller are you talking about, what are specifications?

    The requirements for HMM-based recognition are still high, but it’s possible to find more lightweight solutions for your case.

  15. Sree Ram says:

    Great ! got me started , but what about decoding for live audio from mic ? any small hint would do :)
    thks

  16. Calin says:

    I’m thinking to do this by using coils from defective hard disks headers. can this be possible?

  17. steve says:

    I have a robot and I want to use Pocketsphinx so I can talk to the robot thing like…where is this room and it will tell me where it is or move foward and it should move forward. Right now I have install pockectsphinx.07 and sphinxbase and when I run using ubuntu 10.04LTS: pocketsphinx_continuous -lm 1998.lm -dict .dict 1998.dic it say READY then listening the when I say something like Good morning it write back Goodmorning….But how do I go from here…how do I use pocketsphinx to allow me to just talk and have what I just said be recorded and send to my robot to move…PLEASE HELP w78steve@gmail.com

  18. leandromattioli says:

    Hi!

    I’m trying to run your examples in Python and C, both give me the following error:

    ERROR: “acmod.c”, line 88: Must specify -mdef or -hmm

    Do you know what’s triggering this problem?

    Thanks in advance.

    • as says:

      have you solve this issue?
      ERROR: “acmod.c”, line 88: Must specify -mdef or -hmm

      my command is
      /usr/local/bin/pocketsphinx_continuous -infile “/var/spool/asterisk/voicemail/default/1111/INBOX/msg0007.wav” -hmm /var/lib/asterisk/communicator -samprate 8000 2

  19. hex says:

    Could not get pocket sphinx to even do remotely relevant speech recognition. All the “matching” text was useless gibberish.

  20. Diego09310 says:

    I installed sphinxbase and pocketsphinx doing ./configure, make and sudo make install. When I run pocketsphinx_continuous it works, but when I try to compile the example, I get: “demo.c:1:26: fatal error: pocketsphinx.h: No such file or directory
    compilation terminated.”
    How can I tell gcc where is pocketsphinx?

    Thank you!

    • Diego09310 says:

      I solved the problem by adding the paths to the .h files:
      gcc `pkg-config pocketsphinx –cflags –libs` -I/home/pi/Instalaciones/voice-recognition/pocketsphinx-0.8/include -I/home/pi/Instalaciones/voice-recognition/sphinxbase-0.8/include/ demo.c -o demo.o
      Now I get a stranger output:
      /tmp/ccXiWv5C.o: In function `main’:
      demo.c:(.text+0x18): undefined reference to `ps_args'
      demo.c:(.text+0x50): undefined reference to `cmd_ln_init'
      demo.c:(.text+0x5c): undefined reference to `ps_init'
      demo.c:(.text+0x84): undefined reference to `ps_start_utt'
      demo.c:(.text+0xd8): undefined reference to `ps_process_raw'
      demo.c:(.text+0xf8): undefined reference to `ps_end_utt'
      demo.c:(.text+0x118): undefined reference to `ps_get_hyp'
      demo.c:(.text+0x140): undefined reference to `ps_get_prob'
      demo.c:(.text+0x164): undefined reference to `ps_free'
      collect2: ld returned 1 exit status

      Does anybody how to solve this?

      Thanks!

  21. Diego09310 says:

    Solved! Just in case somebody gets this error:
    When I copied the comand to compile, the double dash (–) was replaced by a longer dash (em dash? —).

    • fito_segrera says:

      Hi Diego09310, I’m having the same problem you had:

      demo.c:(.text+0x18): undefined reference to `ps_args'
      demo.c:(.text+0x50): undefined reference to `cmd_ln_init'
      demo.c:(.text+0x5c): undefined reference to `ps_init'
      demo.c:(.text+0x84): undefined reference to `ps_start_utt'
      demo.c:(.text+0xd8): undefined reference to `ps_process_raw'
      demo.c:(.text+0xf8): undefined reference to `ps_end_utt'
      demo.c:(.text+0x118): undefined reference to `ps_get_hyp'
      demo.c:(.text+0x140): undefined reference to `ps_get_prob'
      demo.c:(.text+0x164): undefined reference to `ps_free'

      What do you mean by “When I copied the comand to compile, the double dash (–) was replaced by a longer dash (em dash? —).”??

      • Diego09310 says:

        Hi fito_segrera, I think I didn’t explain well (by reading my comment again).

        In the command “gcc `pkg-config pocketsphinx –cflags –libs` demo.c -o demo” you can see that the dash before cflags and libs is longer than the dash between pkg and config or -o. This is because it’s meant to be two dashes “- -” (I introduced an space between them so they appear as two dashes in the comment).

        If you don’t understand me (I’m not being as clear as I’d like to), I suggest you look at the pkg-config example in the wikipedia: http://en.wikipedia.org/wiki/Pkg-config
