Adding Speech Recognition To Your Embedded Platform.

[youtube=http://www.youtube.com/watch?v=OEUeJb6Pwt4&feature=player_embedded]

Last week, we posted a story about how to configure speech recognition at a beginner level. Several commenters expressed an interest in doing speech recognition on embedded devices. [Nickolay Shmyrev] volunteered to write some directions for those people. In this article, [Nickolay] will take you through the basics of setting up your embedded device with CMUSphinx, an open-source toolkit for speech recognition. He gives programming examples in both C and Python. Though we are hosting this, we haven’t set it up and tried it, so please direct any questions you have at [Nickolay] in the comments.

Here we will consider how to implement speech recognition using the Pocketsphinx library
from the CMUSphinx project (http://cmusphinx.sourceforge.net).

The advantages of using Pocketsphinx are:

  • Pocketsphinx is resource-efficient. It runs well on embedded platforms, though it’s not limited to them; you can use Pocketsphinx on your desktop or server too. Pocketsphinx supports fixed-point-only arithmetic, so it can run without an FPU. It is also optimized for some popular platforms: Blackfin, Maemo, iPhone.
  • Pocketsphinx supports many languages out of the box. It supports US English, Chinese, French, Russian, German, Dutch and more, without the need to train anything.
  • Pocketsphinx is completely free software.
  • Bindings for several programming languages are available.

So Pocketsphinx is a very strong choice for your speech recognition library.

Before you start programming speech interfaces, there are several things you need to know:

  • Speech recognizers require you to specify the words they will understand (a so-called grammar); they will not understand anything outside the specified language.
  • Speech is by nature inaccurate; you need to make this a cornerstone of your speech interface design. The recognizer returns a confidence value along with the recognized text. Make sure you use this confidence value to reject unreliable results. If the recognizer is not confident, ask the user to repeat the input, ask for additional information, or confirm the user’s intentions.
  • It’s not the task of the speech recognition library to do sound input. Audio interfaces are often device-specific. You need to record audio in your application and put it in a specific format: PCM, mono, 8kHz, 16-bit. Double-check that. If you have MP3, convert it. If you have audio at 44.1kHz, downsample it.
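Since a format mismatch is a common source of silent failures, it may be worth verifying the file header before decoding. Here is a minimal sketch using Python’s standard wave module (the function name is ours, not part of any Pocketsphinx API):

```python
import wave

def has_expected_format(path):
    """Check that a WAV file is mono, 8 kHz, 16-bit PCM, as the decoder expects."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1
                and w.getframerate() == 8000
                and w.getsampwidth() == 2)
```

If this returns False, convert the file (with sox or similar) before feeding it to the decoder.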

Let’s start with a simple test. Once you have installed Pocketsphinx, run pocketsphinx_continuous
without any arguments. Wait until

READY…

appears on the terminal, then say something. Pocketsphinx will record audio from your microphone and output recognition results.

000000001: hello (-11998485)

Did it fail to recognize “hello”? Don’t worry; some people find that it’s a hand of fortune that produces the recognition results. Count yourself lucky.
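If you want to handle that output programmatically, each result line can be split into an utterance id, the hypothesis text, and a score. A small sketch, assuming the line format matches the sample above:

```python
import re

def parse_hypothesis(line):
    """Split a pocketsphinx_continuous result line into (utterance id, text, score).
    The format is inferred from the sample output above."""
    m = re.match(r"(\d+):\s+(.*)\s+\((-?\d+)\)", line)
    if m is None:
        return None
    return m.group(1), m.group(2), int(m.group(3))
```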

Now let’s learn how to specify the grammar, the language that Pocketsphinx will recognize.
This is done using grammar files written in the JSGF format (http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/).

JSGF is a rather simple, human-readable text format; it’s probably best to start with an example:

#JSGF V1.0;
grammar goforward;
public <move> = go <direction> <distance> [meter | meters];
<direction> = forward | backward;
<distance> = (one | two | three | four | five | six | seven | eight | nine | ten | twenty)+;
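To get a feel for what this grammar accepts, the same language can be roughly approximated by a regular expression. This is only an illustration of the finite-state idea; Pocketsphinx compiles JSGF itself and does not use regexes:

```python
import re

# Rough regex approximation of the goforward grammar above.
DIRECTION = r"(?:forward|backward)"
NUMBER = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|twenty)"
# go <direction> <distance> [meter | meters]
MOVE = re.compile(r"^go " + DIRECTION + r" (?:" + NUMBER + r" ?)+(?:meters?)?$")
```

Anything the pattern rejects is outside the grammar, so the recognizer would never output it.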

As you can see, it can specify alternatives, repetitions and optional words. Basically, JSGF
describes a finite-state automaton for the recognizer. The more restrictive your grammar is,
the better the recognition accuracy will be. But don’t forget to include all those fillers
and false starts in a real grammar. The user will not say to the device

“Pizza with pepperoni”

They will say instead

“I want, let me think… three pizzas with pepperoni no… with onions”

And your grammar should cover that. Once you’ve created your grammar, store it as
grammar.jsgf. Also, record an audio file at 8kHz mono and name it “myrecording.wav”.
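One way to cover such hesitations is to add an optional filler rule to the grammar. A sketch in the same JSGF format (the filler phrases below are illustrative guesses, not a standard list):

```
#JSGF V1.0;
grammar order;
<filler> = umm | let me think | i want;
<count> = one | two | three;
<topping> = pepperoni | onions;
public <order> = [<filler>] <count> (pizza | pizzas) with <topping> [no with <topping>];
```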

Now, let’s do some programming. To demonstrate how a speech recognition application is created, let’s first try Pocketsphinx with Python. The Python API is really simple; the example is just six lines of code. To recognize speech you need to accomplish three steps, and here they are:

#!/usr/bin/python

# Step 1, initialization
import pocketsphinx as ps
decoder = ps.Decoder(jsgf='/path/to/your/jsgf/grammar.jsgf', samprate='8000')
# Step 2, open the audio file and decode it
fh = open("myrecording.wav", "rb")
nsamp = decoder.decode_raw(fh)
# Step 3, get the result
hyp, uttid, score = decoder.get_hyp()
print "Got result %s %d" % (hyp, score)
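The score returned by get_hyp() is what you would use for the rejection logic mentioned earlier. A minimal sketch of such a gate; the threshold here is an assumed value you would tune for your own task, not a Pocketsphinx constant:

```python
# Assumed tuning value; pick it empirically for your grammar and audio.
REJECT_THRESHOLD = -50000

def accept_hypothesis(hyp, score, threshold=REJECT_THRESHOLD):
    """Return the hypothesis only when the decoder score clears the threshold;
    otherwise return None so the application can re-prompt the user."""
    if hyp is None or score < threshold:
        return None
    return hyp
```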

Now, let’s do the same in C. It’s not really different from Python, just more suitable
for your device.

#include <pocketsphinx.h>

int
main(int argc, char *argv[])
{
    ps_decoder_t *ps;
    cmd_ln_t *config;
    FILE *fh;
    char const *hyp, *uttid;
    int16 buf[512];
    int rv;
    int32 score;

    /* Initialize the configuration */
    config = cmd_ln_init(NULL, ps_args(), TRUE,
                         "-samprate", "8000",
                         "-jsgf", "grammar.jsgf",
                         NULL);
    ps = ps_init(config);

    /* Open the audio file and start feeding it into the decoder */
    fh = fopen("myrecording.wav", "rb");
    rv = ps_start_utt(ps, "goforward");
    while (!feof(fh)) {
        size_t nsamp;
        nsamp = fread(buf, 2, 512, fh);
        rv = ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
    }
    rv = ps_end_utt(ps);

    /* Get the result and print it */
    hyp = ps_get_hyp(ps, &score, &uttid);
    if (hyp == NULL)
        return 1;
    printf("Recognized: %s with prob %d\n", hyp, ps_get_prob(ps, NULL));

    /* Clean up */
    fclose(fh);
    ps_free(ps);
    return 0;
}

On Linux, compile the demo with simple command line:

gcc demo.c -o demo `pkg-config pocketsphinx --cflags --libs`

and run

./demo

If it works, it’s ready to be included in your device. Read more about the Pocketsphinx functions
in the API guide:

http://cmusphinx.sourceforge.net/api/pocketsphinx/

Once you are done with the basic examples, it’s time to build your application using Pocketsphinx.
Free your mind when you design it; don’t just focus on simple commands like “turn on the lights”. Modern applications include intelligent logic analysis, continuous dictation support and much more.
Be reasonable, design your interface and grammars, think about the user, and your speech
application will be successful.

Still don’t believe it will work? Check the video demonstrating Pocketsphinx running on a Nokia N800 (at the top of the post). For more details on Pocketsphinx, the CMUSphinx project, and speech recognition, visit http://cmusphinx.sourceforge.net


26 thoughts on “Adding Speech Recognition To Your Embedded Platform.”

  1. Hate to be that guy… but speech is misspelled in the article title.

    As for the article itself, I’m stunned. That’s an amazing piece of software they’ve got going. I’d love to see somebody develop a third-party app for the iPhone that doesn’t “play songs by Beck” when I’m trying to “dial home”.

    I remember when Dragon NaturallySpeaking came out for Windows 95 ages ago. It had terrible accuracy, but with clear articulation and some training (on both ends), it would spit out a decent output. It’s amazing to see how far technology has improved. Now I’m just waiting to step on an elevator and say “Ten Forward”…

  2. mostlymac .. that requires a very general grammar and is very hard to train.

    It is best if you train on a small grammar likes letters, numbers, and directions.

  3. Would it be possible to use this with one of the more powerful microcontrollers? I only need to be able to recognize at most 10 words and I can easily cut that back to 4 words without losing the intended functionality of what I’m trying to develop.

  4. Gottabethatguy, what kind of microcontroller are you talking about, what are specifications?

    The requirements for HMM-based recognition are still high, but it’s possible to find more lightweight solutions for your case.

  5. I have a robot and I want to use Pocketsphinx so I can talk to the robot thing like…where is this room and it will tell me where it is or move foward and it should move forward. Right now I have install pockectsphinx.07 and sphinxbase and when I run using ubuntu 10.04LTS: pocketsphinx_continuous -lm 1998.lm -dict .dict 1998.dic it say READY then listening the when I say something like Good morning it write back Goodmorning….But how do I go from here…how do I use pocketsphinx to allow me to just talk and have what I just said be recorded and send to my robot to move…PLEASE HELP w78steve@gmail.com

  6. Hi!

    I’m trying to run your examples in Python and C, both give me the following error:

    ERROR: “acmod.c”, line 88: Must specify -mdef or -hmm

    Do you know what’s triggering this problem?

    Thanks in advance.

    1. have you solve this issue?
      ERROR: “acmod.c”, line 88: Must specify -mdef or -hmm

      my command is
      /usr/local/bin/pocketsphinx_continuous -infile “/var/spool/asterisk/voicemail/default/1111/INBOX/msg0007.wav” -hmm /var/lib/asterisk/communicator -samprate 8000 2

  7. I installed sphinxbase and pocketsphinx doing ./configure, make and sudo make install. When I run pocketsphinx_continuous it works, but when I try to compile the example, I get: “demo.c:1:26: fatal error: pocketsphinx.h: No such file or directory
    compilation terminated.”
    How can I tell gcc where is pocketsphinx?

    Thank you!

    1. I solved the problem by adding the paths to the .h files:
      gcc `pkg-config pocketsphinx –cflags –libs` -I/home/pi/Instalaciones/voice-recognition/pocketsphinx-0.8/include -I/home/pi/Instalaciones/voice-recognition/sphinxbase-0.8/include/ demo.c -o demo.o
      Now I get a stranger output:
      /tmp/ccXiWv5C.o: In function `main’:
      demo.c:(.text+0x18): undefined reference to `ps_args’
      demo.c:(.text+0x50): undefined reference to `cmd_ln_init’
      demo.c:(.text+0x5c): undefined reference to `ps_init’
      demo.c:(.text+0x84): undefined reference to `ps_start_utt’
      demo.c:(.text+0xd8): undefined reference to `ps_process_raw’
      demo.c:(.text+0xf8): undefined reference to `ps_end_utt’
      demo.c:(.text+0x118): undefined reference to `ps_get_hyp’
      demo.c:(.text+0x140): undefined reference to `ps_get_prob’
      demo.c:(.text+0x164): undefined reference to `ps_free’
      collect2: ld returned 1 exit status

      Does anybody how to solve this?

      Thanks!

    1. Hi Diego09310, I’m having the same problem you had:

      demo.c:(.text+0x18): undefined reference to `ps_args’
      demo.c:(.text+0x50): undefined reference to `cmd_ln_init’
      demo.c:(.text+0x5c): undefined reference to `ps_init’
      demo.c:(.text+0x84): undefined reference to `ps_start_utt’
      demo.c:(.text+0xd8): undefined reference to `ps_process_raw’
      demo.c:(.text+0xf8): undefined reference to `ps_end_utt’
      demo.c:(.text+0x118): undefined reference to `ps_get_hyp’
      demo.c:(.text+0x140): undefined reference to `ps_get_prob’
      demo.c:(.text+0x164): undefined reference to `ps_free’

      What do you mean by “When I copied the comand to compile, the double dash (–) was replaced by a longer dash (em dash? —).”??

      1. Hi fito_segrera, I think I didn’t explain well (by reading my comment again).

        In the command “gcc `pkg-config pocketsphinx –cflags –libs` demo.c -o demo” you can see that the dash before cflags and libs is longer than the dash between pkg and config or -o. This is because it’s meant to be two dashes “- -” (I introduced an space between them so they appear as two dashes in the comment).

        If you don’t understand me (I’m not being as clear as I’d like to), I suggest you look at the pkg-config example in the wikipedia: http://en.wikipedia.org/wiki/Pkg-config
