Making Linux Offline Voice Recognition Easier

For just about any task you care to name, a Linux-based desktop computer can get the job done using applications that rival or exceed those found on other platforms. However, that doesn’t mean it’s always easy to get it working, and speech recognition is just one of those difficult setups.

A project called Voice2JSON is trying to simplify the use of voice workflows. While it doesn’t provide the actual voice recognition, it does make it easier to get things going and then use speech in a natural way.

The software can integrate with several backends to do offline speech recognition, including CMU’s pocketsphinx, Dan Povey’s Kaldi, Mozilla’s DeepSpeech 0.9, and Kyoto University’s Julius. However, the code is more than just a thin wrapper around these tools. The fast training process produces both a speech recognizer and an intent recognizer. So not only does it recognize the words “garage door,” it also works out whether you want that door opened or closed.

In addition, the tools are all made to work in Unix-style pipelines, which is refreshing. Here’s an example configuration from the project’s website:

[GarageDoor]
open the garage door
close the garage door

[LightState]
turn on the living room lamp
turn off the living room lamp

Templating features let you specify optional words and alternative words in a single rule. Other features map a spoken phrase like “living room lamp” onto something more computer-friendly.
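The idea of packing optional and alternative words into one rule can be sketched in a few lines of Python. This is not the project's implementation, and the notation is an assumption made for illustration: square brackets mark optional words and parentheses with `|` mark alternatives. Check the project's documentation for the syntax it really uses. The function names are hypothetical.

```python
import re


def tokenize(rule):
    """Split a rule into brackets, parentheses, pipes, and plain words."""
    return re.findall(r"[\[\]()|]|[^\s\[\]()|]+", rule)


def parse_alternatives(tokens, pos, closer):
    """Parse 'seq | seq | ...' and return (list of word lists, new position)."""
    results = []
    while True:
        expansions, pos = parse_sequence(tokens, pos, closer)
        results.extend(expansions)
        if pos < len(tokens) and tokens[pos] == "|":
            pos += 1  # consume the pipe and parse the next alternative
            continue
        return results, pos


def parse_sequence(tokens, pos, closer):
    """Parse words and groups until a pipe, the closing token, or the end."""
    expansions = [[]]
    while pos < len(tokens) and tokens[pos] != "|" and tokens[pos] != closer:
        token = tokens[pos]
        if token == "(":
            choices, pos = parse_alternatives(tokens, pos + 1, ")")
            pos += 1  # skip the closing parenthesis
        elif token == "[":
            choices, pos = parse_alternatives(tokens, pos + 1, "]")
            choices = choices + [[]]  # optional: also allow nothing
            pos += 1  # skip the closing bracket
        else:
            choices, pos = [[token]], pos + 1
        # Combine every sentence built so far with every new choice.
        expansions = [done + choice for done in expansions for choice in choices]
    return expansions, pos


def expand(rule):
    """Return every plain sentence a single template rule stands for."""
    expansions, _ = parse_alternatives(tokenize(rule), 0, None)
    return [" ".join(words) for words in expansions]
```

With this sketch, `expand("turn (on | off) the [living room] lamp")` yields four sentences, covering on and off with and without “living room,” so one rule replaces four hand-written lines.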

Overall, this looks like a fun tool to have in your kit. If you do something interesting with it, be sure to drop us a tip so we can cover it. Meanwhile, we’ve been watching Linux speech for quite a while. Of course, what we really want is speech commands like the USS Enterprise, and we have to admit it is getting closer.

9 thoughts on “Making Linux Offline Voice Recognition Easier”

  1. @Martin said: “Can anybody recommend a good offline speech recognition engine that has pretrained models for German?”

    Maybe…

    ----[Mozilla DeepSpeech & German]----

    * Search for “Mozilla DeepSpeech German” (without the quotes):

    https://duckduckgo.com/?q=Mozilla+DeepSpeech+German&t=ffab&ia=web

    * DeepSpeech for German Language

    https://discourse.mozilla.org/t/deepspeech-for-german-language/36527/14

    * German End-to-end Speech Recognition based on DeepSpeech

    https://www.researchgate.net/publication/336532830_German_End-to-end_Speech_Recognition_based_on_DeepSpeech

    * AASHISHAG / deepspeech-german

    https://github.com/AASHISHAG/deepspeech-german

    * ynop / deepspeech-german

    ----[Mozilla DeepSpeech in General]----

    * mozilla / DeepSpeech

    https://github.com/mozilla/DeepSpeech

    * Welcome to DeepSpeech’s documentation!

    https://deepspeech.readthedocs.io/en/r0.9/?badge=latest

  2. The best thing for building assistants I’ve found so far is Rhasspy. I plan on doing a proper writeup of setting up my instance once I’m finished. I still have to make respeakerd work on recent Debian (without DOA, performance is sad) and figure out wake word training.

  3. What about Simon & Blather? Haven’t tried them yet but Simon, for instance, seems pretty good and simple enough to use. Although, I’m not sure it’s still developed. The last version I could find was 0.4x, released in 2017 but it was supposed to lead to v0.5, which I can’t find.
