Talking Star Trek

Speech generation and recognition have come a long way. It wasn’t that long ago that we were in a breakfast place and endured 30 minutes of a teenaged girl screaming “CALL JUSTIN TAYLOR!” into her phone repeatedly, with no results. Now speech on phones is good enough you might never use the keyboard unless you want privacy. Every time we ask Google or Siri a question and get an answer it makes us feel like we are living in Star Trek.

[Smcameron] probably feels the same way. He’s been working on a Star Trek-inspired bridge simulator called “Space Nerds in Space” for some time. He decided to test out the current state of Linux speech support by adding speech commands and response to it. You can see the results in the video below.

For speech output, he used pico2wave and espeak. There’s also Festival, but he couldn’t get that one working. He also used PocketSphinx for speech recognition and provides information on how to pretrain the system for words to make it respond better.  In the video, you can see that it isn’t perfect, but it is pretty good.

The other part of the equation is recognizing natural language and [Smcameron] discusses that, as well. If this were a real starship, you might need to do a little work on the user interface to be sure you heard the right thing before taking some drastic action (“I said: ‘blow warp drive’ not ‘go warp five’!”).

If your system isn’t as powerful as a full Linux box, consider uSpeech for the Arduino. You might also check out Jasper.

34 thoughts on “Talking Star Trek

  1. I love the “kitchen sink” approach that Stephen has decided to take with this project. I admit that I kinda miss the primitive wireframe graphics that the game used in early stages, but it’s fun to watch unusual and unique features make it into production.

    1. Ha. smcameron here. This was to work around a problem I ran into reading, iirc, ascii wavefront obj files. Those files contain floats. I was parsing them with sscanf. sscanf expects different things for floats depending on locale, sometimes periods for decimals, sometimes commas, which is kind of crazy since the data files I am reading come wit the program, and I really don’t want per-locale files. This is a starship simulator, we’re post-national here, right? No problem, though, just set the locale with setlocale() at the beginning of your program, right? Well, no, because various libraries seem to keep calling setlocale() — many many times — and undoing your setlocale() — (I’m looking at you, gtk). I got fed up with this nonsense, why is sscanf behaving so stupidly anyway? Why is gtk behaving so stupidly? Fine, children, you want to call setlocale? Fine. So I override it once and for all, and anybody who calls setlocale will get a locale for sure, and that locale will be “C”, and all will be right with the world.

      1. oh i’ve been on that ride before, that’s why i immediately lolled. Thanks for replying, you have a really cool project going on, i hope you continue to develop it. I was looking over the server code after reading the contribution.md, and i would love to contribute at some point, i might join up later.

  2. Shameless plug, we (some guys at Ultimaker) build our own star-trek like bridge simulator as well. It’s FOSS, and can be found at: http://emptyepsilon.org/ and we suck at putting movies online :-)
    It’s 2 years in the making. And, it has an HTTP API, which can be use used for all kinds of things.

    I tried the same thing as SNIS did, voice control. But then with the google chrome voice API. As the HTTP API is easy to access from javascript. However, I’m not sure if it’s my dutch accent, I found the voice recognition results to be quite bad.

    1. Yes that project was amazing, what happened to it?

      As for local voice recognition it works much better if there is a limited vocabulary and the system has been trained/biased accordingly.

      My biggest problem with voice is the environment here is sometimes very noisy and chaotic with machines and multiple people, it is even hard to have a phone conversation at times. So I agree sophisticated gesture input would very nice and I expect that complex hand signatures could serve as a form of password too. Just sign you name in mid air, or whatever.

    2. Simple use local speech recognition to trigger the device like saying Hello Siri, Okay Google, or Hello Jane to start activate the cloud based Speech Recognition. That way you can have your own always listening Speech Recognition system.
      Of course once you have the speech to text you might want to use something like https://adapt.mycroft.ai/ or https://opennlp.apache.org/ and maybe throw in http://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html
      just for fun.

  3. Commercial products work quite well, specially when it’s trained, and I’m not by far a native english speaker..
    Just tried the standard OSX mt-lion ASR on the content of the video, via internal mic, works almost better (“warp” isn’t a word), than the video itself !
    The big trouble for Open Source are the missing of enough voice profiles, and that were clouded system wins because with each sentence recognized their voices profiles increase. But privacy…
    What I hope, is a system that can talk and sound like me, so I could loop this output with the original speech to get way better recognition. Or scam Google,Siri… with boring cooking recipes or marxist books, between my personal requests.

    1. exactly, it’s all about the training. I’m actually sort of surprised there isn’t a proper open source ASR “cloud” option yet. Of course I say that as a person who would USE it, not MAINTAIN it.. so yeah

  4. “Every time we ask Google or Siri a question and get an answer it makes us feel like we are living in Star Trek.”

    I think there is a episode in the original series where they go in a parallel universe where everybody is a fascist mean duplicate.
    That’s probably the place where they would have Google/Siri/Cortana then.

      1. “Every time we ask Google or Siri a question and get an answer it makes us feel like we are living in Star Trek.”

        Creo que hay un episodio de la serie original donde van en un universo paralelo donde todo el mundo es un medio fascista duplicado.
        Eso es probablemente el lugar donde tendrían que Google / Siri / Cortana a continuación.

          1. If you are unable to read plain English perhaps some type of professional help like a teacher can aid you.
            Perhaps that is also better than to complain to others about your inabilities.

          2. If you are unable to write plain English perhaps some type of professional help like a teacher can aid you.
            Perhaps that is also better than to inflict onto others your inabilities.

Leave a Reply to LeithoaCancel reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.