Create Your Own J.A.R.V.I.S. Using Jasper

Tony Stark’s J.A.R.V.I.S. needs no introduction. With [Shubhro]’s and [Charlie]’s recent release of Jasper, an always-on, voice-controlled development platform for the Raspberry Pi, you too can start making your own J.A.R.V.I.S.

Both [Shubhro] and [Charlie] are undergraduate students at Princeton University, and they decided to make their voice-controlled project open source (code is available on GitHub). Jasper is built on inexpensive off-the-shelf hardware, making it very simple to get started. All you really need is an internet-connected Raspberry Pi with a microphone and speaker. Simply install Jasper and get started with the built-in functionality that lets you interface with Spotify, Facebook, Gmail, knock-knock jokes, and more. Be sure to check out the demo video after the break!

With the easy-to-use developer API, you can integrate Jasper into any of your existing Raspberry Pi projects with little effort. We could see Jasper integrated with wireless microphones and speakers to enable advanced voice control from anywhere in your home. What a great project! Thanks to both [Shubhro] and [Charlie] for making this open source.

79 thoughts on “Create Your Own J.A.R.V.I.S. Using Jasper”

  1. Ok that’s amazing… I was working on a web based Virtual Intelligence called Lydia at one point with commands through XML and retro representation of answers (black screen, green text and blinky cursor). This will fit right in.

    1. You should be able to follow the Method #2 steps in the software documentation from the project’s site and instead of downloading the Jasper binaries and transferring to the pi, compile the Jasper project from source for the UDOO. You could clone the repo to the UDOO and compile on the UDOO, or cross-compile from another computer.

      (All stated without testing first. Nothing looked out of place on the dependencies and Jasper itself is Python)

    1. I was thinking the same thing. Almost makes me want to take my old laptop and turn it into a Linux machine, though I would see if I can use a different name instead of Jasper. No offense to the Jasper crew.

      1. Depending on the hardware requirements, I wonder if you could run it on something like a Nexus S, HTC Evo, or some older hackable cell phone. Just thinking what great remotes they would make. Another option is the first-gen Nexus 7s. For me the home control possibilities are what is most interesting.

      2. If you look at the main.py on their github it has

        conversation = Conversation("JASPER", mic, profile)

        which is what sets up the name it listens for. Beyond changing it there, the languagemodel_persona.lm and dictionary_persona.dic files possibly need editing too. I’m assuming they are somehow used by the voice recognition.

        I plan on playing with this over the weekend, my goal is to get it running for my 4yo son so he can request movies.

        1. Their static/audio/jasper.wav is decoded by Python in their main mic function as the “persona”, so I assume that’s also part of the listen process. It would be very easy to change the conversation("namehere") bit and then make your own wav with whatever name. Seems like that’s all you’d need to do?
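To make the thread’s guess concrete: the wake-word idea being described boils down to transcribing audio and scanning the text for the persona string. The sketch below is illustrative only, not Jasper’s actual code; the function name and structure are assumptions.

```python
# Illustrative sketch of the wake-word check discussed above -- NOT the
# project's actual code. The passive listener transcribes a chunk of audio,
# and the client only goes "active" when the persona string shows up.
PERSONA = "JASPER"  # change this (and the persona models, per the thread)

def heard_persona(transcription, persona=PERSONA):
    """Return True if the wake word appears anywhere in the transcribed text."""
    return persona.upper() in transcription.upper()

if __name__ == "__main__":
    print(heard_persona("hey Jasper, what's the weather?"))  # True
    print(heard_persona("turn off the lights"))              # False
```

The case-insensitive substring match is the simplest possible version; a real passive listener would also loop over microphone chunks and debounce repeated triggers.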

      3. Actually, I was looking at the custom install method (as opposed to getting a full SD card image) and it’s all apt-get installs, getting source code, and compiling. Maybe you can get away with anything loosely similar to Debian or Ubuntu directly, or “translate” those installs to your distro’s package manager equivalents.

    1. Running it in an emulated environment would be a useless waste of resources. Anyway, it uses PocketSphinx, which is cross-platform, so at least the core part of it can be ported easily.

  2. Is it speaker independent? Does it recognize only prerecorded phrases or vocabularies, or arbitrary speech? How is accuracy? Is it tolerant of some ambient room noise when using an open microphone, or does it have to be dead silent?

    None of these BASIC details are covered here, in the video, or in any obvious location on the creators’ website… FAIL.

    But SQUEEE! It integrates with social networking! And it’s been compared to an impossibly intelligent AI in a popular movie! Good enough for the mindless masses I suppose.

    1. If you read the documentation on the linked page, you would see the answers to your questions. It uses arbitrary speech recognition, and commands and functions can be defined by the user.
      Why make a wild accuracy claim? If you enunciate clearly into a decent microphone, it should work. If you want to believe claims of amazing voice recognition, go buy Dragon.

      1. Which part of the documentation?

        It’s certainly not in the home page, “FAQ”, “Software”, or “Usage”. All that’s left is the “Developer API”, which I admit I didn’t read; but if I have to dig through that just to find out basic info like this, that’s utter crap.

        And as for the responses here, they conflict as well, so I’m not the only one confused. Funny, we just featured an article on Mr. Widlar; wish he were around, I’m sure he’d have some particularly choice words about the quality of this “documentation”. ;)

        I certainly don’t need or believe marketing-department-generated accuracy claims either. But I would like to have at least some idea of what to expect before investing too much time looking into it. Is it closer to Dragon, or an old IC-based speech recognition chip? Could I expect the actions depicted in the carefully edited video to work at least 50% of the time in the real world? Was careful training (of both software and speaker) and a dead silent room required? Did it take 100 attempts to film this success? Did the software already have the particular artist name programmed and *spoken* to it in advance, or was it able to recognize it without hearing it before?

          1. Wow… sounds like you should be developing your own speech package, since you definitely seem well versed in the subject. I mean, I’d definitely prefer to use yours, as you seem so confident and sound very intelligent; your version would probably kill theirs.

          I mean, they’re only undergrads, so they couldn’t have made as good a device as yours would be. And I also agree that the marketing-machine BS on their website is soooo misleading. It was probably all bought and paid for; I would never believe any of it.

          YET AGAIN SOMEONE FEELS THE WHOLE WORLD SHOULD CATER TO THEIR WANTS AND NEEDS, AND WHEN THAT DOESN’T HAPPEN IT MUST MEAN THE OTHER PERSON IS INFERIOR OR SOME SUCH.

          If you’re too lazy to read the documentation as provided, then you FAIL. Everything you speak of might have validity if this were being sold on Amazon or something, but it’s not. It’s an undergrad project that was probably used for learning stuff, which probably means it’s at a beta (if not alpha) level of development. Don’t want to use it? Don’t. But don’t piss on their work because you think it makes you look cool to do so… you actually look pretty stupid to me!

          1. Because expecting a decent basic description of WHAT SOMETHING DOES is a special need? Because undergraduates and site editors can’t be expected to tackle the herculean task of writing up a paragraph or two that provides this information, and placing it somewhere prominent?

            I spent about 15 minutes reading, which should be more than enough to find such basic info. If you’re going to assume I’m lazy, I’m going to assume I spent 15 minutes more than you, and you have absolutely no idea what you’re talking about.

            And I’m only criticizing the documentation. Nowhere did I criticize their software or programming skills, or imply I could do better. Which makes you the stupid one for claiming otherwise; no assumptions necessary there.

        2. Hi Chris,

          I’ll respond to your comment below here, as it seems it is too deep to reply there.

          You said:

          I spent about 15 minutes reading, which should be more than enough to find such basic info. If you’re going to assume I’m lazy, I’m going to assume I spent 15 minutes more than you, and you have absolutely no idea what you’re talking about

          I spent about 10 minutes, and I found out it uses the CMU speech recognition engine, and found lots of information about that engine as well, including the answers to all your questions.

          I did not spend as much time as you, so I would certainly not think you lazy… however, I’m not sure you should be calling others stupid.

    2. Is it speaker independent? Yes
      Does it recognize only prerecorded phrases or vocabularies? Yes
      Is it tolerant of some ambient room noise when using an open microphone, or does it have to be dead silent? Don’t know.
      It uses CMUSphinx for the speech engine.
      The website for the project is actually pretty nice.

    3. I was thinking something similar (but not so critical; it’s still a nice project).

      It lacks information, and the Raspberry Pi isn’t a very powerful system to begin with.

      Apple, Google, and Microsoft get away with natural language processing in smartphones by sending the query as audio to be analyzed on their servers (in Apple’s case, Wolfram’s servers), but if this depends on the CPU available to the rPi, the accuracy might not be so great.

      I’m guessing it works by keyword recognition, so its usefulness will be way less “Jarvis” or “Siri” and more “predefined list of commands”.

      Also, they don’t mention anything about support for languages other than English (not all speech recognition frameworks have the same level of accuracy for every language).

      As I said, nice project, but the presentation might give people an incorrect impression of the scope of the project.

      PS: Now I’m wondering how well the Natural Language Toolkit ( http://www.nltk.org/ ) would work on the rPi.

      1. NLTK is a framework/toolset for computational linguistics on a body of text (corpus linguistics). It’s fun to play with but is not related to this story in any appreciable way.

        As far as running it, you shouldn’t have any trouble. NLTK is written in Python and its dependencies (Python 2.6–2.7, NumPy, PyYAML) are all available for the Raspberry Pi.

        NLTK processing can be slow (seconds) even on a Core 2 Quad at 2.5 GHz, but that’s not a problem considering what it’s used for. “Jarvis, what are the hapaxes in Obama’s speech?”
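For anyone wondering, a “hapax” is a word that occurs exactly once in a text. You don’t even need NLTK for that part; here is a toy stdlib sketch (not NLTK’s API, just the idea):

```python
# Toy sketch of finding hapaxes (words that occur exactly once) using only
# the stdlib; NLTK's corpus tools do much more -- this just shows the idea.
from collections import Counter
import re

def hapaxes(text):
    """Return the words that appear exactly once, in first-seen order."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [w for w in words if counts[w] == 1]

print(hapaxes("to be or not to be that is the question"))
# -> ['or', 'not', 'that', 'is', 'the', 'question']
```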

          1. I know it’s not related to the story and I know it’s slow; I just mentioned it because the story made me wonder how well it would work on the rPi, and whether it would be even barely usable to create something more “Jarvis-like” than what’s mentioned in the article.

          As you say it can be slow on a modern x86 computer so I don’t have high hopes for it being useful in any current ARM board.

    4. The recognition is all done by the CMU Sphinx package, which has been around for ages; what Jasper/JARVIS does is use Sphinx to recognize a specific handful of commands. The social-networking callouts and stuff like that are what’s new here.

  3. This says “internet connected pi,” so it isn’t standalone, right? It sounds as though you are sending your recorded (or live) audio off to the cloud to be processed. Whatever could go wrong with that?

    1. I get that you’re trying to troll, but in all honesty: what really could go wrong?

      It’s not like your voice could be sold for marketing reasons, and as long as you aren’t saying blatantly illegal things (“Jasper, buy 100 pounds of C4.”), there shouldn’t be a prob.

      Then again, you could route it through TOR.. or not. Thanks OpenSSL.

    2. Dude, control yourself. At least provide a counter-argument or some point of disagreement. What’s arrogant about a gentle reminder of the wild Internet’s lack of inherent safeness? Personally I think he’s being over-cautious, but that’s not arrogant, “fucking”, or “twat”.

      If you’re going to be unpleasant, at least provide a basic level of intellectual justification with it. If you just want someone to say “fucking arrogant twat” to all day, I’m sure there’s a gutter full of unmet friends waiting for you somewhere round the back of a cheap booze shop.

    3. It uses PocketSphinx, which does not seem to require sending the audio to the cloud for processing.

      Spending 2 seconds looking something up before complaining… what could go wrong with that?

      I agree with Yarr….

    4. After setting it up, I think the internet connection is for the email notifications and things like Spotify, not something that it absolutely has to have to work.

      The thing I like the least about it is that they have the music/Spotify integration in a main program component instead of in one of the modules like the other options. I don’t know Python well enough to know if they had a valid reason for that.

  4. I’m a newcomer to the Raspberry Pi and would really like to try this project. I see that the microphone they recommend is no longer available on Amazon. Does anybody have a recommendation for one that would work for this project?

    1. This

      http://www.amazon.com/Kinobo-USB-Microphone-Desktop-Laptop/dp/B0052SBAEU/ref=pd_cp_pc_1

      is recommended by Amazon on the link from the sold-out one. The recommended one is only a $15–$25 USB microphone. At that price it’s not going to be anything special, so presumably any old USB mic would do. Or maybe another source of sound altogether: a USB soundcard, say, with any old mic plugged in. That depends on what inputs the software accepts; it should be pretty tweakable, being a Linux project.

      From what I can tell, any USB mic should do. That’s Linux again: they like to get everything connected to a standardised interface from the software side, so if your mic works with Linux, its driver should present the same interface to Jasper as any other mic would. The “quirks” you have to put up with in Windows usually aren’t permitted.

      That’s all I can say without experience. You could always try out whatever bits and pieces you have lying around, or can borrow. I’d bet that it doesn’t really matter and any simple USB mic would do.
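If you do end up testing mics, one quick stdlib trick for checking that a capture (e.g. from `arecord -d 3 -f S16_LE test.wav`; the file name and flags are just examples) actually contains signal and isn’t silence:

```python
# Quick stdlib sanity check on a WAV capture: a dead or muted mic typically
# yields a peak sample of 0 or near 0. The file name is hypothetical.
import struct
import wave

def peak_amplitude(path):
    """Return the largest absolute sample value in a 16-bit WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        frames = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return max((abs(s) for s in samples), default=0)

if __name__ == "__main__":
    import os
    if os.path.exists("test.wav"):  # hypothetical capture from arecord
        peak = peak_amplitude("test.wav")
        print("peak sample:", peak,
              "(likely silent)" if peak < 100 else "(signal present)")
```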

  5. Hi,
    According to the documentation provided by the two, I’ve managed to build it on my Pi within a few hours, but I’m stuck at a very funny point.
    I am new to the Raspberry Pi and the Linux environment, so I don’t know much.
    The problem I am facing is: how do I run Jasper after configuring the Jasper client?
    I’ve done all the coding and everything went fine, but I don’t know how to start Jasper!
    I am at the shell right now [pi@raspberry -$]; which command shall I run so that Jasper listens to my commands?
    Also, in the documentation we had to add the Facebook API key in the ‘profile.yml’ file. How shall I do that?
    Where is that file and how do I edit it?

    Thanks!

    1. This is directly from the documentation and should run Jasper when the OS boots up.
      “””
      Run crontab -e, then add the following line, if it’s not there already:

      @reboot /home/pi/jasper/boot/boot.sh;
      Set permissions inside the home directory:

      sudo chmod 777 -R *
      “””
      I have to say, this is a nice project and yet another addition to the list of projects to build with my kids.
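On the profile.yml question above: the file lives in the Jasper client directory and is plain YAML, so you can open it with nano or any text editor. A minimal illustrative fragment follows; the key names here are assumptions based on the documentation of that era, so double-check against your version’s docs before relying on them.

```yaml
# Illustrative profile.yml fragment -- key names are assumptions from the
# Jasper docs of the time; verify against your release's documentation.
first_name: Tony
last_name: Stark
keys:
    FB_TOKEN: "your-facebook-api-key-here"
```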

  6. I’ve got Jasper up and running on my Pi… mostly.
    Upon startup, I get the voice prompt “this is jasper, please wait a moment”.
    Even after a few minutes, Jasper doesn’t respond to his name with the beep.
    I’ve verified the mic works with an arecord test file, and playback of the file with aplay is good also.
    I’ve run main.py --local to try and see what’s happening, and my spoken command of “jasper” doesn’t register as input in the main.py test. It’s as though Jasper is running properly but isn’t listening on the mic. But that’s just a guess.

    Does anyone have some ideas that may help?

  7. Gabe,
    I had a similar problem. Being a noob, I followed the documentation literally. It instructs that some of the commands are carried out in the ‘home’ directory, so I changed to cd home. Could not get it working. I then formatted the SD card and started again, but carried out all the commands in the directory that the SD card booted into. I think the key point is that all the core code, modules, and any configuration commands you make must be in the same directory. Other than that, make sure your syntax and indentation are correct (sorry if that sounds condescending; not meant to be, as I said, I’m a complete noob).
    Gareth.

    1. I’ve been installing it on the latest Ubuntu release, 14-something. It is running without errors, but it won’t respond to my speech even though I’ve verified it translates the speech. Other users on Ubuntu are having this issue as of right now. Someone will come up with a fix, I’m sure.

      Go try for Arch Linux. The install should be easier too.

    2. Lots of fun, pretty cool project. I will answer some questions from other users to clear up some stuff that I have dealt with.

      As of Sept. 2015 (could be updated in the future), the documentation is somewhat outdated on 4 lines I found. If you poke around, there are easy workarounds. The Google Group with other users is very helpful to search.

      Yes, you can edit your own commands and responses. You can even use other ‘modules’ made by others if you like. Up to you, but they don’t come by default.
      It’s only somewhat challenging to change the commands. Easy to change the name.

      It can be EITHER standalone or online. You can choose this when you set it up. You can also now do a mix, so you don’t send everything you say to Google, Apple, or AT&T if that is what you choose. You would use the passive standalone engine to detect the name you gave it (default “jasper”), and the following phrases it would send off for more power and accuracy on recognition. All configurable and not that hard.
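A sketch of what that mixed setup looks like in profile.yml. The key names here are assumptions from memory of later Jasper versions, so verify them against the documentation for your release before use.

```yaml
# Illustrative profile.yml fragment for the standalone/online mix described
# above. Key names are assumptions; check your version's documentation.
stt_passive_engine: sphinx   # local-only engine listens for the wake word
stt_engine: google           # full phrases then go to a cloud engine
```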

      I don’t think this particularly works on Windows. You could, however, run VirtualBox (or other VM software) for free and virtually host a Linux flavor to run this. Haven’t tried it, but if you love Windows, go for it. Personally I have a small laptop with Ubuntu and a main Windows PC, which I’m leaving alone. Windows mixing with Linux sounds like no fun.

      You can use a laptop or desktop: anything that runs Unix / Linux / Mac.
      From the documentation and what I went through, Ubuntu will load.
      It seems like you will have an easier time with Arch Linux or a Debian flavor of Linux as well.

      If I were to do it over again (not installing on a Raspberry Pi, because I don’t have one), I would try these first:
      1) Arch Linux
      2) A Debian-based flavor of Linux (I used Ubuntu), to save some extra compiling
      3) OS X on a Mac. There is a decently detailed thread for this, and users have reportedly gotten it working. I have not tried it. You will be compiling a bit more than on Linux, but it seems like a decent tutorial. You will have to go by the main documentation first, and then mix in the workarounds from this forum:
      https://github.com/jasperproject/jasper-client/issues/35
      4) Other Linux / Unix? Not tried this. Even Ubuntu is sorta experimental.

  8. Great, just what I was looking for. I’m going to make this tomorrow, and if it works I’m going to get multiple modules and put it everywhere (room, car, phone, etc.). I still don’t know how to put it into a phone, but it should work ;)
