Almond: Open Personal Assistant From Stanford

The current state of virtual personal assistants — Alexa, Cortana, Google, and Siri — leaves something to be desired. The speech recognition is mostly pretty good. However, customization options are very limited. Beyond that, many people are worried about the privacy of their data when using one of these assistants. Stanford Open Virtual Assistant Lab has rolled out Almond, which is open and is reported to have better privacy features.

Like most other virtual assistants, Almond has skills that determine what it can do. You can use Almond in a browser, on a Google phone, or as a command line application. It all lives on GitHub, so if you don’t like something you are free to fix it.

The skills live in a marketplace of sorts known as Thingpedia. There are a surprising number of them, although not nearly as many as the commercial assistants offer. The assistant can integrate with Nest, GNOME, Gmail, Twitter, Slack, and many more services.

The natural language processing is impressive. Here are some examples from the web site:

  • When the New York Times has an article about China, translate the headline to Chinese, then email it to my friend.
  • When I leave home, turn off the heating.
  • When I post to Twitter, copy the post to Facebook.
  • Get the Bitcoin price and then send it to my colleague on Slack.
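Most of those commands share a "when something happens, do something" shape. As a rough illustration of that shape only, here is a minimal Python sketch. Every name in it (`Rule`, `Assistant`, `copy_to_facebook`, and the event strings) is hypothetical and invented for this example; it is not Almond's or Thingpedia's actual API, and it does no language processing at all.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout; this is NOT Almond's or Thingpedia's API.


@dataclass
class Rule:
    """One 'when <trigger>, do <actions>' command."""
    trigger: str  # made-up event name, e.g. "twitter.post"
    actions: List[Callable[[dict], str]] = field(default_factory=list)


class Assistant:
    """Keeps rules grouped by trigger and runs them when an event arrives."""

    def __init__(self) -> None:
        self.rules: Dict[str, List[Rule]] = {}

    def add_rule(self, rule: Rule) -> None:
        self.rules.setdefault(rule.trigger, []).append(rule)

    def emit(self, event: str, payload: dict) -> List[str]:
        """Run every action of every rule registered for this event."""
        results = []
        for rule in self.rules.get(event, []):
            for action in rule.actions:
                results.append(action(payload))
        return results


def copy_to_facebook(payload: dict) -> str:
    # Stand-in for a real service call; it only formats a string.
    return f"facebook: {payload['text']}"


# "When I post to Twitter, copy the post to Facebook."
assistant = Assistant()
assistant.add_rule(Rule("twitter.post", [copy_to_facebook]))

log = assistant.emit("twitter.post", {"text": "hello"})
print(log)
```

The hard part, of course, is turning the English sentence into a rule like this in the first place, which is the job Almond's language processing takes on.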

The web site is a little glitzy and the GitHub will take some time to parse. However, the documentation is very readable.

Almond is begging to be run on a smart speaker, and there is a way to do it. You can even run it using a preconfigured Docker image.

What we really want to do is build Almond into a robot. For now, we may just repurpose a Google Raspberry Pi.

19 thoughts on “Almond: Open Personal Assistant From Stanford”

      1. Annoying. At the rate this sort of acquisition happens with anything interesting, it might be practical for developers looking to work on a project that actually stays on-track to consider going with a worker-owned tech collective. At least then, there’s a greater degree of actually answering to people with relevant expertise.

  1. My biggest question is one I can’t find a direct answer to: does this do the language processing locally or on a remote server? If it’s the second, “Sorry, still not interested”. I don’t want to bring a remote spy bug into my home (whether it’s intended that way or (potentially) turned into one later doesn’t matter).

    1. After a brief browse through the GitHub organization: it offloads language processing to a remote server; however, you define which server that is, and the full source code for said server is available. So it can’t “phone home” if home is your own server under your control.

          1. Worst case is that is required today. There is at least one Risc V processor with some AI acceleration on chip. In a few years, those racks might be a stack of Raspi-like boards that fit in a shoebox…

        1. That’s brilliant, b/c you can run the heavy lifting on a computer of your choosing / server in the closet / AWS under your control, and still keep the device lightweight.

          Keep the data within your WiFi and it’s both fast and private. Winner.

          1. I agree with Elliot, there are certain things that require hefty computer horsepower. Voice recognition and ML/AI come to mind. I do want my own control and would like to avoid the cloud (privacy and connectivity come to mind) but this works for me (so far). I now have a Z-Wave dongle, a ZigBee dongle and WiFi to cover my HA. And I can add BT and RF to that if I like.

    2. “Hello Wiretap, please open my garage door, and while you’re at it let the NSA, advertisers and whoever else has access to this Speech-To-Text server and wants to build a pattern of my behavior know what I’m up to.”

        1. It’s not just smartphones. With smartphones they also get location data. But everything you say, write in an e-mail, or post on the internet is potentially monitored.

          https://en.wikipedia.org/wiki/Communications_Assistance_for_Law_Enforcement_Act
          “interception of communications for Law Enforcement purposes, and for other purposes.”

          (Oh, “other” purposes. Well, at least that’s clearly limited in scope by the law…)

          1. Okay Google to say Alexa to say okay google …

            I think Amazon is getting a bit crazy with its voice assistant everywhere. I’m expecting my box of cereal to have a secret surprise of an Alexa in every box. Mmmm Chocobombs.
