Seeed Studio’s ReSpeaker Speaks All the Voice Recognition Languages

Seeed Studio recently launched its third Kickstarter campaign: ReSpeaker, an open hardware voice interface. After their previous Kickstarted IoT hardware, such as the RePhone, mostly focused on connectivity, the electronics manufacturer from Shenzhen now tackles another highly contested area of IoT: Voice recognition.

The ReSpeaker Core is a capable development board based on Mediatek’s MT7688 WiFi module and runs OpenWrt. Onboard is a WM8960 stereo audio codec with integrated 1W speaker/headphone driver, a microphone, an ATMega32U4 coprocessor, 12 addressable RGB LEDs and 8 touch sensors. There are also two expansion headers with GPIOs, I2S, I2C, analog audio and USB 2.0 and an onboard microSD card slot.

The latter is especially useful to feed the ReSpeaker’s integrated speech recognition engine PocketSphinx with a vocabulary and audio file library, enabling it to respond to keywords and commands even when it’s not hooked up to the internet. Once it’s online, ReSpeaker also supports most of the available cloud based cognitive speech recognition services, such as Microsoft Cognitive Service, Amazon Alexa Voice Service, Google Speech API, Wit.ai and Houndify. It also comes with an SDK and Python API, supports JavaScript, Lua and C/C++, and it looks like the coprocessor features an Arduino-compatible bootloader.

The expansion header accepts shield-like hardware add-ons. Some of them are also available through the campaign. The most important one is the circular, far-field microphone array. Based on 7 XVSM-2000 respeaker_meow2digital microphones, the extension board enhances the device’s hearing with sound localization, beam forming, reverb and noise suppression. A Grove extension board connects the ReSpeaker to the Seeed’s current lineup on ready-to-use sensors, actuators and other peripherals.

Seeed also cooperates with the Meow King Audio Electronic Company to develop a nice tower-shaped enclosure with built-in speaker, 5W amplifier and battery. As a portable speaker, the Meow King Drive Unit (shown on the right) certainly doesn’t knock your socks off, but it practically turns the ReSpeaker into an open source version of the Amazon Echo — including the ability to run offline instead of piping everything you say to Big Brother.

According to Seeed, the freshly baked hardware will ship to backers in November 2016, and they do have a track-record of on-schedule shipped Kickstarter rewards. At the time of writing, some of the Crazy Early Birds are still available for $39. Enjoy the campaign video below and let us know what you think of think hardware in the comments!

19 thoughts on “Seeed Studio’s ReSpeaker Speaks All the Voice Recognition Languages

  1. Sorry, a 575 MHz MIPS SoC[1] that can physically only address 128MB/256MB RAM is clearly not the device I’d do signal processing on – Yes, I guess, pocketsphinx works, but I honestly think you’d be better of with about any Raspberry Pi.

    The combination with the DSP chip that handles the whole mic array processing / audio quality improvement is interesting, though – maybe as a USB device to a proper processing platform, this would be great fun :)

    So, now let the comment trolls take over and explain to you why letting Google/apple/amazon/LG/sony listen in on your living room might be a bad idea, and only the offline-capability is interesting (by the way, I fully agree with that. I just don’t prefer to hate, even on things that read a lot like an advert for a product of a grown company that could just do stuff without crowd funding).

    [1] https://labs.mediatek.com/fileMedia/download/9ef51e98-49b1-489a-b27e-391bac9f7bf3

        1. Shopping lists would be among those. And you know that no-one needs a shopping list to bring back milk from the supermarket, but to remember the less obvious things.

          Things like reminders would be the next thing. Sure, “Meeting with Moritz, tomorrow 5pm” isn’t something that’d need big vocab, but “Meeting with Marcus to discuss complexity relation to size of vocabulary in speech recognition” would surely do – and being able to enrich my memos with actually useful info would be how this would be an improvement over a pocket calendar made out of dead trees.

          1. It’s true, the hardware is extremely limited. Although I think PocketSphinx is more a little offline tool on the ReSpeak. You’ll probably want to keep your shopping list and calendar appointments synced across your devices, one of the cloud-based speech services is probably the way to go for these.

    1. He’s right, you are likely to be restricted to 100 words or so if you want it to be near real-time recognition. It would be interesting to see what you could manage using an FPGA.

      1. Better-than-pocketsphinx big-vocabulary speech recognition via FPGA? You could probably manage to get a couple million Euro in research funding, and a lot of international scientific recognition for that. That, and your 10 PhD students could get their PhD.

        Just a guess. FPGAs aren’t some magic boxes that you throw at problems. What you’re essentially saying is “let’s implement a very complex digital thing in hardware, I heard that works much better than software” and ignoring the fact that implementation alone is extremely complex.

  2. I like the microphone array, and how it detects where speech is coming from. I think the next best thing is using the other microphones opposite, and perpendicular to, the active microphone as noise cancellation.

  3. Offline speech recognition is a big plus in my book.
    I did a nice set up a couple of years back. A dozen boards each capable of finding a keyword very selectively and then looking at a short list of commands. Having them all listen at the same time and compare notes, produced quite a functional system.

  4. BTW, you can use the online speech recognition engines to train your offline engine. :-) Muahahahahaaaaa!

    That is the price “Big Cloud” pay for gathering data on your entire life, you get to use their infrastructure to train your own AI systems and when it is accurate enough you cut the link to the cloud permanently.

  5. Please do not forget how already parsed words are so much easier for whatever is on the other end to do intelligence on you while you are not actively using the interface. A bug is one thing, but a stenographer sending copy to whomever is wanting it is a whole different ballgame.

  6. Ordered one. Plan to use it as a frontend for a RasPi Alexa device to let me have voice activation (with a customized wake word!), tailor the syntax a bit (to get around the clunky syntax needed to activate many third-party “skills”), and have a nifty far-field microphone array like a real Amazon Echo. Could also be used to create complex voice macros more easily than with IFTTT. I’ve had this project on my todo list for a long time, and now Seeed is handing me a (presumably) complete solution for a reasonable price.

  7. With this, you can create what the Amazon Echo 𝘴𝘩𝘰𝘶𝘭𝘥 have been.

    • Put one in each area of your house.
    • Give each a unique wake phrase (“kitchen”, “living room”, “Rod’s room”, etc.) to avoid the problem you have when multiple Echoes are within micshot of each other (and you don’t want to have to choose among “Alexa”, “Amazon”, and “Echo” for your wake words. I tried that, and always forgot which room was which). Use a Seeed far-field mic array attachment in areas where the acoustics demand it.
    • Have them all connected back to a central RasPi Alexa server (https://github.com/amzn/alexa-avs-raspberry-pi/blob/master/README.md).
    • Have the server set up to transmit audio (iTunes, Prime Music, Google Play, Plex, whateva) to every room where you might want it.

    Now you can say things like “Kitchen, turn on the light” (rather than “Alexa, turn on the kitchen light”), “Kitchen, play music here and living room” (synchronized! The Echo can’t do that), “Living room, stop music here” (leaving it on anywhere else it’s playing), etc. And, with a little more programming work, “Kitchen, pause audiobook and transfer to car” —> “Car, continue audiobook”.

    Don’t you want this? I sure as hell do! The Echo is one of my favorite toys of all time despite some annoying limitations, and this extension would allow you to fix all those limitations plus add pretty much any feature you can dream up.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s