Seeed Studio’s ReSpeaker Speaks All The Voice Recognition Languages

August 25, 2016

Seeed Studio recently launched its third Kickstarter campaign: ReSpeaker, an open hardware voice interface. After their previous Kickstarted IoT hardware, such as the RePhone, mostly focused on connectivity, the electronics manufacturer from Shenzhen now tackles another highly contested area of IoT: Voice recognition.

The ReSpeaker Core is a capable development board based on Mediatek’s MT7688 WiFi module and runs OpenWrt. Onboard is a WM8960 stereo audio codec with integrated 1W speaker/headphone driver, a microphone, an ATMega32U4 coprocessor, 12 addressable RGB LEDs and 8 touch sensors. There are also two expansion headers with GPIOs, I2S, I2C, analog audio and USB 2.0 and an onboard microSD card slot.

The latter is especially useful to feed the ReSpeaker’s integrated speech recognition engine PocketSphinx with a vocabulary and audio file library, enabling it to respond to keywords and commands even when it’s not hooked up to the internet. Once it’s online, ReSpeaker also supports most of the available cloud based cognitive speech recognition services, such as Microsoft Cognitive Service, Amazon Alexa Voice Service, Google Speech API, Wit.ai and Houndify. It also comes with an SDK and Python API, supports JavaScript, Lua and C/C++, and it looks like the coprocessor features an Arduino-compatible bootloader.

The expansion header accepts shield-like hardware add-ons. Some of them are also available through the campaign. The most important one is the circular, far-field microphone array. Based on 7 XVSM-2000 digital microphones, the extension board enhances the device’s hearing with sound localization, beam forming, reverb and noise suppression. A Grove extension board connects the ReSpeaker to the Seeed’s current lineup on ready-to-use sensors, actuators and other peripherals.

Seeed also cooperates with the Meow King Audio Electronic Company to develop a nice tower-shaped enclosure with built-in speaker, 5W amplifier and battery. As a portable speaker, the Meow King Drive Unit (shown on the right) certainly doesn’t knock your socks off, but it practically turns the ReSpeaker into an open source version of the Amazon Echo — including the ability to run offline instead of piping everything you say to Big Brother.

According to Seeed, the freshly baked hardware will ship to backers in November 2016, and they do have a track-record of on-schedule shipped Kickstarter rewards. At the time of writing, some of the Crazy Early Birds are still available for $39. Enjoy the campaign video below and let us know what you think of think hardware in the comments!

22 thoughts on “Seeed Studio’s ReSpeaker Speaks All The Voice Recognition Languages”

Marcus says:

August 25, 2016 at 8:46 am

Sorry, a 575 MHz MIPS SoC[1] that can physically only address 128MB/256MB RAM is clearly not the device I’d do signal processing on – Yes, I guess, pocketsphinx works, but I honestly think you’d be better of with about any Raspberry Pi.

The combination with the DSP chip that handles the whole mic array processing / audio quality improvement is interesting, though – maybe as a USB device to a proper processing platform, this would be great fun :)

So, now let the comment trolls take over and explain to you why letting Google/apple/amazon/LG/sony listen in on your living room might be a bad idea, and only the offline-capability is interesting (by the way, I fully agree with that. I just don’t prefer to hate, even on things that read a lot like an advert for a product of a grown company that could just do stuff without crowd funding).

[1] https://labs.mediatek.com/fileMedia/download/9ef51e98-49b1-489a-b27e-391bac9f7bf3

Report comment

Reply
1. Marcus says:
  
  August 25, 2016 at 8:48 am
  
  ah, by the way, my point being: From the makers of PocketSphinx: Nope, on something as weak as a Pi (and this is weaker!), you can’t have a large vocabulary. http://cmusphinx.sourceforge.net/wiki/faq#qcan_i_run_large_vocabulary_speech_recognition_on_mobile_device_raspberry_pi
  
  Report comment
  
  Reply
  1. Moritz Walter says:
    
    August 25, 2016 at 10:06 am
    
    That’s an interesting objection, but what deep matters do you want to converse about with that devboard that would require more than a few keywords?
    
    Report comment
    
    Reply
    1. Marcus says:
      
      August 25, 2016 at 11:22 pm
      
      Shopping lists would be among those. And you know that no-one needs a shopping list to bring back milk from the supermarket, but to remember the less obvious things.
      
      Things like reminders would be the next thing. Sure, “Meeting with Moritz, tomorrow 5pm” isn’t something that’d need big vocab, but “Meeting with Marcus to discuss complexity relation to size of vocabulary in speech recognition” would surely do – and being able to enrich my memos with actually useful info would be how this would be an improvement over a pocket calendar made out of dead trees.
      
      Report comment
      
      Reply
      1. Moritz Walter says:
        
        August 28, 2016 at 11:13 pm
        
        It’s true, the hardware is extremely limited. Although I think PocketSphinx is more a little offline tool on the ReSpeak. You’ll probably want to keep your shopping list and calendar appointments synced across your devices, one of the cloud-based speech services is probably the way to go for these.
        
        Report comment
2. Gravis says:
  
  August 25, 2016 at 12:15 pm
  
  He’s right, you are likely to be restricted to 100 words or so if you want it to be near real-time recognition. It would be interesting to see what you could manage using an FPGA.
  
  Report comment
  
  Reply
  1. Marcus says:
    
    August 25, 2016 at 11:17 pm
    
    Better-than-pocketsphinx big-vocabulary speech recognition via FPGA? You could probably manage to get a couple million Euro in research funding, and a lot of international scientific recognition for that. That, and your 10 PhD students could get their PhD.
    
    Just a guess. FPGAs aren’t some magic boxes that you throw at problems. What you’re essentially saying is “let’s implement a very complex digital thing in hardware, I heard that works much better than software” and ignoring the fact that implementation alone is extremely complex.
    
    Report comment
    
    Reply
John Lightner says:

August 25, 2016 at 9:11 am

One step away from a universal translator.

Report comment

Reply
TacticalNinja says:

August 25, 2016 at 9:31 am

I like the microphone array, and how it detects where speech is coming from. I think the next best thing is using the other microphones opposite, and perpendicular to, the active microphone as noise cancellation.

Report comment

Reply
1. Marcus says:
  
  August 25, 2016 at 11:17 pm
  
  You’re missing the point. The add-on mic array board has a processor that does *exactly* that kind of thing.
  
  Report comment
  
  Reply
CRJEEA says:

August 25, 2016 at 9:53 am

Offline speech recognition is a big plus in my book.
I did a nice set up a couple of years back. A dozen boards each capable of finding a keyword very selectively and then looking at a short list of commands. Having them all listen at the same time and compare notes, produced quite a functional system.

Report comment

Reply
1. Indyaner says:
  
  August 25, 2016 at 2:48 pm
  
  You should drop some module names then.What did you find out?
  
  Report comment
  
  Reply
Dan#8582394734 says:

August 25, 2016 at 3:28 pm

BTW, you can use the online speech recognition engines to train your offline engine. :-) Muahahahahaaaaa!

That is the price “Big Cloud” pay for gathering data on your entire life, you get to use their infrastructure to train your own AI systems and when it is accurate enough you cut the link to the cloud permanently.

Report comment

Reply
1. Marcus says:
  
  August 25, 2016 at 11:18 pm
  
  heh. That’s a clever approach :)
  
  Report comment
  
  Reply
2. Enthusiast says:
  
  September 9, 2016 at 8:07 am
  
  Any link or guide to doing this sort of thing?
  
  Report comment
  
  Reply
echodelta says:

August 25, 2016 at 11:39 pm

Please do not forget how already parsed words are so much easier for whatever is on the other end to do intelligence on you while you are not actively using the interface. A bug is one thing, but a stenographer sending copy to whomever is wanting it is a whole different ballgame.

Report comment

Reply
Dave Davidson says:

August 25, 2016 at 11:41 pm

Can you train offline using a PC ? is there be of software that you can loud there plangins ?

Report comment

Reply
Rodney McKay says:

August 26, 2016 at 8:22 am

Ordered one. Plan to use it as a frontend for a RasPi Alexa device to let me have voice activation (with a customized wake word!), tailor the syntax a bit (to get around the clunky syntax needed to activate many third-party “skills”), and have a nifty far-field microphone array like a real Amazon Echo. Could also be used to create complex voice macros more easily than with IFTTT. I’ve had this project on my todo list for a long time, and now Seeed is handing me a (presumably) complete solution for a reasonable price.

Report comment

Reply
Rodney McKay says:

August 26, 2016 at 2:00 pm

With this, you can create what the Amazon Echo ???????????????????????? have been.

• Put one in each area of your house.
• Give each a unique wake phrase (“kitchen”, “living room”, “Rod’s room”, etc.) to avoid the problem you have when multiple Echoes are within micshot of each other (and you don’t want to have to choose among “Alexa”, “Amazon”, and “Echo” for your wake words. I tried that, and always forgot which room was which). Use a Seeed far-field mic array attachment in areas where the acoustics demand it.
• Have them all connected back to a central RasPi Alexa server (https://github.com/amzn/alexa-avs-raspberry-pi/blob/master/README.md).
• Have the server set up to transmit audio (iTunes, Prime Music, Google Play, Plex, whateva) to every room where you might want it.

Now you can say things like “Kitchen, turn on the light” (rather than “Alexa, turn on the kitchen light”), “Kitchen, play music here and living room” (synchronized! The Echo can’t do that), “Living room, stop music here” (leaving it on anywhere else it’s playing), etc. And, with a little more programming work, “Kitchen, pause audiobook and transfer to car” —> “Car, continue audiobook”.

Don’t you want this? I sure as hell do! The Echo is one of my favorite toys of all time despite some annoying limitations, and this extension would allow you to fix all those limitations plus add pretty much any feature you can dream up.

Report comment

Reply
Em Hoang says:

December 14, 2016 at 4:26 pm

This product doesn’t have much documentations available online. Reply from Owners/Developer are very rare. I had purchased ReSpeaker MicArray; however, its driver (Serial.inf) is not working on Windows 8+. Python files are not running on Linux. Theirs files are not welly tested. I’m still looking for a step by step how to use ReSpeaker MicArray on Linux /Windows. I’ll give it a week before I return it back to their store.

Report comment

Reply
Jay says:

July 18, 2017 at 2:35 pm

I received one recently and it’s software is far from being usable. Wifi creates a security nightmare, python packages can’t be updated, issues on github aren’t solved.
Save your money for something worth it.

Report comment

Reply
Akshay says:

November 29, 2017 at 9:24 pm

Does Respeaker Mic Array work with Windows 10 IoT Core on the Raspberry Pi?

Report comment

Reply