Get Started With Speech Recognition

Headset and microphone

Speech recognition makes it easier for us to be lazy with our devices – or perhaps set up the coolest voice-controlled project around. After the voice-controlled home automation post, we received a lot of emails asking “how can I make it recognize my voice?” Whether your project involves a PC or an Android phone, a high budget or no budget at all, there is a solution out there. Join us after the break for a complete set of instructions on setting up speech recognition, and some of the best software options to meet your needs.

Got a Microphone?

Using a microphone is the only way to get your voice commands to the computer for interpretation. If you’ve got a laptop, you’re probably set to go, as most laptops come with microphones already built in. Not sure? Look for a small hole around the screen or keyboard. It may be labeled, but not always. You can also try checking the list of features in your computer’s manual, or head to your control panel and select “Sound”. In this pop-up window, you’ll find a tab titled “Recording”. If you’ve got a mic installed, it will be listed here.

A built in mic

If you’re using a desktop, you’ll likely have to buy an external microphone. Many webcams include a built-in mic – check the package to make sure. Some newer media keyboards also include a microphone. If this is the case for you, you may have to move your keyboard out of any confined space to reduce echo. If you’re a PC or Xbox 360 gamer, you might have a headset for talking to other players live. This can double as a mic for voice recognition. Don’t have any of these? Head to your nearest store that sells computer accessories – try Best Buy, Future Shop, RadioShack, or your favorite locally-owned retailer. A basic, usable microphone can cost anywhere from a few dollars to hundreds of dollars. While a six hundred dollar microphone is unnecessary unless you plan to record a studio album with your computer, it is a good idea to stay away from the cheapest of the cheap – these often produce choppy, uneven sound that your computer cannot interpret. Generally a headset mic (or gaming headset) is the best way to go, as it sits close to your mouth for minimal interference. Make sure the mic you choose is compatible with your computer’s operating system and has an input your computer uses, and buy away!

A headset microphone

Flickr: [Yoppy] [Link]
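If you want a rough check on whether a mic is giving your computer something usable, one option is to record a few seconds of speech to a WAV file and look at the signal level. The sketch below is only an illustration, not part of any of the tools covered in this post: it assumes a 16-bit PCM recording, and the function names and the "too quiet" and "clipping" thresholds are our own guesses rather than established values.

```python
import math
import struct
import wave


def wav_levels(path):
    """Return (peak, rms) of a 16-bit PCM WAV file, scaled to the range 0.0-1.0."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        raw = wav.readframes(wav.getnframes())
    # Each sample is a little-endian signed 16-bit integer; channels are interleaved.
    count = len(raw) // 2
    if count == 0:
        return 0.0, 0.0
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    peak = max(abs(s) for s in samples) / 32768.0
    rms = math.sqrt(sum(s * s for s in samples) / count) / 32768.0
    return peak, rms


def describe(peak, rms, quiet=0.01, clipped=0.99):
    """Turn the levels into a rough verdict; the thresholds are illustrative guesses."""
    if rms < quiet:
        return "too quiet"
    if peak >= clipped:
        return "clipping"
    return "ok"
```

Calling describe(*wav_levels("sample.wav")) on a short recording (the file name is just a placeholder) gives a quick hint about whether to move the mic closer or turn the input level down before you start training.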

Set Up Your Voice Recognition Software

Windows 7 or Windows Vista

Setting up voice recognition on either of these operating systems takes only a few clicks. The voice commands are thorough yet simple, allowing you to control everything from form and menu navigation to Office programs and more. For almost anything you need to do, there is a voice command. To get started, head to the control panel and select “Speech Recognition”.

Control Panel in Windows 7

From here, you can test your microphone, train your computer to understand your individual style of speech, or view and print a reference card containing the commands your computer will understand.

Speech Recognition Dashboard in Windows 7

You can also take a tutorial which teaches you the ins and outs of speech recognition in one simple lesson. Select the “Start Speech Recognition” option when you’re ready to get started. This leads you through optimizing your computer’s sound input with positioning tips and speech tests, and guides you through the rest of the configuration in a very user-friendly manner. When you finish the wizard, you’ll be ready to go!

Speech Recognition Wizard in Windows 7

You can refer back to the speech recognition reference card as often as you need to review the commands your computer will understand.

Speech Recognition Reference Card in Windows 7

Windows XP

Voice recognition in XP is as easy to set up as it is with the newer Windows operating systems, but it lacks the vast array of features that Vista and 7 offer. Speech recognition works with Microsoft Office programs, but only the 2002 and 2003 versions. With a version earlier than 2002, or with the 2007 or 2010 versions on XP, you’re out of luck, as built-in speech recognition is not supported. Beyond that, basic commands are not always supported. In general, you will have to enable speech recognition specifically for each program with which you wish to use it, and it will not be available in all programs.

Windows XP uses a speech recognition engine which comes with Office XP, though it is not always installed by default. Open the control panel and, from the classic display, select the “speech” option. If you’re using the newer, categorical menu in XP, you’ll have to first select the “Sounds, speech, and audio devices” option.

Speech Recognition Icon in Windows XP

Youtube: [mickmoose429992] [Link]

If you see a “speech recognition” tab in “speech properties”, you’re ready to go, as the engine has already been installed.

Speech Properties in Windows XP

Youtube: [mickmoose429992] [Link]

If this option is missing, you’ll need to install it. From the control panel, select the “add or remove programs” option.

Add or Remove Programs in Windows XP

Youtube: [mickmoose429992] [Link]

Find Microsoft Office XP, and select the “change” option. Be careful not to uninstall!

Change Microsoft Office XP

Youtube: [mickmoose429992] [Link]

Find “features to install”, select the “alternative user input” option, followed by the “speech” option. Select “run from my computer” and click update. This automatically includes speech recognition in all Office programs, and makes the feature available to other programs.

Add Speech to Microsoft Office XP

Youtube: [mickmoose429992] [Link]

Mac OS X

Apple was one of the first to come out with speech recognition – a crazy idea at the time. This was back in 1993. We’ve come a long way since then, from more fluid, user-friendly controls to the ability to perform almost any action without ever touching your keyboard. Setting up speech recognition in OS X is a breeze. Once you’ve got your mic ready, select “system preferences” from the Apple drop-down menu. From this menu, select the “speech” option.

Mac OS X Speech Feature

Youtube: [fifedjdomo] [Link]

Enabling “Speakable Items” will turn on the default commands, allowing you to perform most basic tasks.

Mac Speakable Items

Youtube: [fifedjdomo] [Link]

Through available options, you can set up your microphone and further customize the use of the program. The set of commands used to control your computer is fully customizable. Pair this with VoiceOver, a program designed for the blind, and you’ll hardly need to touch your computer in order to use it.

Linux Ubuntu

Linux does not currently have a complete solution for speech recognition. Though several projects have been started, none have been finished. Several pieces of software can perform some of the speech recognition tasks that Windows or Mac can accomplish, but nowhere near as thoroughly or easily. There is also no proprietary software for speech recognition on Linux, but there are some partially completed open source solutions for Ubuntu. The Julius speech recognition engine is one of these utilities – a program used to interpret and execute a set of pre-determined voice commands. Detailed instructions for installation can be found [here].

Julius Main Page

Youtube: [jgraves1141] [Link]

Documentation on the installation and use of Julius is very limited because the program is not completely finished, so you may not want to attempt an install unless you are completely comfortable with Linux. The Julius package available for download contains two parts – an installer, and the program. First run the installer, which will take you through the installation of Julius.
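To give a feel for what “a set of pre-determined voice commands” means in practice, here is a minimal sketch of the lookup step that sits between a recognizer and the rest of your project. It does not use Julius itself or any of its real interfaces; it assumes the recognizer has already handed you the recognized phrase as plain text, and the phrases and the lightctl and musicctl programs are hypothetical placeholders.

```python
import shlex


def build_command_table():
    """Map each spoken phrase to the command line it should trigger (all hypothetical)."""
    return {
        "lights on": "lightctl --on",
        "lights off": "lightctl --off",
        "next song": "musicctl next",
    }


def normalize(phrase):
    """Lower-case the recognized text and collapse whitespace so lookups are forgiving."""
    return " ".join(phrase.lower().split())


def resolve(phrase, table):
    """Return the argument list for a recognized phrase, or None if it is not a known command."""
    command = table.get(normalize(phrase))
    if command is None:
        return None
    return shlex.split(command)
```

The list returned by resolve() could be handed to something like subprocess.run() to actually launch the program. Keeping the table small and ignoring anything that is not in it is one simple way to avoid acting on misheard phrases.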

Another great solution is to use a Windows-based program such as Dragon NaturallySpeaking in combination with WineHQ, though there are lapses in fluidity that often have to be worked around. For example, in some cases, a basic paragraph must be narrated to Dragon’s text editor and then copy-pasted into the appropriate location rather than transcribed directly to the appropriate program.

WineHQ

WineHQ: [Link]


Top Third Party Proprietary Software

If you have an older operating system, or simply don’t like the speech recognition software included with your operating system, a third party program may be what you need. There are dozens of free and paid speech recognition programs out there: customizable, non-customizable, open source, for business, for personal use, and more. With so many options, you’re sure to find exactly what you want at a cost you can afford. Some of the most popular:

Dragon NaturallySpeaking

Dragon is a name that pops up over and over when searching for speech recognition software. Made for PC, it’s highly regarded for its speed, accuracy, ease of use, and large number of commands. The basic version of Dragon for home use is around $100 US, though many versions are available with more specific features, such as packages for medical or legal offices. These packages can cost over $1000 US, though they are unnecessary for the basic user. Dragon NaturallySpeaking software packages also include a mic, so you won’t have to try and find your own. In addition to Windows, many users have claimed great success with Dragon in combination with WineHQ for Ubuntu.

Dragon NaturallySpeaking

Dragon NaturallySpeaking: [Link]

MacSpeech Dictate

MacSpeech is produced by the same makers as Dragon NaturallySpeaking. It was built from the ground up, rather than being ported, so it is free of the bugs that typically come with adapted software. Similar to Dragon, MacSpeech offers not only dictation recognition, but customizable speech commands as well, and includes a mic in the package. Also following the Dragon theme, medical and legal versions are available, as well as an international edition which supports Italian, French, and German in addition to English. These speech recognition tools for Mac range from $150 US to $600 US.

MacSpeech Dictate

MacSpeech Dictate: [Link]

IBM ViaVoice

IBM’s ViaVoice recognition software is designed primarily for use with small mobile devices and vehicle automation systems, though it’s quite highly regarded amongst computer users as well. ViaVoice offers text-to-speech in addition to voice recognition. The command library is intuitive, and the user does not need to stick to a standard set of commands to make use of all the features – the program can interpret most commands as they are given. The speech library contains over 200 thousand words, far more than the average person’s vocabulary. In addition to many mobile operating systems, IBM ViaVoice supports standard Windows and Mac operating systems.

IBM ViaVoice

Third Party Open Source and Free Software

Open source or free voice recognition software that works well is extremely difficult to find – there is really no winner in the open source race for free voice software. In fact, there is hardly a race at all. Numerous open source Linux projects have been started, but due to the extreme scale, none have been finished. Below is a project you can contribute to in order to get the ball rolling on some great open source speech recognition software, as well as a toolkit for your own uses.

VoxForge

VoxForge is a project working to compile a collection of transcribed speech for use with both open source and free voice recognition engines. Once this collection is in place, free and open source speech recognition programs should get the jumpstart they need to improve significantly. If you’d like to help the project, you can visit the VoxForge website [here].

VoxForge

VoxForge: [link]

CMUSphinx

Sphinx is now on version 4 (Sphinx 4). Perhaps the most (or only) popular open source speech recognition tool, Sphinx is licensed under BSD and is written in Java. Sphinx also offers a mobile version called “PocketSphinx”. This may be more useful for developers than the average user, but it’s one of the only solutions available, not to mention a versatile and thorough one. It does not come ready to go out of the box, but rather is a tool that can be utilized by developers. It certainly needs some work before it’s ready to go.

CMUSphinx

CMUSphinx: [Link]


How to Install CMUSphinx

Setting up CMUSphinx is not the easiest task, but it is likely to pay off with a great product. This install needs to be done manually.

Before you get started, you’ll need a few things – Perl, in order to run the scripts, and a C compiler for the source code. Perl is free, and included with most Linux distributions. GCC (GNU Compiler Collection) is a good tool for the C portion of the source code. A word alignment program is also necessary – CMU suggests “Sclite”, a tool specifically used for speech recognition programs.
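Before diving into the tutorial, it can save some frustration to confirm that these three prerequisites are actually reachable from your shell. The following is a small sketch of such a check and is not part of the CMUSphinx instructions. The executable names are assumptions: “perl” and “gcc” are the usual names, and “sclite” is the name the scoring tool is commonly installed under, but your system may differ.

```python
import shutil


# Executable names are assumptions: "perl" and "gcc" are the usual names,
# and "sclite" is the name the scoring tool is commonly installed under.
REQUIRED_TOOLS = ("perl", "gcc", "sclite")


def find_tools(names=REQUIRED_TOOLS, which=shutil.which):
    """Return a dict mapping each tool name to its full path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(found):
    """List the tools whose lookup came back empty, in sorted order."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print("%-8s %s" % (name, path or "NOT FOUND"))
    if missing_tools(found):
        print("Install the missing tools before starting the tutorial.")
```

Run as a script, it prints where each tool was found and flags anything that still needs to be installed.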

The databases you will need are available [here]. You’ll need either AN4 or RM1. Next, you’ll need to set up the trainer. A trainer helps your computer interpret your commands. Set up the tutorial – this will include copying the scripts to the proper area. The decoder is next. Though you can pick any decoder you choose, CMU describes the installation with Sphinx 3, and encourages you to perform your testing with Sphinx 3. Once you have all of the appropriate files in the correct directory, it’s time to compile, and set up the tutorial. Perform a training run, and finally, perform a decode. This set-up is extremely complicated, and is likely best left to the professionals – certainly not something for most average users.
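If you are wondering what a word alignment tool is for, it scores a decode by lining up what the recognizer produced against what was actually said and counting the differences. The sketch below is not Sclite and does not reproduce its output; it is a bare-bones illustration of the usual word error rate idea (substitutions, deletions, and insertions divided by the number of reference words), with a function name of our own choosing.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, word_error_rate("turn the lights on", "turn the light on") counts one wrong word out of four and returns 0.25. A lower number after a training run means the decoder is getting closer to what was actually said.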

Full instructions can be found on the Carnegie Mellon University’s Sphinx website [here].

This demo shows Sphinx in action:

[youtube=http://www.youtube.com/watch?v=owJS5XwXAEA]

You’re Ready To Go!

Once you’ve got your mic functional and in place, and your speech software set up and configured, you’ll be ready to get started! Sit back and get talkin’!

52 thoughts on “Get Started With Speech Recognition”

  1. Though noone has asked, if you are wanting TTS stuff I have looked and looked, and the cheapest best sounding voices for windows/linux I found are at http://cepstral.com/

    I’ve been trying to tie all this together with misterhouse, pocketsphinx, PIR sensors, etc to make a smart home, at some point I need to actually tie all the random stuff I have together :P

    Pocketsphinx is really easy to use, and for small word libraries, I’ve gotten 100% accuracy (for like 20-40 words). I was rather impressed. For home automation stuff I would definitely recommend a small dictionary that you make. I dunno if its possible, but the nicest option, in my opinion, would be if you could switch out dictionaries at run time so you can have context specific word lists to keep the accuracy near 100%

  2. Lest I be construed as some kind of hater, I’d like to point out that ‘my’ comment above wasn’t actually by me, but by some other dumbass I had the displeasure of dealing with on IRC.

    Personally, I’m not of the opinion that HaD is best served by this sort of article, but in and of itself it’s decent.

  3. At first I thought wow, as I imagine’d it would go down to a technical level: code with wavelets and decoding speech sync pulses, etc.
    Regardless its informative and I’m sure it’ll be useful to some form of projects that can utilise voice commands to drive a system.

  4. Great article. I use the (somewhat limited) speech recognition that comes with windows 7. Nothing beats screaming next song at the computed when it somehow manages to find something embarrassing. And of course then it won’t listen since the timber of my voice changed because I was flustered. But anyways I find it really useful with my old “laptop” which is missing its screen, keyboard and various other bits.

  5. (h8ter moment) Using commercial products for its intended function is not something we are used to seeing on HAD, but there is some good info in there and its a good first article… and you will get the vibe soon enough

  6. @ajelo: That recognized poem is no different than the way most teenagers speak and write today :P

    –corrected brain malfunction: too hot in office. Sorry for double post.

  7. I thought software-related posts meant *creating* software. You know, code. When I saw speech recognition I thought it would touch on some lisp or prolog AI programming, or possibly some throwback to Ray Kurzweil. Not a tutorial on installing win/mac/ubuntu software and how to select the proper mic.

    “Make sure the mic you choose is compatible with your computer’s operating system and has an input your computer uses, and buy away!”

    I thought 1/8″ plug or usb is pretty universal. But all in all, a good first writeup that my grandma would need when shes wants to use the ~internet and cant find the blue E.

  8. a female had writer. of course. if maybe this was about ripping a toy apart with voice commands and making it do something kewl, then a had article it should be. i dont think HaD readers want simple howtos. isn,t that what had answers is for? this should of been faq not a hack.

  9. Jesus Effing Christ, people. Some of us actually find this sort of thing helpful. We’re not all high-end coders, but still like to work on cool projects. Please stop with all the butthurt.

    It seems like every damned post now is followed by a string of whiners bitching about how the topic isn’t up to HAD’s standards. It isn’t the posts that are ruining the site; it’s you.

  10. “It isn’t the posts that are ruining the site; it’s you.”
    So we tell that we want something useful something we can implement in our projects, most people including me expected to see some CODE after reading article name. Its not whining and beaching when you refuse useless crap, its like CS professor decide to show popular irc software instead of teaching how to make one, unexpected and useless

  11. @wes, Maybe some comments are too harsh. But unless they start tutorialsaday.com ( oh wait there is a http://tutorialaday.com/ ) Okay i forgot my point already…

    Seriously though. I thought the HaD answers site was for this kind of information. This article should have been posted in a thread and sticked for someone to read and add thoughtfull comments too.

    Not that it wasn’t informational. It even covered windows, mac and linux. Only thing missing was embedded design. it’s just not hacking material.

    Maybe i should put my hacked together lpt port 2-way radio controller slash speech recognition system on to a website and send to HaD. http://farm5.static.flickr.com/4046/4580165563_1c16214c7d.jpg

  12. Dragon Naturally Speaking is a great voice recognition software. If you need an office suite other than MS Office that is completely compatible with DNS, then I would suggest SSuite Office’s free Excalibur office suite. http://www.ssuitesoft.com/ssuiteexcalibur.htm

    It works perfect with Dragon naturally speaking since the office suite was written in native WIN32 language. For those who don’t know what WIN32 is, it is the base code of Windows itself.

    Great software too. :)

    1. the speech engine is the registry try running the computer fully fragmented red across the board save all temp files using your user name defragment every time you train the speech trainer after a long time you can use defrag reports as a template then you can slowly make a map of the registry witch happens to be the default ground (motherboard) but you have to open every single file on the desktop witch makes it almost frezz and it has to be done all off line no service packs here people you can update your computer using the speech engine with your voice you can make it reverb through the motherboard and as it does this it completely inverts the whole program into a sound wave format instead of code…..

  13. Thanks for the opinions, everyone. I’m reading everyone’s feedback and taking everything into consideration. Clearly I can’t please everyone – but that doesn’t mean I can’t try.

    @hettrick (RE: “I thought 1/8″ plug or usb is pretty universal”) – my speech recognition enabled phone actually uses ExtUSB for which I have to buy headsets out of Europe – so no, they’re not completely universal.

  14. the next part of that statement talks about usb, so yea there is really only 2 ways to get audio in, though the audio jack or a usb converter

    dont quote half the post to make yourself look right, your not talking to a dumb crowd

  15. One problem with using the commercial programs like dragon is getting the input from the speech out to a device. I thought of doing a home system just for tv remote control but the issue become how do you get the application that does the voice processing. I have dragon, to output what you need to the external micro. I tried opening a serial window and having dragon enter what I said but its problematic because you cant use a simple command like tv 5 .
    You have to say tv 5 enter which makes its longer and odd to use. Looking for a way to take commands and be able to script them directly.

  16. @cgmark, i just spent an hour trying to find what i used and i cant find it.

    Looks like dragon v10 has scripting options. Maybe you could use that to take simple commands to complicated tasks?

    What i found years ago was a tool used for carputer designs. That could take a voice command and run a series of command line calls or scripts. So i wrote an app that you could send arguments to. So you could say tv 5 and the app could send “programname 5” to change the channel. I even think the app i had needed a start word before it would do anything. i used “computer” then i would say the comand. not sure if dragon has the same type of features. But if there is scripting then im sure you could write anything you needed into it.

  17. With Google working on so many projects involving their own speech recognition engine, and with voice recognition becoming available for Android, we can anticipate that Google will eventually provide an expanding family of voice recognition products for popular environments where it benefits Google – which means Linux and Linux-derived OSs.

    1. the only way a speech engine will ever be practical and work is if it can be true all the time no matter what is if the potential has no limit and it can be true no matter what.to set a limit in a environment limiting its self in relations to speech is its potential witch means it can only expand within its limits and the freedom of speech has no limit

  18. using the speech recognition i can completely invert the os and unmoveable files to my desktop and its (lightspeed fast)… i can go anywere in my system instead of the os revalving around the hdd my hdd revolves around the os i can create a logic loop anywere to recover the os no matter how retarded the computer gets i can recover the os no matter how far it gets pulled apart on the hdd no matter how long the logic loop gets spread out i can speak the language verbally to bring it back….

  19. i guess thats how windows makes money by stealing your temperary files,these write the servicepacks that they make you buy, they get your os running back up to speed….How would you like to have a computer thats guna run the same way you bought it, no matter how much information is on it

  20. Hi, currently I am actually dong a project regarding quadcopter and I have this idea of commanding my quadcopter to do something by voice. Then I manage to find this page and I would like to know whether the recommended software above are able to command my quadcopter to do task? Very sorry if my English is terrible.

  21. If I told you that apple, android and windows had no part in writing the program we all know today as (speech recognition), what would you think of that?

    1. Video os feedback is needed first before voice feedback live here as voice typing is a must for voice os commands and for doing work with it. did you ever use voice typing to type in comments online? i have and its a mess!

  22. enables feedback live options for voice typing and for os voice commands ai deep learning and machine learning capture to main servers on os makers for os voice commands and for app designers to implement if the os makers reject supporting it natively by brute forcing it.
