What if you could add gesture recognition to your computer without making any hardware changes? This research project seeks to use a computer’s microphone and speakers to recognize hand gestures. Audio is played over the speakers, and the input from the microphone is processed to detect Doppler shift. In this way it can detect your hand movements (or the movement of any object that reflects sound).
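To get a feel for how large the Doppler shift described above is, here is a minimal sketch. It is not SoundWave's code: the function name, the 343 m/s speed of sound, and the assumption that speaker and microphone sit in the same place are all illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed value)


def reflected_doppler_shift(tone_hz, hand_speed_mps, c=SPEED_OF_SOUND):
    """Return the frequency shift (Hz) of a tone reflected off a moving object.

    Positive hand_speed_mps means the object moves toward the co-located
    speaker and microphone; negative means it moves away.
    """
    # A moving reflector shifts the tone twice: once when it receives the
    # sound and once when it re-radiates it, giving f' = f * (c + v) / (c - v).
    received_hz = tone_hz * (c + hand_speed_mps) / (c - hand_speed_mps)
    return received_hz - tone_hz


if __name__ == "__main__":
    # A hand moving at 0.5 m/s toward the laptop, with a 20 kHz pilot tone.
    print(round(reflected_doppler_shift(20_000.0, 0.5), 1), "Hz")
```

A hand moving at half a meter per second shifts a 20 kHz tone by only a few tens of hertz, so the processing has to resolve narrow frequency bins around the pilot tone.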
The sound output is in a range of 22-80 kHz which is not audible to our ears. It does make us wonder if widespread use of this will drive the pet population crazy, or reroute migration paths of wildlife, but that’s research for another day. The system can even be used while audible sounds are also being played, so you don’t lose the ability to listen to music or watch video.
The screen above shows the raw output of the application. But in the video after the break you can see some possible uses. It works for scrolling pages, double-clicking (or double-tapping as it were), and there’s a function that detects the user walking away from the computer and locks the screen automatically.
[Sidhant Gupta] is the researcher who put the video together. In addition to this project (called SoundWave) he’s got several other interesting alternative-input projects on his research page.
40 thoughts on “Doppler-effect Lets You Add Gestures To Your Computer”
2:37 good way to look crazy in public.
I wonder if you could use small pulses of high-frequency sound very fast and see the return time on the microphone. That way you could know the precise (kinda) distance, and the differences in return time would indicate speed.
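The pulse-echo idea in this comment can be sketched as follows. This is only an illustration of the arithmetic, assuming sound travels at 343 m/s and that the echo's round-trip time can be measured; the function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed value)


def echo_distance_m(round_trip_s, c=SPEED_OF_SOUND):
    """Distance to the reflector: the pulse travels out and back, so halve it."""
    return c * round_trip_s / 2.0


def radial_speed_mps(first_round_trip_s, second_round_trip_s,
                     pulse_interval_s, c=SPEED_OF_SOUND):
    """Speed along the line of sight from two successive echoes.

    Negative values mean the object is approaching.
    """
    d1 = echo_distance_m(first_round_trip_s, c)
    d2 = echo_distance_m(second_round_trip_s, c)
    return (d2 - d1) / pulse_interval_s


if __name__ == "__main__":
    print(echo_distance_m(0.002))                # distance for a 2 ms echo
    print(radial_speed_mps(0.002, 0.0018, 0.1))  # two pulses 100 ms apart
```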
That is possible. One of the papers I cite in the paper does activity/context recognition using only reflected pulses.
Very cool, but per the PDF, the tones generated are in the 18-22 kHz range, not 22-80 kHz as stated. Consumer sound hardware is generally built to top out around 18-20 kHz since this is the basic audible spectrum.
I’m surprised this works at all on consumer hardware, since for most low-cost speakers, the response at even 18 kHz is likely to be super uneven, let alone at higher frequencies.
Good point, I had similar concerns about it not working on crappy hardware that performs not so well above 16K, however, even a low SNR is good enough to extract doppler shift information.
To be scientifically sound, I tried it on a bunch of laptops dating back to 2005.
Also noted in the PDF is the fact that 18 kHz tones are not automatically inaudible to humans. The younger you are, the more likely your hearing actually reaches close to 20 kHz.
lol “close to 20”…. lmao
this is not a troll, it is my opinion stated as a fact:
9/10 people ive ever met can NOT hear the 15.75khz of the FLYBACK of tube tv sets!!!
not one student in a(all my previous) classroom(s) is bothered by leaving a tv on that has a slowly dying flyback/flyback-driver, cept me.
the reason you think you do hear 18khz is because a treble adjustment for 18 khz on a stereo actually affects the wave shape of slower/lowerfreq signals, try it on a o-scope.
EVERY FREAKING ONE of those ******-dam tube tv sets makes the ********* 15khz sound. it is a very loud sound, and it can be heard from 10 or 20 meters away! (32 or 64 feet)
dont you tell ME people can hear up to 20khz, that is for dogs and bats and cats (etc).
the ONLY tube sets that do NOT make that sound are the ones that DOUBLE-SCAN the HORIZONTAL at exactly 31.5khz,,, like an EGA/VGA tube set or just a high-def tube tv that always runs its raster at the double freq, and doubles the video signal, == cheaper raster components.
(so it plays two freq video but only one freq go into FET/BJT)
like i said, 9/10 PEOPLE can only hear up to 14khz or 15khz. ive met about 10 people in my life that could hear the 15.75khz from 10 or 20 meters away and only ONE person that could also hear UP TO ~17khz like me.
ive never met ANYone that would be disturbed during a school exam from 18 or more khz.
(being blasted at several 100’s of watts, like in a 32 inch tube tv flyback)
people do NOT hear “up to 18 or 20 khz”
they hear UP TO 14 or 15 khz.
PS: PAL tv systems are a bit lower at 15.625khz but still sound about the same and will work in the short term on your (NTSC) tv without blowing the horizontal transistor right away. the vertical is another story (60 hz vs 50 hz) but still wont damage it in the short term.
PPS: and yes it is possible to display PAL on a NTSC tv (without color) if you know what to adjust, the picture will be a bit funky.
and tv raster circuits wont last long.
I’m going to call bullshit on your “people can’t hear 18-20khz”, because you are just incorrect.
Last time I had my SMAART measurement rig setup in a controlled environment there was a moving-light ballast making an obnoxious noise at around 16-17k. It was very easy to hear. I can definitely hear higher frequencies than that.
I suspect many younger people can hear noises that high but aren’t used to listening for them, or those noises typically aren’t very loud in comparison to the sounds that they are used to listening to (e.g., the human voice).
Also, I’m relatively sure that typical hearing loss begins in the 1k-5k range before the 18-20k range. (it makes sense- when you’re having trouble understanding what somebody says, it’s probably not because of loss at 18-20k… it’s probably because of loss in the middle of the range that’s important for intelligibility of the human voice.)
Source: I’ve been a pro-audio guy for about 13 years.
More on topic, most consumer-grade microphones are barely going to pick up that 18k-20k audio reliably, especially the crap typically used at computers. Typical sound cards will receive that input just fine, though.
I’m interested how they’ve compensated for crappy hardware.
you never mentioned how you can hear the tube tv’s horizontal… can you?
maybe, but it’s not common in my opinion.
ask some people, they will look puzzled when you ask them if they can (tube tv horizontal) and then when you show them they will think you cheated with mirrors! … they will ask u to do it blindfolded…
CAN you hear the television’s horizontal ??????
BTW i agree with ADULT hearing loss being 1k-5k…
i was talking about harmonics from a shrieking baby,,,
non-stop screeeching at the top of lungs, 3 inches away from ears (WSIB:unsafe), for 2 hours straight(again,WSIB:unsafe), 12 times a day (AGAIN,WSIB:unsafe!) 7d/w 4w/m 12m/y for a whole year,
as MOST babies do this and pre-conditions (aka deafens) their ears for “normalcy” later in life.
either that or they have trouble sleeping bcuz the sound of anything and everything is louder then everyone else’s perception of “too loud for sleep”
(furnace, wind, rain, cars, ect)
im talking about TUBE TELEVISION sets running at the standard NTSC or PAL of ~15khz
what if i walked into your “controlled environment” and started blasting heavy metal music, could you still hear that annoying high-pitched sound overtop of the music? and from 10 meters away down a hall and around the corner, with said music?
i doubt it
or what about a room full of talking people, say about 30 people :)
after all, you mentioned “controlled environment”
aka its nice and quiet.
maybe you werent trolling, but for next time, trolls use it and love it when other ppl use that word :) … it gives them bait
also you used the word “suspect” as evidence for something, suspicions are not credible evidence, otherwise i could be thrown into jail for suspicion of anything by anyone, thats why there are criminal investigations and undercover police.
using the built-in analog tuner in a tv that does NOT completely cut off ALL signals including SYNC from the tuner DURING channel changing,,, i can tell when you change channel even overtop of 30 ppl talking and from down the hall
PS: i agree that most consumer mic’s cant pickup very well in the upper range, some dont pickup at all above moderate frequencies, even ones that EVERYONE can hear
Can we agree that on Hack A Day we’re at least nominally interested in facts?
Opinions are one thing, but when you say “people don’t hear up to 20 kHz PERIOD” you’re simply contradicting the facts. And with what? Anecdotes.
20 Hz – 20 kHz is the widely accepted figure; see http://hypertextbook.com/facts/2003/ChrisDAmbrose.shtml for references.
You and I may not know anyone who can actually hear up to 20 kHz, but it doesn’t make it untrue. I don’t know anyone with a 4′ vertical jump, but that doesn’t mean I’m going to tell people “LeBron James is a liar PERIOD”. Crikey.
WALL OF TEXT of just one thing.
I get your rant but for me that high pitch “beeeeeeeeeeeee….” sound from TVs is very clear. You can listen to some caps that make that same sound. I think Splinter Cell made that sound famous!
Yikes. Yes, I can hear CRT tubes. Quite well in most cases.
In a room while heavy metal music is playing? Probably not. But that has nothing to do with whether I can actually hear it or not.
As for being interested in facts, I do like facts. And 20hz-20khz *is* the accepted normal range of human hearing.
Here’s another link indicating that: http://en.wikipedia.org/wiki/Hearing_range
This ~20khz max is also why audio CDs sample at 44.1khz (so they can carry approximately 22khz signals. Go read about Nyquist.)
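The Nyquist point in this comment comes down to one division. A minimal sketch, with hypothetical function names, of why a 44.1 kHz sample rate tops out near 22 kHz and why a 96 kHz sound card leaves room for higher tones:

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent without aliasing."""
    return sample_rate_hz / 2.0


def can_represent(tone_hz, sample_rate_hz):
    """True if the tone lies strictly below the Nyquist limit."""
    return tone_hz < nyquist_hz(sample_rate_hz)


if __name__ == "__main__":
    for rate in (44_100, 48_000, 96_000):
        print(rate, "Hz sampling ->", nyquist_hz(rate), "Hz limit")
```

In practice the usable range ends a little below this limit, because the anti-aliasing filter in the sound card needs some room to roll off.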
See also this chart comparing approximate filtering necessary to make all frequencies sound the same to the human ear at various SPL: http://en.wikipedia.org/wiki/File:Lindos1.svg
You’ll note that 10khz+ needs at least a 10-15db (roughly 10-32x in power) boost to seem equally as loud as 1k-2k. This is a great example of the reason why I think many people don’t even realize they can hear those frequencies: it is much quieter than what they primarily listen to. The average person really is not very observant about sound, but that doesn’t mean they can’t hear.
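As a quick check on the decibel figures in the comment above, this sketch converts a boost in dB to power and amplitude ratios using the standard definitions. The function names are illustrative.

```python
def db_to_power_ratio(db):
    """Power ratio for a level difference in decibels: 10^(dB/10)."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Amplitude (sound pressure) ratio for the same difference: 10^(dB/20)."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    for boost_db in (10, 15):
        print(boost_db, "dB ->",
              round(db_to_power_ratio(boost_db), 1), "x power,",
              round(db_to_amplitude_ratio(boost_db), 2), "x amplitude")
```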
I work with multiple individuals who have their hearing tested yearly. They are 50+ years of age and still hear 20-20k according to the test results. So don’t say it’s not possible– that seems rather ignorant– especially for an individual implying that they work in education.
I will point out that even my not-necessarily-factual observations are based on my career as a sound engineer (including precise measurements on a regular basis). I’m pretty sure that should give some credibility to my comments, factual or otherwise.
Who’s the troll here?
I can totally hear up to 18khz, just tried it with audacity. 18.5khz is barely audible, and I can’t hear 19 at all really. I’m 20 years old, so yes people can hear this high up.
It’s entirely possible your speakers are just unable to reproduce those frequencies. This is often the case.
Great discussion. The consensus is right: quite a few people, if not most, can hear frequencies at 18 KHz (or even up to 20 KHz). In fact, my adviser can hear it, and since I did not believe him, we did a quick informal test where we randomly turned it on/off and asked him when he heard it. The conclusion was that he could :)
looks like it needs large gestures.
if im flipping through albums as in the example i dont want to move more than a finger (maybe two fingers)
it doesnt seem to recognize orientation or change in shape either so grabbing or twisting (very natural gestures) would not be recognized.
Hey man I just want to ask this (in the nicest possible way) and no disrespect, but how often do you flip thru albums in daily life? I am an old guy dislikes the whole flick thru junk one by one (and almost like a slot machine in some cases I’ve seen-it keeps spinning lol)and prefer seeing everything at once, even if it means generic blah icons so I can instantly select it. Is this something the young folks are just used to or is it more pleasing to your user experience or is it just an easier standard across all of the devices you own? Again, not being disrespectful, so any input you have would be great :) I’ll just be over here adjusting the onion on my belt. Thanks :)
I’m excited about this project and think it is pretty brilliant :) This and cloud server processing will bring those old celeron laptops back out of retirement one day :)
I messed around with some of the old apps (prolly around 2003?) for people with disabilities to ride bikes and hear pictures (used it for granular synthesis lol) but did not think about it as a common interface. Kudos! Keep up the great work!
moar of this kinda stuff HaD :)
Absolutely right, and that is why we were careful to not claim it as a complete input system but rather complementary to other input techniques. The point here was to show that using commodity hardware, it is already possible to do quite a few gestures!
Nice job Sidhant! I am excited to see you and your team’s future progress :) I haven’t busted into the whitepaper yet, but it has had me thinking all day. I used to use a program called Mousing (it was win95 iirc)seems like you can still get it here: http://www.sagebrush.com/mousing.htm that let you use the mouse as a theremin. Sure it would be funny to control this and backwards emulate a theremin lol, but the way it did volume control and freq was what made me think about your project all day :) I guess with mousing you could also control midi devices too. Anyhoo, I know they are not related but it seemed like similar basal concepts. Check out mousing to have some fun around the lab :) You all have been busy :)
Haha.. absolutely. That would be fun! To control a theremin. I actually applied fundamentals of theremin to a different project called LightWave. You will find it interesting. Its on my website.
Glad you liked it!
This is really cool! The amount of gestures they’re able to detect with a speaker and a microphone is quite amazing…
Having said that, I don’t think it’s suited for interacting with a computer. The gestures needed are too large, and too imprecise for something like a computer. You could probably get better results with a VGA webcam and some image-recognition software.
This probably has better use for detecting people in proximity (like the “approach to unlock, walk away to lock” computer example), or in a vehicle to notify the driver of objects in their blind spots… or even helping the blind!
… Although now that I think about it, most of the examples I just mentioned have already been implemented in one way or another…
I wonder if you could establish a “sound profile” for people… recognize and differentiate people based on their doppler signature?
I tip my hat to you sir, the doppler profile idea is brilliant. I can think of tons of applications for that.
Why does Word delete everything and quit without saving whenever I reach for my soda?
Incorporate this into Windows 8, and you would have a winner for those that do not have a touch screen system.
Doesn’t Word do that anyway?
Nice, but I did this in 2009 already.
The article you linked to is about using Wiimote for gestures. This is different because there is no Wiimote.
Two University of Washington grad students featured on Hack a Day in the past three days. Go Huskies!
Go Huskies! :)
How amazing. No extra hardware required.
Imagine what can be done with multiple microphones. Low power kinect :)
Although the gestures make it look like you’re swatting at flies, this could be a good way to put gesture control on an old laptop if they’ll release the code.
This is only a guess, but I suspect that in developing this they used hardware of better quality than standard equipment, and if this becomes a feature included with some computers the manufacturer will have to use components of appropriate quality. I have no desire to have a touch screen monitor on either a desktop or notebook computer, and I see myself preferring a mouse or a track pad over waving my hands. A touch screen is at home on tablets and smart phones. It appears that my upper limit has degraded to 12 kHz, as one who could hear 20 kHz 18 or so years ago.
The PDF basically says it was developed and tested on standard hardware. I.e., the built-in audio for laptops and ordinary computer speakers/mics for desktops. There’s no indication they built anything custom or even shelled out for high-end audio gear.
Apparently the performance is pretty similar across different models of desktops and laptops.
This was all tested on standard laptops and hardware. No fancy soundcards or microphones. I tested it on pretty much any laptop I could find lying around my lab, including an IBM from 2005.
This looks cool. I’d love to know how it works with a few of them in close proximity. automatic frequency detection and adjustment perhaps?
That is absolutely correct. It picks a new frequency automatically as we describe in the paper.
Great idea, congratulations. I am wondering how common PC or laptop speakers play the tones above 20 kHz and how the microphone records them? I would assume there is a 22 kHz low pass filter at the input of every sound card ..
You are right. There generally is a LPF of 22 KHz or so on most laptops, which is why I stuck to 18-22 KHz frequencies. There is a typo in the article saying it goes up to 80 KHz. However, if the sound card supports higher frequencies, like those that sample at 96K, then SoundWave picks a higher frequency tone.
Sidhant, this is by far the most original and practical doppler hack I’ve seen in a long time. Somebody on a different thread had mentioned ear-level tracking for phased array speaker calibration. I was thinking of designing a phased array speaker system myself and wonder if I can apply your doppler technique to detect people’s movement. Think of directional speakers that calibrate according to how people move around in a room.