Two students at Cornell University have put together a rather curious sound tracking device called an Acoustic Impulse Marker.
[Adam Wrobel] and [Michael Grisanti] study electrical and computer science, and for the final project in their microcontroller class they decided to build this device using the venerable ATmega 1284p.
The system uses a three-microphone array to locate sharp noises to within 5 degrees. The microcontroller measures the “acoustic delay” between the microphones, which lets it work out the direction of the sound’s source. It does this using an 8-stage analog system that converts the sound from each microphone into a binary signal marking the moment that microphone heard the noise. The three resulting binary signals are then compared for their time delays; the system selects the two closest microphones and does a simple angle calculation based on the magnitude of each delay to determine the sound’s position.
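One common way to turn arrival-time differences into a bearing is a far-field (plane-wave) model. The Python sketch below illustrates that idea only; it is not the students' firmware and does not reproduce their two-closest-microphone calculation. The microphone layout, the 343 m/s speed of sound, and both function names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def simulate_arrival_times(mics, bearing_deg, c=SPEED_OF_SOUND):
    """Arrival times (relative, in seconds) of a plane wave from bearing_deg.

    Hypothetical helper for testing: a microphone that sits further along
    the direction of the source hears the sound earlier.
    """
    ux = math.cos(math.radians(bearing_deg))
    uy = math.sin(math.radians(bearing_deg))
    return [-(x * ux + y * uy) / c for (x, y) in mics]


def bearing_from_arrival_times(mics, times, c=SPEED_OF_SOUND):
    """Estimate the bearing (degrees, 0-360) from three arrival times.

    Solves (p_i - p_0) . u = -c * (t_i - t_0) for the direction vector u
    using Cramer's rule on the resulting 2x2 system.
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    t0, t1, t2 = times
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    b1 = -c * (t1 - t0)
    b2 = -c * (t2 - t0)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones must not lie on one line")
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    return math.degrees(math.atan2(uy, ux)) % 360.0
```

Feeding it arrival times simulated for a known direction is a quick sanity check before trying it on measured delays.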
When the sound is identified, its location is sent to a 180 degree servo geared at a 1:2 ratio to a pencil “pointer”, giving it a full 360 degrees of pointing capability.
The system works best for sharp sounds, but occasionally picks up speech as well.
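The gearing arithmetic can be sketched in a few lines of Python: with a 1:2 ratio the pointer turns two degrees for every servo degree, so a 0-360 degree bearing maps onto the servo's 0-180 degree travel. The function names are hypothetical, and the 1000-2000 microsecond pulse range is an assumption that varies between servos.

```python
def servo_angle_for_bearing(bearing_deg, gear_ratio=2.0):
    """Servo shaft angle (degrees) that points the geared pencil at bearing_deg."""
    pointer_deg = bearing_deg % 360.0  # wrap any input into 0..360
    return pointer_deg / gear_ratio    # 1:2 gearing: the servo moves half as far


def pulse_width_us(servo_deg, min_us=1000.0, max_us=2000.0, servo_range_deg=180.0):
    """Linear map from servo angle to pulse width; the endpoints are assumed values."""
    return min_us + (max_us - min_us) * servo_deg / servo_range_deg
```

With these assumptions a bearing of 270 degrees needs the servo at 135 degrees.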
[via Hacked Gadgets]
Clever, but kind of simple at the same time. Perfect!
The Royal Australian Air Force have an aircraft firing range. Planes come in at assorted dive angles, and shoot at a land-based target. Being outside shelter when the 70mm rounds are flying is suicidal. So how do they check if the flyboys & girls hit the target?
A long T-shaped bar, with sensors like the ones described here, tracks the sonic boom when the bullet hits the target. This is then shown on a PC screen in a nearby bunker.
Same math, different readout format.
Apparently they’re looking to get some DHS research grant money.
This technology is already in use in gunshot locator systems such as ShotSpotter™ and it appears that DHS, the military and others are interested in these types of products.
More at http://en.wikipedia.org/wiki/Gunfire_locator
You beat me to this Tachyon1! :-)
USG gives out hefty grants to US towns and cities to install this ShotSpotter gadget. It doesn’t use a pencil nor a laser. It uses a GUI that plots a vector on a map. It also has a CCTV camera that swings to the vector to locate the source. It was pioneered in Iraq as an automated sniper locator system. The more units you install the better the resolution is.
Newark NJ and Syracuse NY are now “experts” at using it to curtail their once out-of-control gun violence. However, cities like Hartford CT are still having a “major” learning curve and shooters are getting away too much there.
I would like to see an underwater hydrophone direction finder to locate loud underwater targets like whales, biological anomalies, or enemy submarines. Now DHS would love something like this. Presently our ASW methods like this involve literally hundreds or more hydrophones all over a sub’s hull. Acoustic direction is extrapolated from them. Imagine this inside of one to three small sonar radome pods instead of so many microphones.
Very nice. With all that tech, it would be nice to put a laser pointer on there instead of a pencil.
With a couple more mics you could actually get 3-axis sound positioning, and a laser pointer would be pretty good for that.
You only need 4 mics in a tetrahedral configuration to get pretty accurate directionality, the hard part would be creating a mount for the laser capable of pointing anywhere in a sphere.
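As a rough illustration of the four-microphone idea, the same far-field model extends to three dimensions: four non-coplanar microphones give three independent time differences, enough to solve for a direction vector. The Python sketch below is one possible formulation under that assumption; the function names and the 343 m/s speed of sound are illustrative.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def direction_from_four_mics(mics, times, speed=343.0):
    """Return (azimuth_deg, elevation_deg) of a distant source.

    mics:  four (x, y, z) positions in metres, not all in one plane.
    times: arrival time at each microphone in seconds.
    Solves (p_i - p_0) . u = -speed * (t_i - t_0) with Cramer's rule.
    """
    p0, t0 = mics[0], times[0]
    rows = [[p[k] - p0[k] for k in range(3)] for p in mics[1:]]
    rhs = [-speed * (t - t0) for t in times[1:]]
    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("microphones must not be coplanar")
    u = []
    for col in range(3):
        replaced = [row[:] for row in rows]
        for r in range(3):
            replaced[r][col] = rhs[r]
        u.append(det3(replaced) / d)
    norm = math.sqrt(sum(v * v for v in u))
    if norm == 0.0:
        raise ValueError("all arrival times equal; direction undefined")
    ux, uy, uz = (v / norm for v in u)
    azimuth = math.degrees(math.atan2(uy, ux)) % 360.0
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, uz))))
    return azimuth, elevation
```

Normalising the solved vector before taking the angles keeps small timing errors from pushing the elevation outside the valid range.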
Best presentation I’ve seen in a long time, and the write up is amazing! A bit sad that they failed in their goal of doing speech detection.
We have a video conferencing system (VC3) which points one of its two cameras at whomever is talking the loudest.
It’s very odd at first to be tracked around the room by a camera (and to see a closeup of your face on a 57″ LCD), but works quite well for video conference. Not as cool as the HP HALO system (which is totally Star Trek) but means you can have more than a dozen people in a meeting.
I guess the VC3 camera must use a similar approach.
jacksonliam – I thought those systems are based on the “talking stick” principle – like an old Native American thing during pow-wows – whoever is holding the TS is allowed to speak. IOW the remote is also an IR “beacon” which allows the camera and mic to track the user around the room. I would imagine the way you describe it anyone entering the room and sneezing/coughing/ or just being generally too noisy would distract the tracker momentarily.
I think the way to overcome the human-speech-gap this system ostensibly has would be to use active filtering. You can filter down to general female or male speech spectrum. Just listen for these two spectra and reject upper and lower frequencies (like a notch filter) and the system should be able to track people talking, correctly. Audio quality would suck but that’s not important for tracking. Also false reflection paths screw it up too.
The target room should be sound-proofed to eliminate echoes. Just adding furniture or putting newspapers behind wall panels should be enuf’ I think. Outside however is a different “kettle of fish”. No way to control your acoustic environment outside. Every vehicle, crowd, weather, or animal is a potential false human-speech target.
The VC3 system doesn’t have a stick or remote, it does just look at who is talking the loudest! The room is fairly sound proofed, they had to cover windows with fabric to reduce echoes.
I think it must do filtering, as typing or coughing or knocking on the table doesn’t distract it. I think it does it based on the time someone is making noise too, it doesn’t flick to you for just saying “yep”.
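The time-based behaviour guessed at above can be sketched as a simple dwell-time rule: only retarget once the same new talker has been loudest for some minimum time. This Python sketch is purely hypothetical and says nothing about how the VC3 works internally; the class name and the one-second default are assumptions.

```python
class DwellSelector:
    """Switch targets only after a new talker has been loudest for min_dwell_s."""

    def __init__(self, min_dwell_s=1.0):
        self.min_dwell_s = min_dwell_s
        self.current = None          # who the camera is pointed at
        self.candidate = None        # who might take over
        self.candidate_since = None  # when the candidate first became loudest

    def update(self, loudest, now_s):
        """loudest: id of the loudest talker right now, or None for silence."""
        if loudest is None or loudest == self.current:
            # Silence or the current talker: drop any pending challenger.
            self.candidate = None
            self.candidate_since = None
        elif loudest != self.candidate:
            # A new challenger appears: start timing them.
            self.candidate = loudest
            self.candidate_since = now_s
        elif now_s - self.candidate_since >= self.min_dwell_s:
            # The challenger has held the floor long enough: switch.
            self.current = loudest
            self.candidate = None
            self.candidate_since = None
        return self.current
```

A short “yep” never accumulates enough dwell time, so the camera stays put.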
FFT the audio once to get a spectrum. At this point, you could simply notch filter it. But the human voice, when producing vowel sounds, has strong harmonic content. These harmonics, being multiples of the fundamental frequency, produce a repeating pattern across the spectrum. Guess what finds repeating patterns? The same FFT you already used to produce the spectrum. So you FFT again, this time feeding it the spectrum from the first FFT (in practice, usually the log of its magnitude), and you get a cepstrum – which is more useful for detecting voices over most background noise than the spectrum alone.
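A minimal Python sketch of the cepstrum idea follows, using the common “real cepstrum” recipe: transform, take the log magnitude, transform back. It uses a slow textbook DFT from the standard library rather than a real FFT, so it only suits short frames. The function names, the small constant added before the log, and the pitch search band are assumptions to be tuned, not part of the comment above.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            # Reduce k*t modulo n to keep the phase argument small and accurate.
            acc += v * cmath.exp(sign * 2j * math.pi * ((k * t) % n) / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(samples):
    """Inverse transform of the log magnitude spectrum."""
    spectrum = dft(samples)
    log_mag = [math.log(abs(x) + 1e-6) for x in spectrum]  # 1e-6 avoids log(0)
    return [c.real for c in dft(log_mag, inverse=True)]


def estimate_pitch_hz(samples, sample_rate, fmin, fmax):
    """Return (pitch_hz, peak_height) from the cepstral peak between fmin and fmax."""
    ceps = real_cepstrum(samples)
    lo = int(math.ceil(sample_rate / fmax))   # shortest period to consider
    hi = int(math.floor(sample_rate / fmin))  # longest period to consider
    peak = max(range(lo, hi + 1), key=lambda q: ceps[q])
    return sample_rate / peak, ceps[peak]
```

A strong peak in the searched band suggests harmonic, voice-like content, while broadband noise or a sharp click tends not to produce one. How high the peak must be to count as speech is left as a threshold to choose by experiment.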
Excellent work!