When you are young, you take it for granted that you can pick out a voice in a crowded room or on a factory floor. But as you get older, your hearing often degrades to the point where a noisy room merges into a mishmash of sounds. University of Washington researchers have developed what they call Target Speech Hearing. In plain English, it is an AI-powered headphone system that lets you look at someone and pull their voice out of the chatter. For best results, however, you have to enroll their voice first, so it wouldn’t make a great eavesdropping device.
If you want to dive into the technical details, their paper goes into how it works. The prototype uses a Sony noise-cancelling headset. However, the system requires binaural microphones, so additional microphones are attached to the outside of the headphones.
Given training data, we wonder if traditional correlation methods would be just as effective. In other words, you could use facial recognition to figure out who’s talking and pull their voice out using more traditional signal-processing techniques. However, this system can potentially pick up sound from unknown speakers, inferring direction from the binaural microphones, so even if the correlation method worked well on known speakers, the new system is likely superior in new situations.
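For what it’s worth, the “traditional correlation” idea can be sketched in a few lines. This is a toy illustration, not anything from the paper (all names here are made up, and a real system would correlate spectral features, not raw samples): slide an enrolled snippet of the target voice across the mixture and see where it matches best.

```python
import numpy as np

def spot_enrolled_voice(mixture, template):
    """Toy matched-filter speaker spotting: slide an enrolled waveform
    snippet over a mixture and return the best-matching offset and a
    normalised match score in roughly [0, 1]."""
    m = mixture - mixture.mean()
    tpl = template - template.mean()
    # raw cross-correlation of the mixture against the template
    corr = np.correlate(m, tpl, mode="valid")
    # normalise by local energy so loud passages don't dominate
    energy = np.sqrt(np.convolve(m * m, np.ones(len(tpl)), mode="valid"))
    score = corr / (energy * np.linalg.norm(tpl) + 1e-12)
    best = int(score.argmax())
    return best, float(score[best])
```

On raw waveforms this only works if the enrolled snippet appears nearly verbatim in the mixture, which is exactly why real systems move to feature space; the point is just the shape of the approach.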
There’s more to noise-cancelling headgear than you might think. Or you can just go low-tech.
> In plain English, it is an AI-powered headphone system that lets you look at someone and pull their voice out of the chatter. For best results, however, you have to enroll their voice first, so it wouldn’t make a great eavesdropping device.
So we’re at the point where “AI” denotes the use of a directional microphone + some DSP, or am I missing something?
Meh, I could go either way on this. I suppose it is doing DSP, but it’s being done by a neural network that was previously trained on the target voice, and that is an AI technique.
The most remarkable thing about this to me is running a neural network fast enough to do noise cancellation.
Wow, that’s really, really cool! I doubt grandma will be wearing them during family parties (the headphones mess with her hair, and their size messes with her looks), but that’s a “tiny” detail. I love this concept!
I’ve never been able to hear voices above the noise, and now you’re telling me my hearing is going to get worse?! That’s a downer.
I read HaD’s article and skimmed the source a bit -> I’m not sure how the system selects the “source”…
I’d have expected something like
1. two eyetracking cameras
2. triangulation where one is looking (direction, distance)
3. ??? some math ???
4. use three microphones (left, right & top) to enhance the sound from the triangulated source.
But that doesn’t use AI…
There are two microphones. When the user taps a button and looks towards the source, the sound arrives at both microphones at the same time (as does sound from straight behind, above and below you, but most likely the source is the major contributor). Since the source signal is mostly the same for both microphones, the target’s voice can be extracted and analysed by a neural network. The result of that analysis is used by another neural network to isolate the source voice.
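The zero-delay idea described above can be sketched as follows. This is not the paper’s code, just a minimal illustration of the physics: cross-correlate the two ear signals, and a source the wearer is facing shows up near zero inter-channel lag; averaging the channels then reinforces that zero-delay content and partially cancels lateral sources.

```python
import numpy as np

def frontal_component(left, right, fs, max_itd_ms=0.8):
    """Crude sketch: a source the wearer faces arrives at both ears
    with ~zero inter-channel delay. Returns the channel average, the
    estimated inter-channel delay in ms, and whether the dominant
    source is roughly on the front-back axis. (0.8 ms is a guess just
    above the maximum human interaural time difference.)"""
    # find the inter-channel delay of the dominant source
    corr = np.correlate(left - left.mean(), right - right.mean(), mode="full")
    lag = corr.argmax() - (len(right) - 1)
    itd_ms = 1000.0 * lag / fs
    facing = abs(itd_ms) < max_itd_ms
    # averaging reinforces zero-delay (frontal) content
    mono = 0.5 * (left + right)
    return mono, itd_ms, facing
```

The real system, per the description above, feeds that extracted signal to a neural network rather than stopping at a channel average.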
My thought would just be #4 – two or three microphones pointed at where a person talking to you is most likely to be, so all you have to do is point your head at them and be about the right distance from them.
That’s the idea that occurred to me as I watched my brother-in-law fiddling with controls for his hearing aids using his phone.
It seems you ought to be able to use beam forming on three microphones to emphasize sound sources “straight ahead” – just turn your head to look at a source in order to “tune it in.”
That would make usage more natural. People normally turn to look at the person they are listening to.
In “formal” settings, people normally turn to the speaker, yes. If you’re e.g. out walking, probably not. The goal of this thing is to also isolate the sound source when for whatever reason you’re not looking.
Yeah, this is for when you’re at a crowded pub and everyone’s shouting so you have to shout. You could be having a conversation with several people, not just one person whose eyes you can stare into so very constantly and lovingly.
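The delay-and-sum beamforming suggested a few comments up can be sketched like this. It’s a toy far-field version with integer-sample delays, not how the UW prototype works; real beamformers use fractional delays and adaptive weights.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air

def delay_and_sum(channels, mic_positions, fs, look_dir):
    """Steer a simple delay-and-sum beamformer toward look_dir.

    channels: (n_mics, n_samples) array; mic_positions: (n_mics, 3)
    in metres; look_dir: vector pointing toward the source. Sound
    from look_dir adds coherently after alignment; sound from other
    directions partially cancels.
    """
    mic_positions = np.asarray(mic_positions, float)
    look_dir = np.asarray(look_dir, float)
    look_dir /= np.linalg.norm(look_dir)
    # arrival-time offsets for a far-field plane wave:
    # mics closer to the source hear it earlier
    delays = -(mic_positions @ look_dir) / SPEED_OF_SOUND
    shifts = np.round((delays - delays.min()) * fs).astype(int)
    n = channels.shape[1] - shifts.max()
    aligned = [ch[s:s + n] for ch, s in zip(channels, shifts)]
    return np.mean(aligned, axis=0)
```

With the mics on a headset, “turn your head to tune it in” just means `look_dir` is always straight ahead in head coordinates, which is exactly why the approach feels natural.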
Don’t show this article to my wife!
There were glasses that accomplished this without AI. They used four mics and amplified sound from the direction you were looking, probably feeding it through a voice bandpass filter, and used the other mics to attenuate sound coming from other directions. They looked much more like a normal pair of glasses.
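If those glasses really do run a voice bandpass after the directional stage (a guess on the commenter’s part, and mine), the filter itself is simple. This sketch assumes SciPy and the classic 300–3400 Hz telephone voice band:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def voice_bandpass(x, fs, lo=300.0, hi=3400.0, order=4):
    """Band-limit a signal to the telephone voice band. The 300-3400 Hz
    range and filter order are assumptions, not the product's actual
    processing chain (which isn't public)."""
    # second-order sections are numerically safer than (b, a) here
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, x)
```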
As a person who has trouble listening in noisy environments, I find that wearing the right ear plugs already helps a lot. Still, it’s great to find uses of AI to help people cope in all kinds of situations.