Remove A Speaker’s Voice From A Recording Using Ultrasound

What if you could prevent someone from recording your voice? That is the focus of a study by Guo et al. (2022) at Michigan State University, which uses a dynamically calculated ultrasound signal to cancel out a speaker’s voice at the recording device. The technique relies on an interesting property of certain micro-electro-mechanical system (MEMS) microphones, which are commonly used in smartphones and other recording devices.

Pressure sensitivity of a MEMS microphone. (credit: Brian R. Elbing)

A specially crafted ultrasound signal, aimed at the microphone that is recording one’s voice, can leave that voice absent from the final recording. The approach taken by the authors uses a neural network trained on voice samples of the person (“Bob”) whose voice is to be cancelled. As it picks up Bob’s voice during a conversation, the creatively named Neurally Enhanced Cancellation (NEC) system determines the ultrasound signal to send to the target recording device. Meanwhile, the person holding the recording device (“Alice”) still perceives Bob’s voice normally.
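To make the cancellation idea concrete, here is a minimal Python sketch. It is not the authors’ NEC system: there is no neural network, Bob’s voice is replaced by a single 1 kHz tone, and the microphone is assumed to have a simple quadratic nonlinearity that demodulates an amplitude-modulated ultrasonic carrier back into the audible band. The sample rate, carrier frequency, and coefficient `A2` are hypothetical values chosen for illustration.

```python
import cmath
import math

FS = 192_000        # simulation sample rate in Hz (assumed)
N = 1_920           # 10 ms window, so every tone below lands on an exact 100 Hz bin
F_VOICE = 1_000     # stand-in for Bob's voice: a single 1 kHz tone
F_CARRIER = 40_000  # hypothetical ultrasonic carrier
VOICE_AMP = 0.1
A2 = 0.5            # assumed strength of the microphone's quadratic term
CARRIER_AMP = 1.0
# Modulation depth chosen so the demodulated tone matches the voice amplitude:
# A2 * CARRIER_AMP**2 * DEPTH == VOICE_AMP
DEPTH = VOICE_AMP / (A2 * CARRIER_AMP ** 2)


def voice(n):
    return VOICE_AMP * math.sin(2 * math.pi * F_VOICE * n / FS)


def jammer(n):
    # Carrier amplitude-modulated with the inverted voice waveform.
    inverted = -math.sin(2 * math.pi * F_VOICE * n / FS)
    carrier = math.cos(2 * math.pi * F_CARRIER * n / FS)
    return CARRIER_AMP * (1 + DEPTH * inverted) * carrier


def mems_mic(x):
    # Toy microphone model: linear response plus a quadratic distortion term.
    return x + A2 * x * x


def tone_amplitude(samples, freq):
    # Single-bin DFT: amplitude of the component at `freq`.
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


voice_only = [mems_mic(voice(n)) for n in range(N)]
with_jammer = [mems_mic(voice(n) + jammer(n)) for n in range(N)]
# A listener's ear is modeled as linear, so nothing is demodulated there.
heard_by_alice = [voice(n) + jammer(n) for n in range(N)]

print(f"recorded, no jammer:   {tone_amplitude(voice_only, F_VOICE):.4f}")
print(f"recorded, with jammer: {tone_amplitude(with_jammer, F_VOICE):.4f}")
print(f"heard by Alice:        {tone_amplitude(heard_by_alice, F_VOICE):.4f}")
```

In this toy model the 1 kHz component is about 0.1 in the unjammed recording and at Alice’s ear, and effectively zero in the jammed recording. A real voice is far more complex than one tone, which is presumably where the trained network comes in.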

Because ultrasound is highly directional, the system can only jam one specific microphone and would not affect hidden microphones elsewhere in a room. As the authors note, other systems can jam microphones indiscriminately, but doing so is legally problematic; the targeted NEC system should avoid that issue.

Thanks to [JohnU] for the tip!

15 thoughts on “Remove A Speaker’s Voice From A Recording Using Ultrasound”

  1. A sound of sufficient intensity outside the range of human hearing would, in most devices, probably cause the ALC (automatic level control) to lower the volume so that recording is effectively impossible. Most devices won’t filter the input.

    1. Not an advisable method.
      1: Despite a person not being able to “hear” the sound, it is still present and can be very harmful to everyone present. See active denial devices (crowd control) for details.
      2: The above also happens with sub-audible sounds (infrasound).
      3: If the device has a level indicator, it might show the sound pressure even though the audible portion is not picked up by the device. This would raise suspicion from the person recording. (A bad idea; methods that keep the person unaware are best.)

      Side muse, if used against an Amazon Echo, could this be considered a form of “Echo cancellation”?

      1. It probably wouldn’t need to be super loud, just louder than the speaker is speaking. Like how, if you whisper in a crowded restaurant, someone a few feet away might not hear you over the other noise.

        Keeping the recorder unaware of the cancelling might not matter, for instance if you suspected your room was bugged, or even as an automatic precaution against a room being bugged.

  2. Interesting project. I wonder if all the special training and complexity was really necessary; blasting enough ultrasound at a microphone will surely saturate it, swamping out other signals.

    1. At that intensity it would probably have effects on the people in the room. Even if sound is outside of our hearing range it can still damage our ears quite effectively.

  3. Since it’s harder to conceal microphones than human heads, maybe the opposite solution could work. Use ultrasound to generate highly focused sound toward listener’s ears. While you could technically place microphones next to your ears, it’s still harder to conceal.
    An alternative and cheaper solution might be giving all listeners a bone conduction headset, which might be harder to record accurately? But neither solution is flawless.
    Even with zero sound, a skilled lip reader or AI could just recreate the speaker’s voice.

  4. Article title was a bit misleading; this idea purports to prevent the recording of a specific voice, not to remove it from a completed recording. [disappointment]

    Interesting idea. Two observations:

    – “Alice, please put your phone away”.

    – Someone who truly wants to make a secret recording can employ devices with microphones that would not be sensitive to that ultrasonic cancellation circuit (bandlimited, post-mic filtering, etc)

    1. Isn’t that what an anti-aliasing filter does? I was sure that we record at higher sample rates so that we can use less aggressive filtering, but the cut-off frequency still starts slightly above 22.5 kHz.

      1. Anti-aliasing filters remove frequencies at/above the Nyquist frequency (half the sampling rate) to prevent audible artifacts in the digitized signal. Current sampling methods often use a very high initial sample rate, so the presence of a sharp anti-aliasing filter isn’t guaranteed.

        Also, I believe that the proposed method may rely on the ultrasonic signal being “demodulated” by the expected MEMS mic… but recorders don’t have to have the kind of mic that’s responsive to ultrasonic signals.

      2. This technique relies on harmonics of the ultrasonic signal: it ‘rings’ the MEMS microphone at audible frequencies, but not the human ear (which is nicely squishy and effectively attenuates ultrasonics). Anti-aliasing would not help, because the harmonics are within the passband.

  5. If Alice can still hear Bob, then they just need to get a loopback installed on her phone (this can be done in software or hardware) and record the output from that.
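
On the anti-aliasing point raised in the thread above, a small Python sketch may help. It assumes an idealized sampler with no filter in front of it; the function name `alias_frequency` is made up for illustration. It shows where a tone above the Nyquist frequency would fold to, and that a tone already inside the audible band is left where it is, which is why a filter that only removes ultrasonic content cannot undo something the microphone has already demodulated into the audible band.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after ideal sampling (no filter)."""
    # Fold the tone back into the range 0 .. sample_rate_hz / 2.
    nearest_multiple = round(freq_hz / sample_rate_hz)
    return abs(freq_hz - sample_rate_hz * nearest_multiple)


# A 30 kHz tone sampled at 48 kHz shows up as an 18 kHz tone.
print(alias_frequency(30_000, 48_000))
# A 1 kHz tone is below Nyquist (24 kHz) and is unchanged.
print(alias_frequency(1_000, 48_000))
```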
