Focus Your Ears with The Visual Microphone

A group of MIT, Microsoft, and Adobe researchers has managed to reproduce sound using video alone. The sounds we make bounce off every object in the room, causing microscopic vibrations. The Visual Microphone uses a high-speed video camera and some clever signal processing to extract an audio signal from these vibrations. Using video of everyday objects such as snack bags, plants, Styrofoam cups, and water, the team was able to reproduce tones, music, and speech. Capturing audio from light isn’t exactly new. Laser microphones have been around for years. The difference here is that the visual microphone is a completely passive device. No laser or special illumination is required.

The secret is in the signal processing, which the team explains in their SIGGRAPH paper (pdf link). They used a complex steerable pyramid along with wavelet filters to obtain local pixel motion values. These local values are averaged into a single global motion value, and from it the team is able to measure movement down to 1/1000 of a pixel. That’s plenty of resolution to decode audio data.
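To get a feel for how local phase measurements can add up to a sub-pixel global motion signal, here is a minimal one-dimensional sketch. It is not the team's implementation: it swaps the complex steerable pyramid for a single complex Gabor filter, and the amplitude-squared weighting, the 2200 fps frame rate, the 100 Hz tone, and every function name are assumptions made for illustration.

```python
import numpy as np

def gabor_kernel(omega, sigma):
    """Complex band-pass filter: Gaussian envelope times a complex sinusoid."""
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    return np.exp(-x**2 / (2 * sigma**2)) * np.exp(1j * omega * x)

def global_motion(frames, omega=0.5, sigma=8.0):
    """Estimate one displacement value (in pixels) per frame, relative to frame 0."""
    kernel = gabor_kernel(omega, sigma)
    # Complex filter response for every frame; its phase tracks local position.
    responses = np.array([np.convolve(f, kernel, mode="valid") for f in frames])
    ref = responses[0]
    # Local phase change relative to the first frame.
    dphi = np.angle(responses * np.conj(ref))
    # A shift of d pixels changes the phase by -omega * d.
    local = -dphi / omega
    # Weight each local estimate by squared amplitude (assumed weighting),
    # so textured regions count more than flat, noisy ones, then average.
    weights = np.abs(ref) ** 2
    return (local * weights).sum(axis=1) / weights.sum()

def synthetic_frames(displacement, width=4096, omega=0.5, noise=1e-3, seed=0):
    """Hypothetical test data: a sinusoidal texture shifted by a tiny amount per frame."""
    rng = np.random.default_rng(seed)
    x = np.arange(width)
    return np.array([
        np.cos(omega * (x - d)) + noise * rng.standard_normal(width)
        for d in displacement
    ])

# 200 frames at an assumed 2200 fps, vibrating with a 100 Hz tone
# whose amplitude is only 1/1000 of a pixel.
t = np.arange(200) / 2200.0
true_motion = 1e-3 * np.sin(2 * np.pi * 100.0 * t)
frames = synthetic_frames(true_motion)
recovered = global_motion(frames)
correlation = np.corrcoef(true_motion, recovered)[0, 1]
print(f"correlation with true motion: {correlation:.3f}")
```

Any single local estimate is swamped by sensor noise at this amplitude. It is the averaging over thousands of pixels that pulls the tone back out, which is why a textured object filling the frame helps.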

Most of the research is performed with high-speed video cameras, which are well outside the budget of the average hacker. Don’t despair though: the team did show that the same magic can be performed with consumer cameras, albeit with lower-quality results. The team took advantage of the rolling shutter found in most of today’s CMOS-based consumer cameras. Rolling shutter CMOS sensors capture images one row at a time, so each row can be processed in a similar fashion to the frames of the high-speed camera. There are some inter-frame gaps when the camera isn’t recording anything, though. Even with the reduced resolution, it’s easy to pick out “Mary had a little lamb” in the video below.
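A little arithmetic shows why the rolling shutter trick is attractive and where the gaps come from. The sketch below treats each row as one time sample. The 60 fps frame rate, 1080 rows, and 10 microsecond line time are made-up example values, not figures from the paper, and the function names are hypothetical.

```python
def rolling_shutter_timing(fps, rows, line_time_s):
    """Summarize how a rolling shutter samples time within one frame period."""
    frame_period = 1.0 / fps
    readout = rows * line_time_s  # time spent reading rows in each frame
    if readout > frame_period:
        raise ValueError("rows cannot be read out within one frame period")
    return {
        "row_rate_hz": 1.0 / line_time_s,  # sample rate while rows are being read
        "readout_s": readout,
        "gap_s": frame_period - readout,   # dead time between frames
        "coverage": readout / frame_period,
    }

def row_sample_times(n_frames, fps, rows, line_time_s):
    """Timestamp of every row: bursts of evenly spaced samples separated by gaps."""
    return [f / fps + r * line_time_s
            for f in range(n_frames)
            for r in range(rows)]

# Assumed example camera: 60 fps, 1080 rows, 10 microseconds per row.
timing = rolling_shutter_timing(fps=60, rows=1080, line_time_s=10e-6)
times = row_sample_times(n_frames=2, fps=60, rows=1080, line_time_s=10e-6)
print(timing)
```

With these assumed numbers the rows arrive at 100 kHz while the sensor is reading, but only about 65% of each frame period is covered. The remaining 5.9 ms or so per frame is the gap the article mentions, and any audio that falls inside it has to be interpolated or is lost.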

We’re blown away by this research, and we’re sure certain organizations will be looking into it for their own use. Don’t pull out your tin foil hats yet though. Foil containers proved to be one of the best sound reflectors.

Thanks [Zach]!

Comments

  1. nioga says:

    I’ve seen it on niebezpiecznik.pl at least 2 days ago. Old news.

    • chuck says:

      I saw someone on Fark point out that they had already seen this on another site yesterday, so your comment is old news.
      Seriously- do people not understand how the internet works? Do you think Adam Fabio flew out to interview these guys a few days after your alphabet soup site did the same thing?
      What are you trying to accomplish by pointing this out? Are we supposed to be impressed with your internet skills? Is Mr. Fabio expected to apologize for wasting your time? Should the offending post be removed? If old news is such a waste of your time why waste more time pointing it out? Do you send messages to CBS to complain that the plane crash they are covering was also reported on NBC 35 seconds earlier? Was your intent to get some attention (mission accomplished)? You have the modern equivalent of the Library of Alexandria at your fingertips- Find a better use for it than just squawking to hear yourself squawk.

  2. Max Siegieda says:

    Honestly my BS meter was ringing like mad watching that video, thinking there’s absolutely no way that visual noise wouldn’t mess the recording up, but no they’re using a camera with a pretty great lens system and sensor that’s also running 25x slower than the recommended speed.

  3. LK says:

    The possibility to do this and the current state of video processing is awesome, but wouldn’t a laser microphone be easier and cheaper than the high speed camera and image processor? DIY laser mic: http://www.lucidscience.com/pro-laser%20spy%20device-1.aspx

    And I think afterwards extraction of conversations from (normal, not specially recorded) videos isn’t feasible, which would be the main advantage over laser microphones.

    • franklyn says:

      Laser mics need a lot of setup and alignment.

    • Hirudinea says:

      I believe they have methods that either vibrate the window to mask sound or they use two windows where the intervening empty space eliminates the sound, this system would defeat both of these defence methods, so I guess buy some drapes.

    • rasz_pl says:

      kind of hard to shine a laser pointer into the past/at a youtube video

      this technique should allow NSA, and other criminal organisations, to analyze video material for audible hints. From the looks of it you can get something as long as it was recorded with a rolling shutter type of sensor (meaning all cellphones)

      • anon says:

        “utilizes a high-speed video camera”

        This won’t help anyone extract sound from youtube videos.

        • Angus says:

          Or a normal camera with a rolling shutter sensor.

          • Jim says:

            Whilst I can’t say for certain without further reading, I’d suggest that since this relies on tiny sub-pixel variations, the compression applied to normal video posted on youtube etc is going to consider the pixels of the target object to be unchanging and will only appear in key frames, destroying the audio. The camera in such video is less likely to be mounted on a tripod, and held in the hand, the strongest signal besides waving it around, might be the pulse from the person holding it. Still it remains an interesting and impressive achievement.

          • cplamb says:

            In the video they show it working with a normal camera with a rolling shutter sensor.

    • Whatnot says:

      I think the damn disgusting NSA people want to be able to take your existing video feeds and spy on people really, and that that’s what it is all about.

  4. ejonesss says:

    now places that have no audio recording rules will now have to add video to their rules.

    what could you get from a movie?

    that could be a new way for hollywood to watermark the movie and will survive better than other methods.

    while the watermark may not stop recording or playing, it will allow for identifying the source by, say, each theater having its own id number spoken.

    here is how that would work.

    1. hollywood would make a movie.

    2. someone would speak out the identity of each theater (potentially thousands of unique movies personalized to that theater (digital projection theaters the modulation of the object could be done by the theater’s projector (maybe say like modulating the frame hold or focus adjustment or even the brightness))).

  5. John says:

    • twdarkflame says:

      hehe.
      I must admit I was also thinking of a Smart clip a few weeks ago when the Senate Investing Committee was being investigated by the Senate investigating Committee.

  6. a3 says:

    I guess this is useful for when you download porn that is missing audio.

  7. Toot says:

    Could this be a way to add (the original) sound to silent movies? :)

  8. fl@c@ says:

    This would be an interesting way to authenticate products or currency… If you manufactured say a dollar bill in a way that when exposed to audio of a certain frequency, it would resonate at a specific frequency and absorb others… This could be done by adding certain materials like rubber or something that would attenuate whatever frequencies you want to eliminate… or the reverse could be done where it only attenuates a certain frequency.. Merchandise could be packaged in materials that allow manufacturers to identify it as authentic.. Or maybe not.. :)
    Either way, it would be interesting to see what materials react in what ways to different tones, etc..

    • fl@c@ says:

      It might also be interesting to think of the possibility of using this for identification.. Imagine a bank card, or driver’s license that had a chip or something that the machine it is inserted into examines under exposure to some combination of tones with a tiny camera that analyzes the chip’s vibrational response… and each person has a slightly different combination allowing a unique identifier.. Just thoughts off the top of my head.. :)

    • danieljlouw says:

      How well are the auditory properties of an object preserved once it’s been creased/folded/torn? I am no expert, but isn’t resonance of a piece of paper affected by folding it?

      • fl@c@ says:

        That would be a good point.. Yes, I suppose it would.. :) Also wear and tear on the bill, probably moisture, and whatever oils and other contaminants… Hmm.. There goes reality again…shooting down my daydreams.. Might still work for the ID card though if the ‘chip’ were protected well enough..but I doubt the benefit outweighs the effort.. :/

    • HC says:

      What is going on with people posting insane and pointless uses for this technology? If you have control of the recording environment and the object, anything you can do with this technology you can do an order of magnitude better with an actual microphone. This is useful when you can’t use a microphone.

  9. Soo-Hyun says:

    Related research by some of the same authors: http://people.csail.mit.edu/mrub/vidmag/

  10. mannanj says:

    great research by these mit researchers and an idea i am very much interested in making a hack for in the future. I made an article in the forums a few days ago about this here:

    http://forums.hackaday.com/viewtopic.php?f=10&t=4755

    I’m looking for any takers, researchers, hackers, anyone with ideas who would like to pursue this further and make any kind of device with me. if you are interested throw a post in there or send a pm my way!!

  11. My initial thought was what two random words will the NSA & CIA give this one in their ToolKit? I’m voting for RollingShutter…

  12. Marvin says:

    “Rolling shutter CMOS sensors capture images one row at a time. ”

    Not a great description. Exposure is controlled by a pixel reset ahead of the pixel read. If the light to the sensor is bright enough a rolling shutter camera will reduce the lag to the order of one line.

    In terms of paleoacoustics applications, no cinematographer or camera man would ever do this deliberately in the normal course of filming something. 24fps footage for example normally aims for an exposure of 1/48th (180 degrees in cine terms) so things blur instead of warp.

    • BrightBlueJim says:

      That’s what I was thinking – they probably used a very fast shutter speed on the rolling shutter tests. AND an extremely stable platform for the camera. I doubt that this would work for just random video footage.

  13. sjamaan says:

    You could use that to analyse some old movies and you tube videos without sound to try and recover what was going on…even the 60fps from the slr you could tell the notes…

  14. ERROR_user_unknown says:

    fuck more tools for the fed.
