Shhh… Robot Vacuum Lidar Is Listening

There are millions of IoT devices out there in the wild and though not conventional computers, they can be hacked by alternative methods. From firmware hacks to social engineering, there are tons of ways to break into these little devices. Now, four researchers at the National University of Singapore and one from the University of Maryland have published a new hack to allow audio capture using lidar reflective measurements.

The hack revolves around the fact that audio waves or mechanical waves in a room cause objects inside a room to vibrate slightly. When a lidar device impacts a beam off an object, the accuracy of the receiving system allows for measurement of the slight vibrations cause by the sound in the room. The experiment used human voice transmitted from a simple speaker as well as a sound bar and the surface for reflections were common household items such as a trash can, cardboard box, takeout container, and polypropylene bags. Robot vacuum cleaners will usually be facing such objects on a day to day basis.

The bigger issue is writing the filtering algorithm that is able to extract the relevant information and separate the noise, and this is where the bulk of the research paper is focused (PDF). Current developments in Deep Learning assist in making the hack easier to implement. Commercial lidar is designed for mapping, and therefore optimized for reflecting off of non-reflective surface. This is the opposite of what you want for laser microphone which usually targets a reflective surface like a window to pick up latent vibrations from sound inside of a room.

Deep Learning algorithms are employed to get around this shortfall, identifying speech as well as audio sequences despite the sensor itself being less than ideal, and the team reports achieving an accuracy of 90%. This lidar based spying is even possible when the robot in question is docked since the system can be configured to turn on specific sensors, but the exploit depends on the ability to alter the firmware, something the team accomplished using the Dustcloud exploit which was presented at DEF CON in 2018.

You don’t need to tear down your robot vacuum cleaner for this experiment since there are a lot of lidar-based rovers out there. We’ve even seen open source lidar sensors that are even better for experimental purposes.

Thanks for the tip [Qes]

Focus Your Ears With The Visual Microphone


A Group of MIT, Microsoft, and Adobe researchers have managed to reproduce sound using video alone. The sounds we make bounce off every object in the room, causing microscopic vibrations.  The Visual Microphone utilizes a high-speed video camera and some clever signal processing to extract an audio signal from these vibrations. Using video of everyday objects such as snack bags, plants, Styrofoam cups, and water, the team was able to reproduce tones, music and speech. Capturing audio from light isn’t exactly new. Laser microphones have been around for years. The difference here is the fact that the visual microphone is a completely passive device. No laser or special illumination is required.

The secret is in the signal processing, which the team explains in their SIGGRAPH paper (pdf link). They used a complex steerable pyramid along with wavelet filters to obtain local pixel motion values. These local values are averaged into a global motion value. From this global motion value the team is able to measure movement down to 1/1000 of a pixel. Plenty of resolution to decode audio data.

Most of the research is performed with high-speed video cameras, which are well outside the budget of the average hacker. Don’t despair though, the team did prove out that the same magic can be performed with consumer cameras, albeit with lower quality results. The team took advantage of the rolling shutter found in most of today’s CMOS imager based consumer cameras. Rolling shutter CMOS sensors capture images one row at a time. Each row can be processed in a similar fashion to the frames of the high-speed camera. There are some inter-frame gaps when the camera isn’t recording anything though. Even with the reduced resolution, it’s easy to pick out “Mary had a little lamb” in the video below.

We’re blown away by this research, and we’re sure certain organizations will be looking into it for their own use. Don’t pull out your tin foil hats yet though. Foil containers proved to be one of the best sound reflectors.

Continue reading “Focus Your Ears With The Visual Microphone”