3D Audio Imaging With A Phased Array Microphone

Remember the scene from Blade Runner, where Deckard puts a photograph into a Photo Inspector? The virtual camera can pan and move around the captured scene, pulling out impossible details. It seems that [Ben Wang] discovered how to make that particular trick a reality, but with audio instead of video. The secret sauce isn’t a sophisticated microphone, but a whole bunch of really simple ones. In this case, it’s 192 of them, arranged on long PCBs working as the spokes of a wall-art wheel. Quite the conversation piece.

You might imagine that capturing the data from 192 microphones all at once is a challenge in itself, and that seems to be an accurate assessment. The first data-capture problem was the odd PCBs pushing the manufacturing process to its limits: about half of the spokes were dead on arrival, with individual mics tending to short the shared clock line to either ground or the power supply line. Then, to pull all that data in, a Colorlight card is used as a general-purpose FPGA with a convenient form factor. This former pixel controller can be used for a wide variety of projects thanks to an open-source reverse-engineering effort, and is even supported by the Project Trellis toolchain, which was used for this build, too.

Packetizing all those microphone streams into UDP packets winds up pushing a whopping 715 Mbps, which still fits nicely on a Gigabit Ethernet connection. That data is fed into a GPU kernel written with Triton, an open-source alternative to CUDA, which performs one of two beamforming operations. Near-field beamforming divides the space directly in front of the microphone array into a 64×64×64 grid of 5 cm voxels and can locate a sound source anywhere in that 3D volume. Alternatively, the system can run a far-field beamform and locate a sound source as a 2D direction on a 512×512 grid.
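The write-up doesn't include the kernel itself, but the core idea of delay-and-sum beamforming is simple enough to sketch. The NumPy version below is only a rough illustration under assumed geometry and sample rate (none of these function names or parameters come from the project): for each voxel, delay every microphone's signal by that voxel's propagation time, then sum, so a real source at that spot adds coherently while everything else averages out.

```python
import numpy as np

def near_field_power(signals, mic_xyz, voxel_xyz, fs, c=343.0):
    """Delay-and-sum power map over a near-field voxel grid.

    signals:   (n_mics, n_samples) time-domain capture
    mic_xyz:   (n_mics, 3) microphone positions in metres
    voxel_xyz: (n_voxels, 3) voxel centre positions in metres
    fs:        sample rate in Hz
    c:         speed of sound in m/s (itself a calibration parameter here)
    """
    n_mics, n_samples = signals.shape
    power = np.zeros(len(voxel_xyz))
    for v, p in enumerate(voxel_xyz):
        # Propagation delay from this voxel to every microphone, in samples.
        dists = np.linalg.norm(mic_xyz - p, axis=1)
        delays = np.round(dists / c * fs).astype(int)
        delays -= delays.min()              # align relative to the nearest mic
        usable = n_samples - delays.max()
        # Advance each channel by its delay so sound from this voxel lines up,
        # then sum: a source at p reinforces, everything else does not.
        summed = np.zeros(usable)
        for m in range(n_mics):
            summed += signals[m, delays[m]:delays[m] + usable]
        power[v] = np.mean(summed ** 2)
    return power
```

The real system does the equivalent work on the GPU: a 64×64×64 grid is roughly 262,000 voxels times 192 channels per frame, which is exactly why it is worth writing as a Triton kernel rather than a Python loop.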

As part of calibration, the speed of sound is treated as one more parameter to optimize for the best-fitting model of the system. Since the speed of sound in air depends on temperature, that lets the whole procedure double as a ridiculously overengineered thermometer.
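The dependence is nearly linear for dry air, commonly approximated as c ≈ 331.3 + 0.606·T m/s with T in °C. If calibration settles on a best-fit c, inverting the approximation gives the temperature reading; here is a trivial sketch using that textbook formula (not the project's actual calibration code):

```python
def temperature_from_sound_speed(c_ms):
    """Invert the dry-air approximation c ≈ 331.3 + 0.606*T (T in Celsius)."""
    return (c_ms - 331.3) / 0.606

# A calibrated speed of 346 m/s implies roughly 24 degrees C.
print(f"{temperature_from_sound_speed(346.0):.1f} °C")
```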

The most impressive trick is to run the process the other way and isolate the audio arriving from a specific direction. The demo here was to play static from one source, and music from a second, nearby source. When listening with only one microphone, the result is a garbled mess. But applying the beamforming algorithm does an impressive job of isolating the directional audio. Click through to hear the results.
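Mechanically, the extraction is the same delay-and-sum idea run in reverse: pick a bearing, shift each channel by that bearing's plane-wave delay, and sum, so audio from the chosen direction reinforces while everything else averages down. A rough far-field sketch, again with assumed (not project) geometry and sample rate:

```python
import numpy as np

def steer_and_listen(signals, mic_xyz, direction, fs, c=343.0):
    """Recover audio arriving from one far-field direction via delay-and-sum.

    signals:   (n_mics, n_samples) time-domain capture
    mic_xyz:   (n_mics, 3) microphone positions in metres
    direction: vector pointing from the array toward the source
    """
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    # Mics with a larger projection onto the arrival direction hear the
    # wavefront first; advance the later channels so everything lines up.
    proj = mic_xyz @ d
    shifts = np.round((proj.max() - proj) / c * fs).astype(int)
    usable = signals.shape[1] - shifts.max()
    out = np.zeros(usable)
    for m, sh in enumerate(shifts):
        out += signals[m, sh:sh + usable]
    return out / len(shifts)
```

With 192 channels, uncorrelated sound from other directions is suppressed by roughly a factor of √192 in amplitude relative to the steered source, which is why the music pops out of the static.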

And if that’s not enough, check out the details of another similar microphone array project.

28 thoughts on “3D Audio Imaging With A Phased Array Microphone”

  1. So taking say 50 samples from 4 microphones, or 100 from 2, provided the sound you’re tracking is fairly repetitive or consistent and you can grab them fast enough, you could potentially triangulate something’s relative position in a 3 m² room. Add acoustic effects into that and you might be able to approximate the position of the walls too. Pretty scary to contemplate if it’s possible. Although most phones seem to come with structured-light 3D scanners these days anyway, masking tape may no longer be enough. :D

  2. Around 10-20 years ago, I remember a television network was experimenting with a similar setup to listen in on individual conversations at an NBA game. They had a similar circle of microphones overhead, and could pick a location on the court, compensate for the audio delays, and easily pick out the conversation at the selected spot.

      1. An orchestra or choir would be a great application because the musicians aren’t moving around.

        A while back I thought it would be cool to use something like this for a stage play. The issue there is that even though there are fewer actors they’re moving around. Nowadays you could track the actor locations with an IR/RF beacon like I’ve heard rock performers use for spotlights or even with image recognition. Then Kanye would have a hard time grabbing Taylor’s microphone.

  3. I doubt the copious number of mics is necessary. Also the radial pattern is somewhat sub-optimal in my opinion. Nice demo of DSPing and kudos for the open sourceness, though!

  4. Agreed on the suboptimal pattern. Sounds like a good application for a Costas Array (the 2-D analog of the Golomb Ruler, not to be conflated or confused with the Golay Array).

  5. I’ve been very impressed by a demo of this working the other way around – a phased array of speakers that could cause a sound to be heard only at certain points in a room. Obviously, all some distance from the speakers or it wouldn’t be a big deal!
    https://holoplot.com/technology/

    It was genuinely amazing. You could hear narration in different languages depending on where you stood. Or the whole room could have the sounds of a raging storm at sea with people shouting, and at only one point you could also hear someone whispering in your ear.

      1. And quite similar to the ultrahaptics technology (who bought out leap I think) where they use little piezo speakers and precisely control the interference of waves in the 3d space to create sufficient force to feel like a solid object when you touch that spot. They demo this visually by making a light polystyrene ball levitate.

    1. This technique will be used by the new MSG Sphere venue in Las Vegas. The spatial audio system being designed into the structure has approximately 160,000 speakers (don’t know how many channels) capable of advanced beamforming.

      1. 160,000 speakers, at Las Vegas acoustic power levels, independently targetable?
        What could possibly go wrong?
        Sounds like the kernel of a great murder technothriller.

  6. The question for musical use would be how many artifacts the phase cancellation and reinforcement produces…. I’ve had problems in the past where processing for voice interacted very poorly with instruments.

  7. Thirty years ago or so, I read that the US Navy was using a passive listening system that listened to underwater reflections of the natural noises in the sea in order to “see” things such as reefs, sea mounts, etc., for faster detection and faster running of submarines and ships with vastly reduced chances of collisions with same.
    Oh…and then, since no pinging was needed (which would give away the ship/submarine position), a sub especially could sit in a conveniently hidden location and just listen for the passage of another submarine and be able to track its position, velocity, and trajectory, and anything else it was doing! Although not specifically addressed, I suspect that even “stealthy” craft were detectable, because the ocean sounds would be modified by their physical presence irrespective of their radar/sonar cloaking.

  8. I wish someone would address why so many channels were needed. Why would Polycom use 3, my phone 2, and this one almost 200? If you sample the sound at 5× 20 kHz, it should be enough to interpolate fractional sine waves for tight control of arrival angles.

    1. If you want to find the direction to a single point sound source with a known and narrow frequency spectrum in an infinite 2-D half-plane with no other sources or reflectors, then two receivers are sufficient.

      If you want to work in 3D, or with reflectors or other sources present, or with wider bandwidth, then you either put up with ambiguities and ghost images (sidelobes), or you employ more receivers. Or you make assumptions, approximations, or use other data. For example, our two biological receivers (ears) work because our auditory processing apparatus exploits the angular dependence of each receiver’s response in the acoustic shadow of our heads (the Head-Related Transfer Function).

      Is 200 overkill? It depends on your requirements for accuracy, signal/noise, sidelobe rejection, bandwidth, etc. Many commercial systems use thousands.
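      To put a rough number on the two-receiver case: a mic pair really measures one time difference of arrival, which pins down only a single angle (and even that with a front/back ambiguity). A toy calculation, with made-up spacing and delay values:

      ```python
      import numpy as np

      c = 343.0        # speed of sound, m/s
      baseline = 0.20  # mic spacing in metres (assumed)
      dt = 250e-6      # measured arrival-time difference in seconds (assumed)

      # The path difference c*dt only fixes the angle off broadside, so the
      # source could lie anywhere on a cone around the baseline (ambiguous).
      theta = np.degrees(np.arcsin(np.clip(c * dt / baseline, -1.0, 1.0)))
      print(f"source is ~{theta:.1f} degrees off broadside, on either side")
      ```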
