Real Time Video Anonymizer

If you’re wondering, Cornell is just like every other university in one respect: the grad students are starving, and wherever there is free food, they circle like vultures. The engineering and CS departments have a mailing list alerting people to free food, but a more automated solution was wanted. The first webcam ever was famously pointed at a coffee pot so researchers would know when a fresh pot was ready, but Cornell shot down a similar idea on the basis of privacy concerns.

It’s final project time for [Bruce Land]’s courses, and a project by [Ferian Chen] and [Sean Ogden] addresses the privacy concerns of a webcam in a kitchen. It’s a real-time video anonymizer that can also be used to livestream ransom demands, if you’re so inclined.

There are actually two parts to this project. The first pixelates faces and any other exposed skin, just like you’d see on a true-crime TV show, and builds on an earlier FPGA-based face-detection project. ‘Skin’ pixels are defined as those where the difference between the red and green channels falls within a certain range. With the right lighting, it works very well.
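
If you want to play with the idea on a PC before committing it to hardware, the same test is easy to sketch in OpenCV. This is a minimal approximation, not the project’s Verilog: the thresholds are the project’s 10-bit values (quoted in the comments below) scaled down to the 8-bit frames a webcam delivers, and the block size is arbitrary.

```python
import cv2
import numpy as np

def pixelate_skin(frame, block=16, lo=25, hi=125):
    """Pixelate pixels whose R - G difference falls in (lo, hi).

    The project's test is 100 < R - G < 500 on 10-bit channels;
    lo/hi here are those values divided by 4 for 8-bit frames.
    """
    f = frame.astype(np.int16)           # avoid uint8 wraparound
    diff = f[:, :, 2] - f[:, :, 1]       # OpenCV frames are BGR
    mask = (diff > lo) & (diff < hi)

    # Mosaic: shrink, then blow back up with nearest-neighbour pixels.
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (max(1, w // block), max(1, h // block)))
    mosaic = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

    out = frame.copy()
    out[mask] = mosaic[mask]
    return out

cap = cv2.VideoCapture(0)                # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("anonymized", pixelate_skin(frame))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```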

You can identify someone by their voice, too, so [Ferian] and [Sean] made an effort to disguise hungry students’ voices as well. This was done with a phase vocoder that changes the pitch of someone’s voice but not its spectral characteristics. The result should have been an audio channel that can’t be pinned down to one person but is still recognizable as speech. The audio processing didn’t work quite as intended, with noticeable artifacts in the output. There’s still some work to be done, and now that [Ferian] and [Sean] aren’t checking the kitchen every ten minutes, they might have the time to do it.
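
For anyone who wants to hear the effect without an FPGA, librosa’s pitch shifter is built from the same ingredients (a phase-vocoder time stretch followed by resampling). A minimal sketch, assuming a mono speech.wav on disk; the four-semitone shift is an arbitrary choice:

```python
import librosa
import soundfile as sf

# Load speech, shift the pitch down four semitones, write the result.
# librosa implements pitch_shift as a phase-vocoder time stretch
# followed by resampling, so the output keeps its length and stays
# recognizable as speech.
y, sr = librosa.load("speech.wav", sr=None)
disguised = librosa.effects.pitch_shift(y, sr=sr, n_steps=-4)
sf.write("speech_disguised.wav", disguised, sr)
```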

39 thoughts on “Real Time Video Anonymizer”

    1. They clearly didn’t care very much about accuracy. Much better “skin detection” algorithms exist than “cv2.inRange()”.

      This one works extremely well:
      “https://github.com/pi19404/OpenVision/blob/master/ImgProc/adaptiveskindetector.hpp”

      It actually works better on darker skin tones.

      1. That algorithm relies on motion detection, so I’m not sure how well it would do for someone standing still. More importantly, the one we used seems to work quite well on all of the skin tones we tested. We tried a lot of different people with dark and very light skin tones, and this relatively simple algorithm got them all very reliably. I’m a big fan of keeping things as simple as possible, so there was no reason to get more complicated.

        I’m not entirely sure how much better the openvision skin detector is than the one we used. I haven’t seen a benchmark of either. If you have a link to a benchmark comparing the two, I’m genuinely interested in seeing the results. Even better, if you want to code the adaptive skin detector in Verilog and post it, we’d love to see it in action on our FPGA!

    2. It’s not clear whether this would “discriminate”. Note that they are looking at specific color components, which may or may not be common among all humans. I’ve read about a system that can detect pulse and heart rate from video by looking at imperceptible changes in the red channel. Perhaps you could look for regions that appear to be pulsating. (Wouldn’t anonymize dead people so well, though…)
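
That pulse-from-video trick is real (it usually goes by Eulerian video magnification or remote photoplethysmography). A toy sketch of the core idea: average a color channel over a skin region every frame, then look for a spectral peak in the heart-rate band. The per-frame means below are synthesized rather than taken from real footage:

```python
import numpy as np

fps = 30.0
t = np.arange(0, 20, 1 / fps)          # 20 seconds of "video"
pulse_hz = 72 / 60.0                   # 72 bpm ground truth

# Stand-in for the mean red value over a face ROI in each frame:
# a tiny periodic component buried in sensor noise.
rng = np.random.default_rng(1)
red_mean = 140 + 0.3 * np.sin(2 * np.pi * pulse_hz * t) \
             + 0.5 * rng.standard_normal(t.size)

sig = red_mean - red_mean.mean()
spectrum = np.abs(np.fft.rfft(sig))
freqs = np.fft.rfftfreq(sig.size, d=1 / fps)

band = (freqs > 0.7) & (freqs < 3.0)   # ~42-180 bpm
peak = freqs[band][np.argmax(spectrum[band])]
print(f"estimated heart rate: {peak * 60:.1f} bpm")   # ~72.0
```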

    3. If you look at their project page, it states that it does a pretty good job of detecting skin of different shades, since they all seem to have a red/green difference at similar levels.

      “The skin detection algorithm is very simple and is based on Ooi[1]. We say a pixel is skin if 100 < R – G < 500, where R and G are the red and green channels from a 10-bit RGB pixel representation. Since the algorithm is based on the difference between R and G and not their absolute values, it does a pretty good job of detecting skin of various shades.”

  1. So, if I wear a ninja suit, I could be cloaked and no one would know the difference? Awesome…

    On to the hack: I’m loving the class this guy is teaching. After looking at some of his lectures, I can see why it’s so easy for his students to come up with some of the crazy but great stuff they do. With that said, I’m more inclined than ever to look into FPGAs. The way they used it here is very interesting. I get that they wanted to go parallel to get it done faster, but wouldn’t passing a partial transformation to a final transformation, with checks in between, allow them to ensure the intended effect and adjust as necessary? Or is that what the pipeline was for?

    1. If you wear a ninja suit and that ninja suit were colored just like skin, you would look like a very low-resolution, skin-colored ninja.

      About the class, it is by far the most interesting course I’ve ever taken.

      I’m not sure exactly what you’re suggesting, but the pipeline is in place so that we can run one FFT while we run the inverse of the previous FFT at the same time. Doing the pipeline this way is fast enough for our purposes, and I believe we need to do the full FFT in order to get useful output because of our overlapping window scheme. If you could elaborate on your idea, we’d love to hear more.

      1. I’m thinking that if you use the FFT to get the frequency content of the sample, why not just use that to adjust a PLL? If the PLL is trying to match a particular frequency that you keep changing based on information from the FFT, wouldn’t that cause the PLL to modulate the sample without use of the pipeline? It would still allow for parallel processing, because both would be working on the same sample independently. Just that the output of one of them (FFT(sample)) becomes one of the inputs of the other (PLL(sample, FFT(sample))).

        Not sure if that’s all doable. I tend to be a bit scatterbrained when it comes to ideas like this. Don’t mind me, I know enough to be smart, but too much to be sane. B)
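
For readers following the FFT discussion, here is a NumPy sketch of the overlapping-window scheme [Sean] describes. The forward and inverse transforms are written sequentially here; on the FPGA, the forward FFT of one frame runs while the inverse FFT of the previous frame is still in flight. The frame size is an arbitrary choice:

```python
import numpy as np

def process(x, n_fft=1024):
    """Hann-windowed frames with 50% overlap: FFT, (optionally) modify
    the spectrum, inverse FFT, overlap-add. With no modification the
    interior of x is reconstructed almost exactly, because Hann windows
    at half overlap sum to (very nearly) one."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    out = np.zeros(len(x))
    start = 0
    while start + n_fft <= len(x):
        spec = np.fft.rfft(x[start:start + n_fft] * win)
        # ... modify spec here, e.g. remap bins to change the pitch ...
        out[start:start + n_fft] += np.fft.irfft(spec)
        start += hop
    return out
```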

  2. Instead of detecting faces, wouldn’t it be easier to use a motion filter like the one in ROS? You could filter for anything that changes and blur those sections out. Fine, as long as you don’t stand perfectly still for a few minutes at a time. (A rough sketch of the idea appears below this thread.)

      1. Perhaps coupled with blob tracking to correct for relative movement within the image, adding to the static scene only at the edges of the image.
        Perhaps also use the blob tracking on the moving objects so that they stay covered until they leave the focus area.
        You’d lower the resolution slightly by cropping a few pixels from the edges of the image to smooth the processing of the visible area. (You can process it before the user sees the output.)

    1. Someone suggested that, and the short answer is that skin detection is pretty robust, and works when people are standing still at the microwave. I think skin detection is actually easier than movement detection.

      1. I know it’s a mickey take, but it really doesn’t get old.
        This reminds me of the hyper-dimensional video that was posted here on HaD a while back. It generated a texture-mapped, video-game-style 3D environment from a video and then allowed you to navigate to areas of the scene that were not on the original path taken by the video camera.
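
A rough OpenCV sketch of the motion-masking idea from the comment above, using plain frame differencing rather than anything ROS-specific; the thresholds and kernel sizes are arbitrary:

```python
import cv2

# Blur anything that changed since the previous frame. As noted above,
# this fails for someone standing perfectly still at the microwave.
cap = cv2.VideoCapture(0)
ok, prev = cap.read()
while ok:
    ok, frame = cap.read()
    if not ok:
        break
    diff = cv2.absdiff(frame, prev)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=8)  # grow mask around movers
    blurred = cv2.GaussianBlur(frame, (51, 51), 0)
    out = frame.copy()
    out[mask > 0] = blurred[mask > 0]
    cv2.imshow("motion-masked", out)
    prev = frame
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```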

    1. It leaks the average skin tone of someone’s face, as well as their height and hair color. It would be trivial to just make the pixels random colors (in fact, it would be easier). Not so sure we could make people look different heights.

      1. No. This obscuring mosaic is not sampled from the average of the obscured area; instead, every big pixel of the mosaic is sampled from precisely one input pixel of the camera. You can see it in the video – mosaic pixels change sharply with movement.
        They have fallen into the oldie-but-goodie ditch of leaking data in the “anonymized” output:
        https://dheera.net/projects/blur
        https://vimeo.com/1913931

        Every new frame of “anonymized” video leaks ~5×5 real pixels. All you have to do is guess where they are sampled from (the middle of the mosaic pixels? the upper left corner?) and fill in the blanks.
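
A toy NumPy demonstration of the leak [rooterkyberian] describes. It assumes the mosaic point-samples the top-left pixel of each block, and that the attacker can track the subject’s per-frame shift; under those assumptions, 64 frames of one-pixel drift are enough to recover the whole hidden region:

```python
import numpy as np

rng = np.random.default_rng(0)
secret = rng.integers(0, 256, (64, 64), dtype=np.uint8)  # stand-in "face"
block = 8

recovered = np.full(secret.shape, -1, dtype=np.int16)
ys, xs = np.mgrid[0:64:block, 0:64:block]   # mosaic sample positions

# One frame per (dy, dx): the subject drifts a pixel at a time while the
# anonymizer keeps point-sampling the top-left pixel of every 8x8 block.
for dy in range(block):
    for dx in range(block):
        shifted = np.roll(secret, (dy, dx), axis=(0, 1))
        leaked = shifted[::block, ::block]   # what this frame exposes
        recovered[(ys - dy) % 64, (xs - dx) % 64] = leaked

print((recovered == secret).all())   # True: fully reconstructed
```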

  3. OK, stupid question, but… If you want to know whether the coffee pot is full, why wouldn’t you point the camera at the coffee pot and not at a person’s head? Or forgo the camera altogether and put a strain gauge under the pot. When it’s heavier, it’s full.

    While the real-time pixelation is cool, I guess, isn’t this a hammer looking for a nail?

    1. I was thinking the same thing. Of course HaD doesn’t need a reason to over-complicate a simple low-tech solution (i.e. adding a piece of fruit? wink wink).

      The reason it would be problematic to put a strain gauge *under* the pot is that that area is extremely hot. However, I like your suggestion that weight equates to fullness… that never occurred to me. Cool!

      A water sensor ($5 USD from Harbor Freight) could be mounted at the rim of the pot, and when the water level reaches a critical level it plays a sound which is picked up by the webcam microphone. It could be a warble tone or a musical score. Also, the webcam could be mounted *next* to the pot to eliminate video privacy concerns. Audio privacy could be enhanced by eliminating the microphone and plugging the alarm directly into the line-in port.

      Or how about using pattern-recognition JavaScript (Resemble.js, which compares two images) to sound an alarm on your PC/Mac when the pattern for a full pot is matched? A piece of white paper could be mounted behind the pot for good contrast against black coffee. You could even compare intermediate levels too (i.e. half-full, etc.).

      huddle.github.io/Resemble.js/
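
In the same spirit (though in Python rather than JavaScript, since Resemble.js is a browser library), the comparison itself is only a few lines. The filenames and the 8% threshold are invented, and both images are assumed to come from a fixed camera at the same resolution:

```python
import numpy as np
from PIL import Image

def looks_full(current_path, reference_path="full_pot.jpg", tol=0.08):
    """Alarm-style check: how far does the live view deviate from a
    reference photo of a full pot? Both images must match in size."""
    cur = np.asarray(Image.open(current_path).convert("L"), dtype=np.float32)
    ref = np.asarray(Image.open(reference_path).convert("L"), dtype=np.float32)
    mismatch = np.abs(cur - ref).mean() / 255.0  # 0.0 identical, 1.0 inverted
    return mismatch < tol

if looks_full("webcam_snapshot.jpg"):
    print("Coffee’s up!")
```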

    2. It’s not about a coffee pot. It’s about looking in the entire kitchen for leftover food like pizza from seminars and such. So the video camera approach is really the best option here since food can be placed wherever.

    1. Funny you say that, that was essentially the solution that was put in place in our department. I still like our camera because it can cover more of the kitchen, and see all of the various dishes. There are a lot of leftovers sometimes and the dish under the camera can be relatively empty compared to the ones on the other end of the counter. We could have multiple downward facing cameras, but our solution can see the whole kitchen with one camera.

      You could also use this in situations where you really do want to see people, but don’t care about who they are. For example, at the espresso machine to determine if there is a line of people.

      More importantly, this was a fun project and excuse to do something cool using an FPGA.

  4. I love the downward-looking idea at MIT. That sort of solves the video privacy issue (along with muting the mic). I thought it was just for the coffee pot. However, if a vertical camera with an az-el (pan/tilt/zoom) motor is used, then privacy issues abound. Pixelating the faces is the epitome of overkill. I recommend doing what high-security military and government facilities do for civilian visitors like defense contractors: they post privacy-caveat signs all over the cafeteria, basically saying that you can expect no privacy from video monitoring, nor from audio and telephone monitoring (their telephones, that is, not your personal cell phone). So this is a HaD project for HaD lawyer types to come up with appropriate CYA (cover yer’ a$$) wording.

    Added feature suggestions:

    1) Add external infrared illumination lamps near the food so when the cafeteria lights are turned off you can still see the food area from across the room (if your webcam is night-vision enabled). The built-in IR lamps are only good for about 50′, but a Chinese firm found a way to increase IR CCTV illumination distance by diverging an IR laser beam on the target.

    2) Also, a tele-presence robot with a periscope for looking over counters would be cool. A small video monitor could have your face on it to tell people in the cafeteria “Look out! Video privacy being violated… [big Cheshire Cat grin]”.

    3) Add external UV lights over food to kill microorganisms growing while food sits there for several days unrefrigerated.

    4) Add a bug light to handle gnats that seem to show up around fruit even in enclosed windowless buildings.

    5) Don’t use 2.4 GHz ANALOG wireless cams; use digital wireless cams instead. The analog ones tend to interfere with the Wi-Fi signal.

    6) Add a tamper-resistant chassis, as there is always one wise guy in the group who gets off on sabotaging stuff like this just for sh**s & giggles.

  5. OK, the privacy issue again: the food never moves, right? But the people desiring privacy do. So, using motion-sensing video software, simply have the video fade to black every time it detects motion in targeted zones or in the entire image. You’re looking at a half-eaten pizza on the counter next to the microwave. You’re viewing this on a stationary camera, and some dude walks into frame. He’s even looking at the camera, which would pretty much identify him. The system would instantly fade to black until he leaves the frame (it actually kills the feed to your monitor, not the actual video feed to the computer). University administration should legally be okay with this method, as no one is exposed to video monitoring. Yet it is a hassle to be interrupted by people moving about in frame. Also, assure them that the computer is not recording video or audio.

    1. You mean administration will come down on you because a student was exposed in the video when a frame dropped temporarily, making the fade to black not happen quickly enough? Faculty are not going to be monitoring the video feed, as it is of no interest to them; they depend on students complaining about it. How could anyone know they were exposed if they are in the cafeteria and the thing does not record? This thing would be fed to your computer lab or your dorm room. It would be a private feed that is not shared. What evidence could they use to say “they violated my privacy”? How would they know, and how could they prove it?
