Real Time Video Anonymizer

If you’re wondering, Cornell is just like every other university in one respect: the grad students are starving, and wherever there is free food, they circle like vultures. The engineering and CS departments have a mailing list alerting people to free food, but a more automated solution was wanted. The first webcam ever was famously pointed at a coffee pot so researchers would know when a fresh pot was ready, but Cornell shot down a similar idea on the basis of privacy concerns.

It’s final project time for [Bruce Land]’s courses, and a project by [Ferian Chen] and [Sean Ogden] addresses the privacy concerns of a webcam in a kitchen. It’s a real-time video anonymizer that can also be used to livestream ransom demands, if you’re so inclined.

There are actually two parts to this project. The first pixelates faces and any other exposed skin, just like you’d see on a true-crime TV show, and builds on an earlier FPGA-based face-detection project. ‘Skin’ pixels are defined as those where the difference between the red and green channels falls within a certain range. With the right lighting, it works very well.
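
If you want to play with the idea on a PC before committing it to hardware, the same test is easy to sketch in OpenCV. This is a minimal approximation, not the project’s Verilog: the thresholds are the project’s 10-bit values (quoted in the comments below) scaled down to the 8-bit frames a webcam delivers, and the block size is arbitrary.

```python
import cv2
import numpy as np

def pixelate_skin(frame, block=16, lo=25, hi=125):
    """Pixelate pixels whose R - G difference falls in (lo, hi).

    The project's test is 100 < R - G < 500 on 10-bit channels;
    lo/hi here are those values divided by 4 for 8-bit frames.
    """
    f = frame.astype(np.int16)           # avoid uint8 wraparound
    diff = f[:, :, 2] - f[:, :, 1]       # OpenCV frames are BGR
    mask = (diff > lo) & (diff < hi)

    # Mosaic: shrink, then blow back up with nearest-neighbour pixels.
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (max(1, w // block), max(1, h // block)))
    mosaic = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

    out = frame.copy()
    out[mask] = mosaic[mask]
    return out

cap = cv2.VideoCapture(0)                # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("anonymized", pixelate_skin(frame))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```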

You can identify someone by their voice, too, so [Ferian] and [Sean] made an effort to disguise hungry students’ voices as well. This was done with a phase vocoder that changes the pitch of someone’s voice but not its spectral characteristics. The result should have been an audio channel that can’t be pinned down to one person but is still recognizable as speech. The audio processing didn’t work quite as intended, with noticeable artifacts in the output. There’s still some work to be done, and now that [Ferian] and [Sean] aren’t checking the kitchen every ten minutes, they might have the time to do it.
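
For anyone who wants to hear the effect without an FPGA, librosa’s pitch shifter is built from the same ingredients (a phase-vocoder time stretch followed by resampling). A minimal sketch, assuming a mono speech.wav on disk; the four-semitone shift is an arbitrary choice:

```python
import librosa
import soundfile as sf

# Load speech, shift the pitch down four semitones, write the result.
# librosa implements pitch_shift as a phase-vocoder time stretch
# followed by resampling, so the output keeps its length and stays
# recognizable as speech.
y, sr = librosa.load("speech.wav", sr=None)
disguised = librosa.effects.pitch_shift(y, sr=sr, n_steps=-4)
sf.write("speech_disguised.wav", disguised, sr)
```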

39 thoughts on “Real Time Video Anonymizer”

    1. They clearly didn’t care very much about accuracy. Much better “skin detection” algorithms exist than “cv2.inRange()”.

      This one works extremely well:
      “https://github.com/pi19404/OpenVision/blob/master/ImgProc/adaptiveskindetector.hpp”

      It actually works better on darker skin tones.

      1. That algorithm relies on motion detection, so I’m not sure how well it would do for someone standing still. More importantly, the one we used seems to work quite well on all of the skin tones we tested. We tried a lot of different people with dark and very light skin tones, and this relatively simple algorithm got them all very reliably. I’m a big fan of keeping things as simple as possible, so there was no reason to get more complicated.

        I’m not entirely sure how much better the openvision skin detector is than the one we used. I haven’t seen a benchmark of either. If you have a link to a benchmark comparing the two, I’m genuinely interested in seeing the results. Even better, if you want to code the adaptive skin detector in Verilog and post it, we’d love to see it in action on our FPGA!

    2. It’s not clear whether this would “discriminate”. Note that they are looking at specific color components, which may or may not be common among all humans. I’ve read about a system that can detect pulse and heart rate from video by looking at imperceptible changes in the red channel. Perhaps you could look for regions that appear to be pulsating. (Wouldn’t anonymize dead people so well, though…)
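
That pulse-from-video trick is real (it usually goes by Eulerian video magnification or remote photoplethysmography). A toy sketch of the core idea: average a color channel over a skin region every frame, then look for a spectral peak in the heart-rate band. The per-frame means below are synthesized rather than taken from real footage:

```python
import numpy as np

fps = 30.0
t = np.arange(0, 20, 1 / fps)          # 20 seconds of "video"
pulse_hz = 72 / 60.0                   # 72 bpm ground truth

# Stand-in for the mean red value over a face ROI in each frame:
# a tiny periodic component buried in sensor noise.
rng = np.random.default_rng(1)
red_mean = 140 + 0.3 * np.sin(2 * np.pi * pulse_hz * t) \
             + 0.5 * rng.standard_normal(t.size)

sig = red_mean - red_mean.mean()
spectrum = np.abs(np.fft.rfft(sig))
freqs = np.fft.rfftfreq(sig.size, d=1 / fps)

band = (freqs > 0.7) & (freqs < 3.0)   # ~42-180 bpm
peak = freqs[band][np.argmax(spectrum[band])]
print(f"estimated heart rate: {peak * 60:.1f} bpm")   # ~72.0
```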

    3. If you look at their project page, it states that it does a pretty good job of detecting skin of different shades, since they all seem to have a red/green difference at similar levels.

      “The skin detection algorithm is very simple and is based on Ooi[1]. We say a pixel is skin if 100 < R – G < 500, where R and G are the red and green channels from a 10-bit RGB pixel representation. Since the algorithm is based on the difference between R and G and not their absolute values, it does a pretty good job of detecting skin of various shades.”

  1. So, if I wear a ninja suit, I could be cloaked and no one would know the difference? Awesome…

    On to the hack: I’m loving the class this guy is teaching. After looking at some of his lectures, I can see why it’s so easy for his students to come up with some of the crazy but great stuff they do. With that said, I’m more inclined than ever to look into FPGAs. The way they used it here is very interesting. I get that they wanted to go parallel to get it done faster, but wouldn’t passing a partial transformation to a final transformation, with checks in between, allow them to ensure the intended effect and adjust as necessary? Or is that what the pipeline was for?

    1. If you wear a ninja suit and that ninja suit were colored just like skin, you would look like a very low-resolution, skin-colored ninja.

      About the class, it is by far the most interesting course I’ve ever taken.

      I’m not sure exactly what you’re suggesting, but the pipeline is in place so that we can run one FFT while we run the inverse of the previous FFT at the same time. Doing the pipeline this way is fast enough for our purposes, and I believe we need to do the full FFT in order to get useful output because of our overlapping window scheme. If you could elaborate on your idea, we’d love to hear more.

      1. I’m thinking that if you use the FFT to get the frequency content of the sample, why not just use that to adjust a PLL? If the PLL is trying to match a particular frequency that you keep changing based on information from the FFT, wouldn’t that cause the PLL to modulate the sample without use of the pipeline? It would still allow for parallel processing, because both would be working on the same sample independently. Just that the output of one of them (FFT(sample)) becomes one of the inputs of the other (PLL(sample, FFT(sample))).

        Not sure if that’s all doable. I tend to be a bit scatterbrained when it comes to ideas like this. Don’t mind me, I know enough to be smart, but too much to be sane. B)
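
For readers following the FFT discussion, here is a NumPy sketch of the overlapping-window scheme [Sean] describes. The forward and inverse transforms are written sequentially here; on the FPGA, the forward FFT of one frame runs while the inverse FFT of the previous frame is still in flight. The frame size is an arbitrary choice:

```python
import numpy as np

def process(x, n_fft=1024):
    """Hann-windowed frames with 50% overlap: FFT, (optionally) modify
    the spectrum, inverse FFT, overlap-add. With no modification the
    interior of x is reconstructed almost exactly, because Hann windows
    at half overlap sum to (very nearly) one."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    out = np.zeros(len(x))
    start = 0
    while start + n_fft <= len(x):
        spec = np.fft.rfft(x[start:start + n_fft] * win)
        # ... modify spec here, e.g. remap bins to change the pitch ...
        out[start:start + n_fft] += np.fft.irfft(spec)
        start += hop
    return out
```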

  2. Instead of detecting faces, wouldn’t it be easier to use a motion filter like the one in ROS? You could filter for anything that changes and blur those sections out. Fine, as long as you don’t stand perfectly still for a few minutes at a time. (A rough sketch of the idea appears below this thread.)

      1. Perhaps coupled with blob tracking to correct for relative movement within the image, adding to the static scene only at the edges of the image.
        Perhaps also use the blob tracking on the moving objects so that they stay covered until they leave the focus area.
        You’d lower the resolution slightly by cropping a few pixels from the edges of the image to smooth the processing of the visible area. (You can process it before the user sees the output.)

    1. Someone suggested that, and the short answer is that skin detection is pretty robust, and works when people are standing still at the microwave. I think skin detection is actually easier than movement detection.

      1. I know it’s a mickey take, but it really doesn’t get old.
        This reminds me of the hyper-dimensional video that was posted here on HaD a while back. It generated a texture-mapped, video-game-style 3D environment from a video and then allowed you to navigate to areas of the scene that were not on the original path taken by the video camera.
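
A rough OpenCV sketch of the motion-masking idea from the comment above, using plain frame differencing rather than anything ROS-specific; the thresholds and kernel sizes are arbitrary:

```python
import cv2

# Blur anything that changed since the previous frame. As noted above,
# this fails for someone standing perfectly still at the microwave.
cap = cv2.VideoCapture(0)
ok, prev = cap.read()
while ok:
    ok, frame = cap.read()
    if not ok:
        break
    diff = cv2.absdiff(frame, prev)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=8)  # grow mask around movers
    blurred = cv2.GaussianBlur(frame, (51, 51), 0)
    out = frame.copy()
    out[mask > 0] = blurred[mask > 0]
    cv2.imshow("motion-masked", out)
    prev = frame
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```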

    1. It leaks the average skin tone of someone’s face, as well as their height and hair color. It would be trivial to just make the pixels random colors (in fact, it would be easier). Not so sure we could make people look different heights.

      1. No. This obscuring mosaic is not sampled from the average of the obscured area; instead, every big pixel of the mosaic is sampled from precisely one input pixel of the camera. You can see it in the video – mosaic pixels change sharply with movement.
        They have fallen into the oldie-but-goodie ditch of leaking data in the “anonymized” output:
        https://dheera.net/projects/blur
        https://vimeo.com/1913931

        Every new frame of “anonymized” video leaks ~5×5 real pixels. All you have to do is guess where they are sampled from (the middle of the mosaic pixels? the upper left corner?) and fill in the blanks.
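
A toy NumPy demonstration of the leak [rooterkyberian] describes. It assumes the mosaic point-samples the top-left pixel of each block, and that the attacker can track the subject’s per-frame shift; under those assumptions, 64 frames of one-pixel drift are enough to recover the whole hidden region:

```python
import numpy as np

rng = np.random.default_rng(0)
secret = rng.integers(0, 256, (64, 64), dtype=np.uint8)  # stand-in "face"
block = 8

recovered = np.full(secret.shape, -1, dtype=np.int16)
ys, xs = np.mgrid[0:64:block, 0:64:block]   # mosaic sample positions

# One frame per (dy, dx): the subject drifts a pixel at a time while the
# anonymizer keeps point-sampling the top-left pixel of every 8x8 block.
for dy in range(block):
    for dx in range(block):
        shifted = np.roll(secret, (dy, dx), axis=(0, 1))
        leaked = shifted[::block, ::block]   # what this frame exposes
        recovered[(ys - dy) % 64, (xs - dx) % 64] = leaked

print((recovered == secret).all())   # True: fully reconstructed
```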

  3. OK, stupid question, but… If you want to know whether the coffee pot is full, why wouldn’t you point the camera at the coffee pot and not at a person’s head? Or forgo the camera altogether and put a strain gauge under the pot. When it’s heavier, it’s full.

    While the real-time pixelation is cool, I guess, isn’t this a hammer looking for a nail?

    1. I was thinking the same thing. Of course HaD doesn’t need a reason to over-complicate a simple low-tech solution (i.e. adding a piece of fruit? wink wink).

      The reason it would be problematic to put a strain gauge *under* the pot is that that area is extremely hot. However, I like your suggestion that weight equates to fullness… that never occurred to me. Cool!

      A water sensor ($5 USD from Harbor Freight) could be mounted at the rim of the pot, and when the water level reaches a critical level it plays a sound which is picked up by the webcam microphone. It could be a warble tone or a musical score. Also, the webcam could be mounted *next* to the pot to eliminate video privacy concerns. Audio privacy could be enhanced by eliminating the microphone and plugging the alarm directly into the line-in port.

      Or how about using pattern-recognition JavaScript (Resemble.js, which compares two images) to sound an alarm on your PC/Mac when the pattern for a full pot is matched? A piece of white paper could be mounted behind the pot for good contrast against black coffee. You could even compare intermediate levels too (i.e. half-full, etc.).

      huddle.github.io/Resemble.js/
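
In the same spirit (though in Python rather than JavaScript, since Resemble.js is a browser library), the comparison itself is only a few lines. The filenames and the 8% threshold are invented, and both images are assumed to come from a fixed camera at the same resolution:

```python
import numpy as np
from PIL import Image

def looks_full(current_path, reference_path="full_pot.jpg", tol=0.08):
    """Alarm-style check: how far does the live view deviate from a
    reference photo of a full pot? Both images must match in size."""
    cur = np.asarray(Image.open(current_path).convert("L"), dtype=np.float32)
    ref = np.asarray(Image.open(reference_path).convert("L"), dtype=np.float32)
    mismatch = np.abs(cur - ref).mean() / 255.0  # 0.0 identical, 1.0 inverted
    return mismatch < tol

if looks_full("webcam_snapshot.jpg"):
    print("Coffee’s up!")
```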

    2. It’s not about a coffee pot. It’s about looking in the entire kitchen for leftover food like pizza from seminars and such. So the video camera approach is really the best option here since food can be placed wherever.

    1. Funny you say that, that was essentially the solution that was put in place in our department. I still like our camera because it can cover more of the kitchen, and see all of the various dishes. There are a lot of leftovers sometimes and the dish under the camera can be relatively empty compared to the ones on the other end of the counter. We could have multiple downward facing cameras, but our solution can see the whole kitchen with one camera.

      You could also use this in situations where you really do want to see people, but don’t care about who they are. For example, at the espresso machine to determine if there is a line of people.

      More importantly, this was a fun project and excuse to do something cool using an FPGA.

  4. I love the downward-looking idea at MIT. That sort of solves the video privacy issue (along with muting the mic). I thought it was just for the coffee pot. However, if a vertical camera with an az-el (pan/tilt/zoom) motor is used, then privacy issues abound. Pixelating the faces is the epitome of overkill. I recommend doing what high-security military and government facilities do for civilian visitors like defense contractors: they post privacy-caveat signs all over the cafeteria, basically saying that you can expect no privacy from video monitoring, nor from audio and telephone monitoring (their telephones, that is, not your personal cell phone). So this is a HaD project for HaD lawyer types to come up with appropriate CYA (cover yer’ a$$) wording.

    Added feature suggestions:

    1) Add external infrared illumination lamps near the food so when the cafeteria lights are turned off you can still see the food area from across the room (if your webcam is night-vision enabled). The built-in IR lamps are only good for about 50′, but a Chinese firm found a way to increase IR CCTV illumination distance by diverging an IR laser beam on the target.

    2) Also, a tele-presence robot with a periscope for looking over counters would be cool. A small video monitor could have your face on it to tell people in the cafeteria “Look out! Video privacy being violated… [big Cheshire Cat grin]”.

    3) Add external UV lights over food to kill microorganisms growing while food sits there for several days unrefrigerated.

    4) Add a bug light to handle gnats that seem to show up around fruit even in enclosed windowless buildings.

    5) Don’t use 2.4 GHz ANALOG wireless cams; use digital wireless cams instead. The analog ones tend to interfere with the Wi-Fi signal.

    6) Add a tamper-resistant chassis, as there is always one wise guy in the group who gets off on sabotaging stuff like this just for sh**s & giggles.

  5. OK, the privacy issue again: the food never moves, right? But the people desiring privacy do. So, using motion-sensing video software, simply have the video fade to black every time it detects motion in targeted zones or in the entire image. You’re looking at a half-eaten pizza on the counter next to the microwave. You’re viewing this on a stationary camera, and some dude walks into frame. He’s even looking at the camera, which would pretty much identify him. The system would instantly fade to black until he leaves the frame (it actually kills the feed to your monitor, not the actual video feed to the computer). University administration should legally be okay with this method, as no one is exposed to video monitoring. Yet it is a hassle to be interrupted by people moving about in frame. Also, assure them that the computer is not recording video or audio.

    1. You mean administration will come down on you because a student was exposed in the video when a frame dropped temporarily, making the fade to black not happen quickly enough? Faculty are not going to be monitoring the video feed, as it is of no interest to them; they depend on students complaining about it. How could anyone know they were exposed if they are in the cafeteria and the thing does not record? This thing would be fed to your computer lab or your dorm room. It would be a private feed that is not shared. What evidence could they use to say “they violated my privacy”? How would they know, and how could they prove it?
