Background Substitution, No Green Screen Required

All this working from home that people have been doing has a natural but unintended consequence: revealing your dirty little domestic secrets on a video conference. Face time can come at a high price if the only room you have available for work is the bedroom, with piles of dirty laundry or perhaps the incriminating contents of your nightstand on full display for your coworkers.

There has to be a tech fix for this problem, and many of the commercial video conferencing platforms support virtual backgrounds. But [Florian Echtler] would rather air his dirty laundry than go near Zoom, so he built a machine-learning background substitution app that works with just about any video conferencing platform. Awkwardly dubbed DeepBackSub — he’s working on a better name — the system does the hard work of finding the person in the frame with TensorFlow Lite. After identifying everything in the frame that’s a person, OpenCV replaces everything that’s not with whatever you choose, and the modified scene is piped over a virtual video device to the videoconferencing software. He’s tested it with Firefox, Skype, and guvcview so far, all running on Linux. The resolution and frame rates are limited, but such is the cost of keeping your secrets and establishing a firm boundary between work life and home life.
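At its core, the substitution step is just masked compositing. Here’s a minimal NumPy sketch of that final step — not [Florian]’s actual code, and it assumes the TensorFlow Lite segmentation stage has already produced a per-pixel boolean person mask:

```python
import numpy as np

def composite(frame, person_mask, background):
    """Keep pixels covered by the person mask; replace the rest.

    frame, background: HxWx3 uint8 arrays; person_mask: HxW bool array.
    """
    mask3 = person_mask[:, :, None]            # broadcast mask over color channels
    return np.where(mask3, frame, background)

# Toy frames: the mask keeps only the top-left pixel of the "camera" frame.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
fake_bg = np.zeros((2, 2, 3), dtype=np.uint8)
mask = np.array([[True, False], [False, False]])
out = composite(frame, mask, fake_bg)
```

The real pipeline then writes each composited frame to a virtual video device (on Linux, typically via v4l2loopback) so the conferencing app sees it as an ordinary webcam.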

[Florian] has taken the need for a green screen out of what’s formally known as chroma key compositing, which [Tom Scott] did a great primer on a few years back. A physical green screen is the traditional way to do this, but we honestly think this technique is great and can’t wait to try it out with our Hackaday colleagues at the weekly videoconference.

27 thoughts on “Background Substitution, No Green Screen Required”

  1. With technology like this, deep fakes, realtime motion capture driving high-resolution CG human animation, and more, we’re doing what John Brunner imagined in “The Jagged Orbit” (published 1969) for faking video, but with much higher tech.

  2. I’m sure we could play a game of my software is more smarterer than yours and composite an image of the background from the pixel fringe, or light shining under the earlobes. IDK if you can take some interference fringe effects right off the edge also and reverse raytrace more detail from behind the subject if you’ve got a couple of dozen teraflops on hand.

  3. I think we would be better served with teaching people to hang a blanket or bed sheet, heck, even a couple of towels around the seating area.
    This would cut down on a lot of the bad audio.
    I assume they pick the most light reflective spot (lighting wise) rather than moving a couple of lamps to get better lighting.
    The problem is that light-reflecting surfaces always produce that bathroom-or-stairwell audio effect.
    I’ve seen a few of the “professional” script readers eventually move to a bookshelf or a window with the blinds turned downward. This has improved things a bit.

    Now IF only they would just learn to do a dB level check.
    Nothing like sitting through the weather guesser’s segment with the levels at 3/4 clipping, and then the station engineer finishes the job by compressing it and pegging the meters for the loudness wars.
    Oh well, at least (for the moment) they aren’t standing in front of the green screen/display while you’re trying to read the info.

  4. I did some tests a few years back: take a few frames of an empty shot (averaged to ignore compression artefacts), create a low-pass-filtered mask per color channel from that, and do a simple delta-threshold subtraction against the live video. No need for “AI”, it works great, is super-fast and might even result in less sharp edges.
    As long as your background is static and the lighting doesn’t change, that is :-)

    The problem with “class detection” is always that you get false positives and don’t really have control over shapes. For a video conference that’s fine; for “professional use” (whatever you consider that to be) it’s not the best approach – my 2 cents.
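The delta-threshold idea described in the comment above reads roughly like this in NumPy — a generic sketch, not the commenter’s code, with the per-color low-pass filtering simplified to averaging a few reference frames:

```python
import numpy as np

def foreground_mask(frame, reference, threshold=30):
    """Mark pixels as foreground where any color channel differs
    from the empty-room reference by more than `threshold`."""
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    return diff.max(axis=2) > threshold

# Average a few "empty shot" frames to suppress compression noise.
refs = [np.full((2, 2, 3), v, dtype=np.uint8) for v in (98, 100, 102)]
reference = np.mean(refs, axis=0).astype(np.uint8)

frame = reference.copy()
frame[0, 0] = (200, 50, 50)            # a "person" pixel appears in shot
fg = foreground_mask(frame, reference)
```

As the commenter notes, this only holds up while the background stays static and the lighting doesn’t drift; any global light change pushes every pixel past the threshold at once.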

      1. Back then I didn’t use OpenCV (I still try to avoid it for its many, many crashes, undocumented “features”, bad examples and 8-bit-only support – I am exaggerating and being unfair here). Most of OpenCV’s shortcomings I have found over the last few years were all, without exception, due to its 8-bit-grayscale-is-the-solution approach. Which fails most of the time anyway.
        For me, “my” method worked fine with me sitting still in front of the camera, because I took care of having enough hue contrast (I am not sure whether OpenCV uses a hue/luminance comparison or does its stupid 8-bit grayscale tricks).

        I think OpenCV today is a bit more “open” to the real world’s needs in computer vision. You see examples of “real” camera footage being used instead of the 320×240-pixel thingies they use in their demo “tutorials”.

      1. Hi, Jack,

        I am sorry, but I have learned the hard way that presenting code you came up with yourself on the internet leads to people copy&paste-abusing it, turning it into commercial products and then suing you. Too many Americans out there (yes, I was targeted by an American company, and because of that I have a stupid emotional bias against these people).

  5. Just don’t use a camera. I get that asking to use one is an attempt to retain normality, but it contributes very little. No need to make your private space into a public one.

  6. For the past couple of days I’ve been looking for a background removal app that takes a snapshot of your background without you and your chair. Then it compares the current image with the snapshot, and any pixels that match the snapshot get replaced with a different image. Anything that doesn’t match gets shown, i.e. you and your chair.

    Anybody know of an app that does that specifically?

      1. I was thinking the same thing, and it sounds like you got most of what I was worrying about — lighting change and camera movement — sorted out. Should be “very easy”.

        You might be able to streamline it a lot by doing an edge-detection pass first, and keeping only the things inside the edges that weren’t in the background photo.

        But my limited experience with computer vision suggests that you’d really have to try all these ideas out to find out why they don’t work perfectly. Reality is full of surprises.
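The edge-detection pass suggested above could start from something as simple as a gradient-magnitude filter. A minimal NumPy sketch (a stand-in for a proper Sobel or Canny pass, not a full solution):

```python
import numpy as np

def edge_strength(gray):
    """Gradient-magnitude edge response via central differences.

    gray: HxW uint8 grayscale image; returns HxW float array where
    larger values indicate stronger edges (borders are left at zero).
    """
    g = gray.astype(float)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]   # horizontal gradient
    gy[1:-1, :] = g[2:, :] - g[:-2, :]   # vertical gradient
    return np.hypot(gx, gy)

# A flat image has no edges; a vertical brightness step responds strongly.
flat = np.full((5, 5), 100, dtype=np.uint8)
step = flat.copy()
step[:, 3:] = 200
flat_response = edge_strength(flat)
step_response = edge_strength(step)
```

Comparing the foreground’s edge map against one computed from the empty-background photo is one way to keep only the new object’s outline — though, as noted above, reality will supply plenty of surprises at the boundaries.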
