3D Scanning By Calculating The Focus Of Each Pixel


We understand the concept [Jean] used to create a 3D scan of his face, but the particulars are a bit beyond our own experience. He is not using a dark room and laser line to capture slices which can be reassembled later. Nope, this approach uses pictures taken with several different focal lengths.

The idea is to process the photos using luminance. For each pixel, it subtracts the luminance of that pixel from the luminance of each of its neighbors and sums the absolute differences to estimate how well that pixel is in focus. Apparently if you do this with the entire image, and with a set of other images taken from the same vantage point at different focal lengths, you end up with a per-pixel depth map.
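
If you want to play with the idea yourself, here is a minimal sketch of that per-pixel focus measure in Python. To be clear, this is not [Jean]'s actual code; the NumPy approach, the four-neighbor window, and the grayscale input are our own assumptions for illustration:

```python
import numpy as np

def sharpness(gray):
    """Per-pixel focus score: sum of absolute luminance differences
    between each pixel and its four neighbors."""
    g = gray.astype(np.float64)
    score = np.zeros_like(g)
    score[1:, :]  += np.abs(g[1:, :]  - g[:-1, :])   # neighbor above
    score[:-1, :] += np.abs(g[:-1, :] - g[1:, :])    # neighbor below
    score[:, 1:]  += np.abs(g[:, 1:]  - g[:, :-1])   # neighbor to the left
    score[:, :-1] += np.abs(g[:, :-1] - g[:, 1:])    # neighbor to the right
    return score

def depth_map(stack):
    """stack: list of HxW grayscale images of the same scene, each taken
    at a different focus setting (nearest first). Returns, for each pixel,
    the index of the image in which that pixel was sharpest -- a coarse
    depth value."""
    scores = np.stack([sharpness(img) for img in stack])  # (steps, H, W)
    return np.argmax(scores, axis=0)
```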

What we find most interesting about this is that the resulting pixels retain their original color values. So after removing the cruft you get a 3D scan that is still in full color.

If you want to learn more about laser-based 3D scanning check out this project.

[Thanks Luca]

42 thoughts on “3D Scanning By Calculating The Focus Of Each Pixel”

  1. This could be a game changer for autonomous vehicles depending on the computational expense vs. alternative methods. Even if it requires more MIPS, those will likely get less expensive faster than the hardware methods currently in use to reduce computational requirements. I would expect that this method of processing would lend itself to analysis by FPGA?

    1. I believe this should lend itself to FPGA usage quite well, since each pixel can be processed independently once all the images are available.

      The single largest issue I see is that you need two different focus settings, so you need not only two pictures, but two pictures taken one after another from the same camera. That sounds a lot slower than current methods, no matter how fast the processing goes.

    1. Heh, yeah, the Lytro camera would be perfect.
      Another idea I had was to basically take a CD-drive-style lens assembly and mount it on a high-speed camera, sweeping the focus of a shot from macro to infinity 30 times a second, then taking each frame (probably running at about 3000 fps) to create a “full focus” 30 fps video. You could also use that to get a basic depth map (probably rather low precision or accuracy) for each video frame. This could help VFX artists create more realistic scenes.

      1. But is the information needed to use Lytro images available to the public?
        Last I checked I found their software lacking: you can, for instance, set the position of the DoF but not its size, so you can’t have near AND far in focus or pick a selectable depth range. It was all very dumb and basic, and when I checked (back when) I did not hear of any alternative viewers but theirs.

        And I hear they only unlocked, just last week, the WiFi radio that apparently was inside the damn camera all along… took them only what? A few years?
        I like the concept of Lytro but question the way they do things.

    2. This 3D scan is phenomenal and the linked camera is insane. I believe I have just seen a good chunk of future innovation, and I only wish I could afford that camera lol

  2. Once again, Mike Szczys shows his inability to correctly read a camera-related article. It’s not taking photos at different focal lengths; it’s a focus stack. That just means changing the focus of the lens, not changing the focal length of the lens used.

    1. In fact that’s a matter of perspective/semantics: you can’t change the focal length of a lens (under normal circumstances), but you can easily change the focal length of an optical system, i.e. the camera.

    1. Wouldn’t call it a well-known technique compared to stereo input and IR mesh input, given there isn’t an OpenCV function for it.
      But on the other hand, when you think about it, the technique is blindingly obvious.

        1. Completely different: parallax is what you sense with two eyes. With only one eye, I can’t tell if a person in the distance is walking towards me or away from me until I notice whether they are getting bigger or smaller.

          1. From wikipedia https://en.wikipedia.org/wiki/Depth_perception

            “Accommodation
            This is an oculomotor cue for depth perception. When we try to focus on far away objects, the ciliary muscles stretch the eye lens, making it thinner, and hence changing the focal length. The kinesthetic sensations of the contracting and relaxing ciliary muscles (intraocular muscles) is sent to the visual cortex where it is used for interpreting distance/depth. Accommodation is only effective for distances less than 2 meters.”

        1. I was going to point that out. I suffer from amblyopia, and the net effect for me is very poor stereo vision. 3D TV does not work for me, etc., but I’m not terrible at judging depth (not fantastic either). If wifey were around I could give you more technical info regarding other visual cues.

          A little OT, but hey, my wife is an optometrist and they get emails from some of their professional bodies with news about what’s happening in the optical world. She came across some info in one which mentioned a treatment for amblyopia in adults. It uses a Tetris game where both eyes receive some of the information and each individual eye also receives some information not available to the other eye. Apparently it’s produced significant improvement in those with amblyopia. So I’ve a little project planned, once I get some time, to use a Stellaris LaunchPad and a couple of Nokia 3310 screens to build a pair of cheap goggles to do the same. Might be a decent project for someone with an Oculus Rift. The only problem then is the potential for double vision.

          If I ever get there I’m sure I’ll write it up and present some quantitative data on any improvement in my vision.

  3. My Lumix camera (and maybe other brands) uses that method instead of sonar or IR ranging to focus the camera. When you press the focus button the focus sweeps from far to near while analyzing, at every step, the “focusiness” of the image; finally the lens stops at the step with the best “magic number”. It’s a good method for avoiding bad focus due to glass windows or other objects that create false “echoes” for IR or sonar, but it isn’t a very good method for photographing sky or clouds without “solid” objects.

    1. Yup, this method (local contrast maximization) is used in most point-and-shoot cameras, and simple autofocus cameras like those in most cell phones.

      I’m not aware of any cameras that use IR time-of-flight to focus, and ultrasonic is a bit out of fashion today (though I’d love to be proven wrong!). DSLRs use another interesting method (well worth a Google search)!
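
      Roughly, that sweep boils down to something like this sketch (just an illustration; the gradient-variance “focusiness” metric and the capture_at callback are assumptions, not any particular camera’s firmware):

      ```python
      import numpy as np

      def contrast_score(gray):
          """A simple 'focusiness' metric: variance of the image gradient.
          High when edges are crisp, low when the frame is blurred
          (or when there is nothing but featureless sky in view)."""
          gy, gx = np.gradient(gray.astype(np.float64))
          return np.var(gx) + np.var(gy)

      def autofocus(capture_at, focus_steps):
          """Sweep the lens through focus_steps, grab a frame at each
          position (capture_at is whatever callable the camera exposes),
          and return the step that scored highest."""
          scores = [contrast_score(capture_at(step)) for step in focus_steps]
          return focus_steps[int(np.argmax(scores))]
      ```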

  4. It is just standard focus-stack-based depth estimation. This is, exactly as wtomelo says, how passive-autofocus cameras find the focal plane (e.g. calculating the pixel variance or the gradient variance over a small box at the center of the image and taking the focus position where it is maximal).

    For every pixel you get a depth value, thus 3D info (x, y, and Z = depth). Then you can easily calculate the X, Y world coordinates as well.
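
    As a rough sketch of that last step (assuming a pinhole camera model with placeholder focal length and principal point, not real calibration data):

    ```python
    def pixel_to_world(u, v, Z, fx=1000.0, fy=1000.0, cx=320.0, cy=240.0):
        """Back-project pixel (u, v) with depth Z into camera-frame X, Y, Z
        using a pinhole model. fx, fy (focal length in pixels) and cx, cy
        (principal point) are placeholder values, not a real calibration."""
        X = (u - cx) * Z / fx
        Y = (v - cy) * Z / fy
        return X, Y, Z
    ```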

  5. Hi guys, I’m Giancarlo Todone, the author of this work. Thanks for all the positive comments!
    To all those saying that this is just plain depth from a focus stack, well… you’re right, but I didn’t forget to mention it in my article. My motivation for “reinventing the wheel” was the lack of good free/open-source code. As for the Lytro… well, I hope to have a chance to play with it sooner or later :)

    1. I reverse-engineered the stock camera app of my LG-P500 and found that there are 10 preset focus distances I can choose from by sending a custom command to the driver. This is probably something that only works on that particular phone/camera model, but I’ll experiment further and will surely write about it in the near future.
