You’ve seen it in movies and shows — the hero takes a blurry still picture, and with a few keystrokes, generates a view from a different angle or sometimes even a full 3D model. Turns out, thanks to machine learning and work by several researchers, this might be possible. As you can see in the video below, using “shape-guided diffusion,” the researchers were able to take a single image of a person and recreate a plausible 3D model.
Of course, the work relies on machine learning. As you’ll see in the video, this isn’t a new idea, but previous attempts have been less than stellar. The new method predicts the 3D shape first, then estimates what the back view looks like. The algorithm then fills in the views between the original photograph and the back view, using the 3D shape estimate as a guide. Even then, some post-processing is needed to join the intermediate images into a model.
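To make the "images in between" step a little more concrete, here is a minimal Python sketch. It is not the researchers' code. The function name `intermediate_azimuths`, the number of views, and the assumption that the views are evenly spaced around a vertical axis are all hypothetical. It only lists the camera angles between the front photograph (0 degrees) and the estimated back view (180 degrees) that an algorithm like this would have to fill in.

```python
def intermediate_azimuths(num_views: int) -> list[float]:
    """Return camera angles (degrees) strictly between front (0) and back (180).

    Hypothetical helper: even spacing is an assumption made for illustration,
    not something the source describes.
    """
    if num_views < 1:
        raise ValueError("num_views must be at least 1")
    # Split the half turn into num_views + 1 equal steps and skip the endpoints,
    # which are the original photo and the estimated back view.
    step = 180.0 / (num_views + 1)
    return [step * i for i in range(1, num_views + 1)]


if __name__ == "__main__":
    # Three in-between views on one side of the subject.
    print(intermediate_azimuths(3))  # prints [45.0, 90.0, 135.0]
```

In the method described above, each of those in-between views would be generated with the 3D shape estimate as a guide, which is the part this sketch leaves out.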
The results look good, although the video does point out some areas where the method still falls short. For example, unusual lighting can affect the output.
This beats spinning a person or a camera around to capture many images. Scanning people in 3D is a much older dream than you might expect.