ZeroNVS is one of those research projects that is rather more impressive than it may look at first glance. On one hand, the 3D reconstructions — we urge you to click that first link to see them — look a bit grainy and imperfect. But on the other hand, it was reconstructed using a single still image as an input.
How is this done? It’s NeRFs (neural radiance fields) which leverages machine learning, but with yet another new twist. Existing methods mainly focus on single objects and masked backgrounds, but a new approach makes this method applicable to a variety of complex, in-the-wild images without the need to train new models.
There are a ton of sample outputs on the project summary page that are worth a browse if you find this sort of thing at all interesting. Some of the 360 degree reconstructions look rough, some are impressive, and some are a bit amusing. For example indoor shots tend to reconstruct rooms that look good, but lack doorways.
Neural Radiance Fields (NeRFs) are a method of leveraging machine learning to, in a way, do what photogrammetry does: synthesize complex scenes and views based on input images. But NeRFs work in a fraction of the time, and require only a fraction of the source material. There are different ways to go about this and unsurprisingly, there tends to be a clear speed vs. quality tradeoff. But as the video accompanying this new work seems to show, clever techniques mean the best of both worlds.
Generative AI is the new thing right now, proving to be a useful tool both for professional programmers, writers of high school essays and all kinds of other applications in between. It’s also been shown to be effective in generating images, as the DALL-E program has demonstrated with its impressive image-creating abilities. It should surprise no one as this type of AI continues to make in-roads into other areas, this time with a program from OpenAI called Shap-E which can render 3D images.
Like most of OpenAI’s offerings, this takes plain language as its input and can generate relatively simple 3D models with this text. The examples given by OpenAI include some bizarre models using text prompts such as a chair shaped like an avocado or an airplane that looks like a banana. It can generate textured meshes and neural radiance fields, both of which have various advantages when it comes to available computing power, training methods, and other considerations. The 3D models that it is able to generate have a Super Nintendo-style feel to them but we can only expect this technology to grow exponentially like other AI has been doing lately.
For those wondering about the name, it’s apparently a play on the 2D rendering program DALL-E which is itself a combination of the names of the famous robot WALL-E and the famous artist Salvador Dali. The Shap-E program is available for anyone to use from this GitHub page. Even though this code comes from OpenAI themselves, plenty are speculating that the AI revolution to come will largely come from open-source sources rather than OpenAI or Google, something for which the future is somewhat hazy.