Generating 3D Scenes From Just One Image

The LucidDreamer project ties a variety of functions into a pipeline that can take a source image (or generate one from a text prompt) and “lift” its content into 3D, creating highly detailed Gaussian splats that look great and can even be navigated.

Gaussian splatting is a fast way of representing and rendering radiance fields, the same family of technique as NeRFs (Neural Radiance Fields), which reconstruct complex 3D scenes from sparse 2D sources. If that is all news to you, that’s probably because this field has sprung up with dizzying speed since the original NeRF concept was thought up barely a handful of years ago.
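To give a flavor of what a splatting renderer actually computes, here’s a minimal NumPy sketch (not LucidDreamer’s code) of front-to-back alpha compositing for a single pixel, with the Gaussian opacities and falloff values simply made up:

```python
import numpy as np

# Toy front-to-back alpha compositing of depth-sorted Gaussians for ONE pixel.
# Real 3D Gaussian splatting projects anisotropic 3D Gaussians to screen space
# and does this for every pixel they cover; this is just the compositing core.

# Each splat: (depth, opacity, Gaussian falloff at this pixel, RGB color)
splats = [
    (1.0, 0.9, 0.8, np.array([1.0, 0.2, 0.2])),  # near, mostly opaque, reddish
    (2.5, 0.6, 0.5, np.array([0.2, 1.0, 0.2])),  # middle, greenish
    (4.0, 0.8, 0.9, np.array([0.2, 0.2, 1.0])),  # far, bluish
]

color = np.zeros(3)
transmittance = 1.0  # how much light still gets through to this point

for depth, opacity, falloff, rgb in sorted(splats, key=lambda s: s[0]):
    alpha = opacity * falloff          # this Gaussian's contribution at the pixel
    color += transmittance * alpha * rgb
    transmittance *= (1.0 - alpha)     # splats behind it are partially occluded

print("composited pixel color:", color, "remaining transmittance:", transmittance)
```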

What makes LucidDreamer neat is the fact that it does so much with so little. The project page has interactive scenes to explore, but there is also a demo for those who would like to try generating scenes from scratch (some familiarity with the basic tools is expected, however).

In addition to the source code itself, the research paper is available for those with a hunger for the details. Read it quick, because at the pace this field is moving, it honestly might be obsolete if you wait too long.

Multi-View Wire Art Meets Generative AI

DreamWire is a system for generating multi-view wire art, with machine learning techniques helping to create the patterns required.

The 3-dimensional wire pattern in the center creates images of Einstein, Turing, and Newton depending on viewing angle.

What’s wire art? It’s a three-dimensional twisted mass of lines which, when viewed from a certain perspective, yields an image. Multi-view wire art produces different images from the same mass depending on the viewing angle, and as one can imagine, such things get very complex, very quickly.
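To see the principle at its simplest, here’s a small NumPy sketch (nothing to do with DreamWire’s actual optimization) showing how one and the same set of 3D points casts completely different 2D shadows depending on the viewing angle; the hard part the researchers solve is bending a single connected wire so that every shadow is a recognizable picture:

```python
import numpy as np

def rot_y(theta):
    # Rotation about the vertical axis, i.e. walking around the sculpture.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

# A flat zig-zag "wire" lying entirely in the x-y plane (z = 0 everywhere).
x = np.linspace(-1, 1, 40)
y = 0.5 * np.sign(np.sin(8 * x))                     # crude square wave
wire = np.stack([x, y, np.zeros_like(x)], axis=1)    # (N, 3) points

# Orthographic projection: rotate, then drop the depth (z) coordinate.
view_a = (wire @ rot_y(0.0).T)[:, :2]        # head-on: the zig-zag is visible
view_b = (wire @ rot_y(np.pi / 2).T)[:, :2]  # from the side: it collapses to a line

print("head-on view, horizontal extent:", np.ptp(view_a[:, 0]))  # ~2.0, full pattern
print("side view,    horizontal extent:", np.ptp(view_b[:, 0]))  # ~0.0, just a line
```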

A recently-released paper explains how the system works, including why generative AI is uniquely suited to creating meaningful intersections between multiple inputs. There’s also a video (embedded just under the page break) that showcases many of the results the researchers obtained.

The GitHub repository for the project doesn’t have much in it yet, but it’s a good place to keep an eye on if you’re interested in what comes next.

We’ve seen generative AI applied in a similarly novel way to help create visual anagrams, or 2D patterns that can be interpreted differently based on a variety of orientations and permutations. These sorts of systems still need to be guided by a human, but having machine learning do the heavy lifting allows just about anybody to explore their creativity.

Explore Neural Radiance Fields In Real-time, Even On A Phone

Neural Radiance Fields (NeRF) is a method of reconstructing complex 3D scenes from sparse 2D inputs, and the field has been growing by leaps and bounds. Viewing a reconstructed scene is still nontrivial, but there’s a new innovation on the block: SMERF is a browser-based method of enabling full 3D navigation of even large scenes, efficient enough to render in real time on phones and laptops.
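For a sense of what a viewer has to evaluate, the classic NeRF rendering step numerically integrates density and color samples along every camera ray. Here’s a toy version of that quadrature with fake network outputs (it bears no resemblance to SMERF’s streamlined representation, which is the whole point of that work):

```python
import numpy as np

# Toy NeRF-style volume rendering for a single camera ray.
# A real NeRF queries an MLP for (density, color) at each sample; here we fake it.

n_samples = 64
t = np.linspace(0.0, 4.0, n_samples)                  # depths along the ray
delta = np.diff(t, append=t[-1] + (t[1] - t[0]))      # spacing between samples

# Pretend the network returned a blob of density around depth 2.0, colored orange.
sigma = 5.0 * np.exp(-((t - 2.0) ** 2) / 0.1)         # volume density
color = np.tile(np.array([1.0, 0.5, 0.1]), (n_samples, 1))

alpha = 1.0 - np.exp(-sigma * delta)                  # opacity of each segment
trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
weights = trans * alpha

pixel = (weights[:, None] * color).sum(axis=0)        # final composited pixel color
print("rendered pixel:", pixel, "accumulated opacity:", weights.sum())
```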

Don’t miss the gallery of demos, which will run on anything from powerful desktops to smartphones. Notable is the distinct lack of the blurry, cloudy, or distorted patches that tend to appear in under-observed areas of a NeRF scene (such as indoor corners and ceilings). The technical paper explains SMERF’s approach in more detail.

NeRFs as a concept first hit the scene in 2020 and the rate of advancement has been simply astounding, especially compared to demos from just last year. Watch the short video summarizing SMERF below, and marvel at how it compares to other methods, some of which are themselves only months old.

Double-Dose Of AI Turns Daily Tasks Into Works Of Art

Not so long ago, “Magic Mirror” builds were all the rage, and we have to admit getting our daily reminders and newsfeeds on an LCD display sitting behind a partially reflective mirror is not without its charms. But styles ebb and flow, so we don’t see too many of those builds anymore. This e-ink daily calendar reminder hearkens back to those Magic Mirrors, only with a double twist of AI.

This project is the work of [Ilkka Turunen], and right up front we’ll say the results are just gorgeous. A lot of that has to do with the 10.3″ e-ink display used, but more with the creative use of not one but two machine learning systems. The first is ChatGPT, which [Ilkka] uses to parse the day’s online calendar entries and grab the most significant events to generate a prompt for DALL-E. The generated DALL-E prompt has specific instructions that guide the style of the image, which honestly is where most of the artistry lies: [Ilkka]’s aesthetic choices, like suggesting that the images look like a 19th-century lithograph or a satirical comic from a turn-of-the-(last)-century newspaper. The prompt is then sent off to DALL-E for rendering, and the resulting image is displayed.
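The general shape of such a pipeline is easy to sketch. The snippet below is not [Ilkka]’s code, just an illustration assuming the official OpenAI Python client; the model names, style text, and calendar entries are all placeholders:

```python
# Illustrative sketch only -- not the project's actual code.
# Assumes the official `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# In the real build these would come from the day's online calendar feed.
events = ["09:00 dentist appointment", "12:30 lunch with Sam", "18:00 badminton"]

style = ("a satirical single-panel illustration in the style of a "
         "turn-of-the-century newspaper cartoon")

# Step 1: ask the chat model to condense the day into a DALL-E prompt.
chat = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {"role": "system",
         "content": f"Condense the user's day into one vivid image prompt for {style}."},
        {"role": "user", "content": "\n".join(events)},
    ],
)
image_prompt = chat.choices[0].message.content

# Step 2: hand the generated prompt to DALL-E and fetch the picture.
image = client.images.generate(model="dall-e-3", prompt=image_prompt, size="1024x1024")
print("prompt:", image_prompt)
print("image URL:", image.data[0].url)
```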

It has to be said that the prompts that ChatGPT generates based on the combination of [Ilkka]’s aesthetic preferences and the random events of the day are strikingly complex. The chatbot really seems to be showing some imagination these days; DALL-E is no slouch either in turning those words into images.

Like the idea of an e-ink daily reminder but prefer a less artistic presentation? This should help.

3D Human Models From A Single Image

You’ve seen it in movies and shows — the hero takes a blurry still picture, and with a few keystrokes, generates a view from a different angle or sometimes even a full 3D model. Turns out, thanks to machine learning and work by several researchers, this might be possible. As you can see in the video below, using “shape-guided diffusion,” the researchers were able to take a single image of a person and recreate a plausible 3D model.

Of course, the work relies on machine learning. As you’ll see in the video, this isn’t a new idea, but previous attempts have been less than stellar. This new method uses shape prediction first, followed by an estimate of the back view appearance. The algorithm then guesses what images go between the initial photograph and the back view. However, it uses the 3D shape estimate as a guideline. Even then, there is some post-processing to join the intermediate images together into a model.
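Boiled down, the pipeline flows roughly like the sketch below. Every function here is a hypothetical stand-in rather than the researchers’ actual API; it’s only meant to make the data flow easier to follow:

```python
# Hypothetical stand-ins for the stages described above -- NOT the paper's real code.
# The point is only to show how the pieces feed into each other.

def predict_shape(front_image):
    """Stage 1: estimate a rough 3D body shape from the single input photo."""
    return {"mesh": "coarse body mesh for " + front_image}

def hallucinate_back_view(front_image, shape):
    """Stage 2: guess what the unseen back of the person looks like."""
    return "back view consistent with " + front_image

def synthesize_intermediate_views(front_image, back_image, shape, n_views=8):
    """Stage 3: fill in the in-between angles, guided by the 3D shape estimate
    so the generated views stay geometrically consistent with each other."""
    return [f"view at {i * 360 // n_views} degrees" for i in range(n_views)]

def fuse_views_into_model(views, shape):
    """Stage 4: post-processing stitches the views into one textured model."""
    return {"model": views, "geometry": shape["mesh"]}

front = "photo_of_person.jpg"
shape = predict_shape(front)
back = hallucinate_back_view(front, shape)
views = synthesize_intermediate_views(front, back, shape)
model = fuse_views_into_model(views, shape)
print(f"built a model from {len(views)} synthesized views")
```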

The results look good, although the video does point out some areas where the method still falls short. For example, unusual lighting can affect the output.

This beats spinning around a person or a camera to get many images. Scanning people in 3D is a much older dream than you might expect.

Synthesizing 360-degree Views From Single Source Images

ZeroNVS is one of those research projects that is rather more impressive than it may look at first glance. On one hand, the 3D reconstructions — we urge you to click that first link to see them — look a bit grainy and imperfect. But on the other hand, each one was reconstructed from a single still image as an input.

Most results look great, but some — like this bike visible through a park bench — come out a bit strange. A valiant effort for a single-image input, all things considered.

How is this done? With NeRFs (neural radiance fields), which leverage machine learning, but with yet another new twist. Existing methods mainly focus on single objects with masked backgrounds; ZeroNVS’s approach makes the technique applicable to a variety of complex, in-the-wild images without the need to train new models.

There are a ton of sample outputs on the project summary page that are worth a browse if you find this sort of thing at all interesting. Some of the 360-degree reconstructions look rough, some are impressive, and some are a bit amusing. For example, indoor shots tend to reconstruct rooms that look good but lack doorways.

There is a research paper for those seeking additional details and a GitHub repository for the code, but the implementation requires some significant hardware.

AI In A Box Envisions AI As A Private, Offline, Hackable Module

[Useful Sensors] aims to embed a variety of complementary AI tools into AI in a Box, a small, private, self-contained module with no internet connection. It can do live voice recognition and captioning, live translation, and natural language conversational interaction with a local large language model (LLM). Intriguingly, it’s specifically designed with features to make it hack-friendly, such as the ability to act as a voice keyboard by sending live transcribed audio as keystrokes over USB.
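The voice-keyboard idea is easy to approximate on a regular computer. Here’s a hedged sketch (not the AI in a Box firmware) that transcribes a WAV file with the open-source Whisper model and “types” the result via pynput; on the real device the audio would be a live microphone stream and the keystrokes would go out over its USB interface:

```python
# Desktop approximation of the "voice keyboard" idea -- not the AI in a Box firmware.
# Requires: pip install openai-whisper pynput
import whisper
from pynput.keyboard import Controller

model = whisper.load_model("base")              # small, CPU-friendly Whisper model
result = model.transcribe("recording.wav")      # on the real device: live audio
text = result["text"].strip()

keyboard = Controller()
keyboard.type(text + " ")                       # emit the transcript as keystrokes
print("typed:", text)
```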

Based on the Rockchip RK3588S SoC, the unit is slated to have an integrated speaker, display, and microphone.

Right now it’s wrapping up a pre-order phase, and aims to ship units around the end of January 2024. The project is based around the Rockchip RK3588S SoC and is open source (GitHub repository), but since it’s still in development, there’s not a whole lot visible in the repository yet. However, a key part of getting good performance is [Useful Sensors]’s own transformers library for the Rockchip NPU (neural processing unit).

Capabilities like high-quality local voice recognition and locally-hosted LLMs such as LLaMA have gotten a massive boost thanks to recent advances in machine learning, and it looks like this project aims to tie them together in a self-contained package.

Perhaps private digital assistants can become more useful when users have the freedom to modify and integrate them as they see fit. Digital assistants hosted by the big tech companies are often frustrating, and others have observed that this is ultimately because they primarily exist to serve their makers more than they help users.
