Giving Stable Diffusion Some Depth

You’ve likely heard quite a bit of buzz over the last few months about Stable Diffusion. The new version (v2) has come out, and in addition to the standard image-to-image and text-to-image modes, it also has a depth-image-to-image that can be incredibly useful. [Andrew] has a write-up that guides you on using this mode.

The basic idea is that you can take both an image and depth into the model, which allows you to control what gets put where. Stable Diffusion is a bit confusing, but we already have some great resources to wrap your head around it. In terms of input, you can use a depth map from a camera with lidar (many recent phones include this) or have another model (like MiDaS) estimate it from a 2D picture. This becomes powerful when you can preserve a specific composition, such as an iconic scene from a well-known movie. You can keep the characters’ poses on the screen but transform the style of the scene into whatever you wish (as seen above).

We have already covered a technique to generate textures right in blender, but this new depth information has already been implemented to provide better accuracy of the textures.

[Justin Alvey] used it to create architectural photos from dollhouse furniture. Using the MiDaS model, he estimated the depth and threw away the RGB aspects by setting the denoising strength to maximum. The simplified dollhouse furniture was easily recognizable to the model, which helped produce great results.

However, the only downside is that the perspective produces a rather dollhouse feel. Changing the focal length and moving farther away helps. Overall, it’s a clever use of what the new AI model can do. It’s a fast-moving space, so this will likely be out of date in a few months.

 

Image-Generating AI Can Texture An Entire 3D Scene In Blender

[Carson Katri] has a fantastic solution to easily add textures to 3D scenes in Blender: have an image-generating AI create the texture on demand, and do it for you.

It’s not perfect — the odd door or window feature might suffer from a lack of right angles — but it’s pretty amazing.

As shown here, two featureless blocks on a featureless plain become run-down buildings by wrapping the 3D objects in a suitable image. It’s all done with the help of the Dream Textures add-on for Blender.

The solution uses Stable Diffusion to generate a texture for a scene based on a text prompt (e.g. “sci-fi abandoned buildings”), and leverages an understanding of a scene’s depth for best results. The AI-generated results aren’t always entirely perfect, but the process is pretty amazing. Not to mention fantastically fast compared to creating from scratch.

AI image generation capabilities are progressing at a breakneck pace, and giving people access to tools that can be run locally is what drives interesting and useful applications like this one here.

Curious to know more about how systems like Stable Diffusion work? Here’s a pretty good technical primer, and the Washington Post recently published a less-technical (but accurate) interactive article explaining how AI image generators work, as well as the impact they are having.

How The Art-Generating AI Of Stable Diffusion Works

[Jay Alammar] has put up an illustrated guide to how Stable Diffusion works, and the principles in it are perfectly applicable to understanding how similar systems like OpenAI’s Dall-E or Google’s Imagen work under the hood as well. These systems are probably best known for their amazing ability to turn text prompts (e.g. “paradise cosmic beach”) into a matching image. Sometimes. Well, usually, anyway.

‘System’ is an apt term, because Stable Diffusion (and similar systems) are actually made up of many separate components working together to make the magic happen. [Jay]’s illustrated guide really shines here, because it starts at a very high level with only three components (each with their own neural network) and drills down as needed to explain what’s going on at a deeper level, and how it fits into the whole.

Spot any similar shapes and contours between the image and the noise that preceded it? That’s because the image is a result of removing noise from a random visual mess, not building it up from scratch like a human artist would do.

It may surprise some to discover that the image creation part doesn’t work the way a human does. That is to say, it doesn’t begin with a blank canvas and build an image bit by bit from the ground up. It begins with a seed: a bunch of random noise. Noise gets subtracted in a series of steps that leave the result looking less like noise and more like an aesthetically pleasing and (ideally) coherent image. Combine that with the ability to guide noise removal in a way that favors conforming to a text prompt, and one has the bones of a text-to-image generator. There’s a lot more to it of course, and [Jay] goes into considerable detail for those who are interested.

If you’re unfamiliar with Stable Diffusion or art-creating AI in general, it’s one of those fields that is changing so fast that it sometimes feels impossible to keep up. Luckily, our own Matthew Carlson explains all about what it is, and why it matters.

Stable Diffusion can be run locally. There is a fantastic open-source web UI, so there’s no better time to get up to speed and start experimenting!

Hackaday Links Column Banner

Hackaday Links: October 2, 2022

“Necessity is the mother of invention,” or so the saying goes. We’ve never held to that, finding that laziness is a much more powerful creative lubricant. And this story about someone who automated their job with a script is one of the best examples of sloth-driven invention since the TV remote was introduced. If we take the story at face value — and it’s the Internet, so why wouldn’t we? — this is a little scary, as the anonymous employee was in charge of curating digital evidence submissions for a law firm. The job was to watch for new files in a local folder, manually copy them to a cloud server, and verify the file with a hash to prove it hasn’t been tampered with and support the chain of custody. The OP says this was literally the only task to perform, so we can’t really blame them for automating it with a script once COVID shutdowns and working from home provided the necessary cover. But still — when your entire job can be done by a Windows batch file and some PowerShell commands while you play video games, we’re going to go out on a limb and say you’re probably underemployed.

People have been bagging on the US Space Force ever since its inception in 2019, which we think is a little sad. It has to be hard being the newest military service, especially since it branched off of the previously newest military service, and no matter how important its mission may be, there’s still always going to be the double stigmas of being both the new kid on the block and the one with a reputation for digging science fiction. And now they’ve given the naysayers yet more to dunk on, with the unveiling of the official US Space Force service song. Every service branch has a song — yes, even the Army, and no, not that one — and they all sound appropriately martial. So does the Space Force song, but apparently people have a problem with it, which we really don’t get at all — it sounds fine to us.

Continue reading “Hackaday Links: October 2, 2022”

AI Dreaming Of Time Travel

We love the intersection between art and technology, and a video made by an AI (Stable Diffusion) imagining a journey through time (Nitter) is a lovely example. The project is relatively straightforward, but as with most art projects, there were endless hours of [Xander Steenbrugge] tweaking and playing with different parts of the process until it was just how he liked it. He mentions trying thousands of different prompts and seeds — an example of one of the prompts is “a small tribal village with huts.” In the video, each prompt got 72 frames, slowly increasing in strength and then decreasing as the following prompt came along.

There are other AI videos on YouTube, often putting the lyrics of a song into AI-generated form. But if you’ve worked with AI systems, you’ll notice that the background stays remarkably stable in [Xander]’s video as it goes through dozens of feedback loops. This is difficult to do as you want to change the image’s content without changing the look. So he had to write a decent amount of code to try and maintain visual temporal cohesion over time. Hopefully, we’ll see an open-source version of some of his improvements, as he mentioned on Twitter.

In the meantime, we get to sit back and enjoy something beautiful. If you still aren’t convinced that Stable Diffusion isn’t a big deal, perhaps we can do a little more to persuade your viewpoint.

Continue reading “AI Dreaming Of Time Travel”