Image-Generating AI Can Texture An Entire 3D Scene In Blender

[Carson Katri] has a fantastic solution to easily add textures to 3D scenes in Blender: have an image-generating AI create the textures on demand and apply them for you.

It’s not perfect — the odd door or window feature might suffer from a lack of right angles — but it’s pretty amazing.

As shown here, two featureless blocks on a featureless plain become run-down buildings by wrapping the 3D objects in a suitable image. It’s all done with the help of the Dream Textures add-on for Blender.

The solution uses Stable Diffusion to generate a texture for a scene based on a text prompt (e.g. “sci-fi abandoned buildings”), and uses the scene’s depth information to get the best results. The AI-generated results aren’t always perfect, but the process is far faster than creating textures from scratch.
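Dream Textures’ internal code isn’t described here, so the following only illustrates the general idea of wrapping a generated image around geometry by projecting it from the camera’s point of view. It is a minimal sketch, assuming a simple pinhole camera that looks down its local -Z axis (as Blender cameras do), a square output image, and a hypothetical `PinholeCamera` class with made-up default focal length and sensor width. Each vertex, given in camera space, is projected into the image to get its UV texture coordinate.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    # Hypothetical camera: looks down -Z in its own frame, like a Blender camera.
    # Focal length and sensor width share the same units (millimetres here).
    focal_length: float = 50.0
    sensor_width: float = 36.0

    def project_uv(self, x: float, y: float, z: float) -> tuple[float, float]:
        """Map a camera-space point to (u, v) on a square image, centre = (0.5, 0.5)."""
        if z >= 0:
            raise ValueError("point is behind the camera")
        depth = -z  # distance along the viewing axis
        # Perspective divide, then express the result as a fraction of sensor width.
        sx = self.focal_length * x / depth / self.sensor_width
        sy = self.focal_length * y / depth / self.sensor_width
        # Shift from [-0.5, 0.5] into [0, 1] texture space.
        return sx + 0.5, sy + 0.5

    def project_mesh(self, vertices):
        """Project a list of (x, y, z) camera-space vertices to UV coordinates."""
        return [self.project_uv(x, y, z) for x, y, z in vertices]
```

Vertices whose UVs land outside the 0 to 1 range fall outside the generated image, and would need another projection or a fallback texture.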

AI image generation capabilities are progressing at a breakneck pace, and giving people access to tools that can be run locally is what drives interesting and useful applications like this one.

Curious to know more about how systems like Stable Diffusion work? Here’s a pretty good technical primer, and the Washington Post recently published a less-technical (but accurate) interactive article explaining how AI image generators work, as well as the impact they are having.

A VM In An AI

AI knoweth everything, and as each new model breaks upon the world, it attracts a new crowd of experimenters. The new hotness is ChatGPT, and [Jonas Degrave] has turned his attention to it. By asking it to act as a Linux terminal, he discovered that he could gain access to a complete Linux virtual machine within the model’s synthetic imagination.

The AI’s first response was a prompt, so naturally he started by listing the files. Up came a list of directories, so the next step was to create a file and put some text in it. The result was a file he could read back, so there was some promise in this unexpected computing resource. But can it run code?

Love AI, But Don’t Love It Too Much

The up-and-coming Wonder of the World in software and information circles, and particularly among those who talk about them, is AI. Give a magic machine a lot of stuff, ask it a question, and it will give you a meaningful and useful answer. It will create art, write books, compose music, and generally Change The World As We Know It. All this is genuinely impressive stuff, as anyone who has played with DALL-E will tell you. But it’s important to think about what is actually new in what the technology can and can’t do, so as not to get caught up in the hype, and in doing that I’m immediately reminded of a previous career of mine.

Here’s A Plain C/C++ Implementation Of AI Speech Recognition, So Get Hackin’

[Georgi Gerganov] recently shared a great resource for running high-quality AI-driven speech recognition in a plain C/C++ implementation on a variety of platforms. The automatic speech recognition (ASR) model is fully implemented in only two source files and requires no dependencies. Transcription doesn’t involve calling remote APIs, so it can run locally on different devices in a fairly straightforward manner. The image above shows it running locally on an iPhone 13, but it can do more than that.

Implementing robust speech transcription that runs locally on a variety of devices is much easier with [Georgi]’s port of OpenAI’s Whisper.
[Georgi]’s work is a port of OpenAI’s Whisper model, a remarkably robust piece of software that does a truly impressive job of turning human speech into text. Whisper is easy to set up and play with, but this port makes it easier to get the system working outside its original Python and PyTorch setup. Having such a lightweight implementation of the model means it can be more easily integrated into a variety of platforms and projects.

The usual way that OpenAI’s Whisper works is to feed it an audio file, and it spits out a transcription. But [Georgi] shows off something else that might start giving hackers ideas: a simple real-time audio input example.

By using a tool to stream audio and feed it to the system every half-second, one can obtain pretty good (sort of) real-time results! This of course isn’t an ideal method, but Whisper’s robustness and accuracy are such that the results look pretty great nevertheless.
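The half-second approach can be pictured as a rolling buffer: new audio is appended as it arrives, and every half-second the most recent few seconds are handed off for transcription. This is a minimal Python sketch, not whisper.cpp’s actual streaming code; the class name `RollingAudioWindow`, the 5-second window length, and the idea of passing each window to some transcription function are illustrative assumptions. The 16 kHz sample rate matches what Whisper models expect.

```python
from collections import deque

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


class RollingAudioWindow:
    """Accumulate incoming samples and emit a window every `step_s` seconds.

    The window holds at most `length_s` seconds of the most recent audio,
    so each transcription pass re-reads some context from earlier windows.
    """

    def __init__(self, step_s: float = 0.5, length_s: float = 5.0):
        self.step = int(step_s * SAMPLE_RATE)
        # deque with maxlen silently drops the oldest samples once full.
        self.buffer = deque(maxlen=int(length_s * SAMPLE_RATE))
        self.pending = 0  # samples received since the last emitted window

    def push(self, samples):
        """Add samples; return the list of windows now ready for transcription."""
        ready = []
        for s in samples:
            self.buffer.append(s)
            self.pending += 1
            if self.pending == self.step:
                ready.append(list(self.buffer))
                self.pending = 0
        return ready
```

Each returned window would then be passed to the transcriber. Because consecutive windows overlap, the same words get re-transcribed with more context each time, which is one reason the output is only “sort of” real-time.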

You can watch a quick demo of that in the video just under the page break. If it gives you some ideas, head over to the project’s GitHub repository and get hackin’!

On Getting A Computer’s Attention And Striking Up A Conversation

With the rise of voice-driven virtual assistants over the years, the sight of people talking to various electronic devices in public and in private has become rather commonplace. While such voice-driven interfaces are decidedly useful in a range of situations, they also come with complications. One of these involves the trigger phrases or wake words that voice assistants listen for when in standby. Much like in Star Trek, where uttering ‘Computer’ would get the computer’s attention, we have our ‘Siri’, ‘Cortana’ and a range of custom trigger phrases that activate the voice interface.

Unlike in Star Trek, however, our virtual assistants do not know when we really want to interact. Unable to distinguish context, they’ll happily respond to someone on TV mentioning their trigger phrase, possibly followed by a ludicrous purchase order or other mischief. The takeaway is that voice-based interfaces are complex, yet still lack any sense of self-awareness or intelligence.

Another issue is that voice recognition itself is very resource-intensive, which limits how much processing can be performed on the local device. This usually leads voice assistants like Siri, Alexa, Cortana and others to process recorded voices in a data center, with obvious privacy implications.
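One common way to keep always-listening devices cheap, and to keep most audio local, is a two-stage design: a small on-device detector scores each short audio frame for the wake word, and only audio after a detection is forwarded to the heavier recognizer. Here is a minimal Python sketch of that gating logic, assuming a hypothetical `wake_score` function standing in for the detector; the threshold and the number of frames forwarded after a trigger are made-up values.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # one short chunk of audio samples


def gate_on_wake_word(
    frames: Iterable[Frame],
    wake_score: Callable[[Frame], float],  # hypothetical cheap on-device detector
    threshold: float = 0.8,
    listen_frames: int = 3,
) -> Iterator[Frame]:
    """Yield only the frames that follow a detected wake word.

    Everything else is dropped locally, so the expensive (possibly remote)
    recognizer only ever sees audio captured after the trigger phrase.
    """
    remaining = 0
    for frame in frames:
        if remaining > 0:
            # Currently "listening": forward this frame to the recognizer.
            yield frame
            remaining -= 1
        elif wake_score(frame) >= threshold:
            # Wake word detected: forward the next `listen_frames` frames.
            remaining = listen_frames
```

Note that this gate does nothing about the context problem: a wake word spoken on TV scores just as highly as one spoken by the device’s owner.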

How The Image-Generating AI Of Stable Diffusion Works

[Jay Alammar] has put up an illustrated guide to how Stable Diffusion works, and the principles in it are perfectly applicable to understanding how similar systems like OpenAI’s DALL-E or Google’s Imagen work under the hood as well. These systems are probably best known for their amazing ability to turn text prompts (e.g. “paradise cosmic beach”) into a matching image. Sometimes. Well, usually, anyway.

‘System’ is an apt term, because Stable Diffusion (and similar systems) are actually made up of many separate components working together to make the magic happen. [Jay]’s illustrated guide really shines here, because it starts at a very high level with only three components (each with their own neural network) and drills down as needed to explain what’s going on at a deeper level, and how it fits into the whole.

Spot any similar shapes and contours between the image and the noise that preceded it? That’s because the image is a result of removing noise from a random visual mess, not building it up from scratch like a human artist would do.

It may surprise some to discover that the image creation part doesn’t work the way a human does. That is to say, it doesn’t begin with a blank canvas and build an image bit by bit from the ground up. It begins with a seed: a bunch of random noise. Noise gets subtracted in a series of steps that leave the result looking less like noise and more like an aesthetically pleasing and (ideally) coherent image. Combine that with the ability to guide noise removal in a way that favors conforming to a text prompt, and one has the bones of a text-to-image generator. There’s a lot more to it of course, and [Jay] goes into considerable detail for those who are interested.
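The core loop described above (start from seeded noise, repeatedly subtract predicted noise, and steer each step toward the prompt) can be illustrated with a toy in plain Python. This is a minimal sketch under heavy assumptions: the “image” is a short list of numbers, `predict_noise` is a hypothetical stand-in for the trained noise-predicting network, and there is no latent space, text encoder or noise schedule. The steering uses classifier-free guidance, a technique Stable Diffusion commonly uses, which mixes an unconditioned and a prompt-conditioned noise estimate.

```python
import random


def predict_noise(x, target):
    # Hypothetical stand-in for the noise-predicting network: it treats
    # anything that differs from `target` as noise to be removed.
    return [xi - ti for xi, ti in zip(x, target)]


def generate(prompt_target, seed=42, steps=50, step_size=0.2, guidance_scale=1.0):
    rng = random.Random(seed)
    # Start from pure random noise (the "seed"), not from a blank canvas.
    x = [rng.gauss(0.0, 1.0) for _ in prompt_target]
    unconditional = [0.0] * len(prompt_target)  # toy "no prompt" target
    for _ in range(steps):
        eps_uncond = predict_noise(x, unconditional)
        eps_cond = predict_noise(x, prompt_target)
        # Classifier-free guidance: move from the unconditioned estimate
        # toward the prompt-conditioned one, scaled by guidance_scale.
        eps = [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
        # Remove a fraction of the estimated noise at each step.
        x = [xi - step_size * e for xi, e in zip(x, eps)]
    return x


image = generate([1.0, -1.0, 0.5])
```

With `guidance_scale=1.0` the loop settles on `prompt_target`; values above 1.0 extrapolate past the prompt-conditioned estimate, and in this toy that overshoots the target.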

If you’re unfamiliar with Stable Diffusion or art-creating AI in general, it’s one of those fields that is changing so fast that it sometimes feels impossible to keep up. Luckily, our own Matthew Carlson explains all about what it is, and why it matters.

Stable Diffusion can be run locally. There is a fantastic open-source web UI, so there’s no better time to get up to speed and start experimenting!

AI Dreaming Of Time Travel

We love the intersection between art and technology, and a video made by an AI (Stable Diffusion) imagining a journey through time is a lovely example. The project is relatively straightforward, but as with most art projects, [Xander Steenbrugge] spent endless hours tweaking and playing with different parts of the process until it was just how he liked it. He mentions trying thousands of different prompts and seeds; one example prompt is “a small tribal village with huts.” In the video, each prompt got 72 frames, slowly increasing in strength and then decreasing as the following prompt came along.
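The 72-frame schedule can be pictured as a crossfade between consecutive prompts. The article doesn’t say exactly how [Xander] blended prompts, so this is a minimal sketch assuming a linear crossfade between the text embeddings of neighboring prompts; `prompt_weights`, `blend_embeddings` and the embedding vectors are hypothetical. Each prompt’s weight rises over 72 frames as it takes over from its predecessor, then falls over the next 72 as its successor takes over.

```python
def prompt_weights(frame: int, n_prompts: int, frames_per_prompt: int = 72) -> list[float]:
    """Crossfade weight of every prompt at a given frame (weights sum to 1)."""
    weights = [0.0] * n_prompts
    segment, offset = divmod(frame, frames_per_prompt)
    if segment >= n_prompts - 1:
        weights[-1] = 1.0  # past the last transition: hold the final prompt
        return weights
    t = offset / frames_per_prompt  # 0.0 at segment start, approaching 1.0 at its end
    weights[segment] = 1.0 - t  # outgoing prompt fades out
    weights[segment + 1] = t  # incoming prompt fades in
    return weights


def blend_embeddings(embeddings, weights):
    # Weighted sum of (hypothetical) per-prompt text embedding vectors.
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings)) for i in range(dim)]
```

Each frame’s blended embedding would then condition the image generator for that frame.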

There are other AI videos on YouTube, often putting the lyrics of a song into AI-generated form. But if you’ve worked with AI systems, you’ll notice that the background stays remarkably stable in [Xander]’s video as it goes through dozens of feedback loops. This is difficult, because you want to change the image’s content without changing its look, so he had to write a decent amount of code to maintain visual cohesion over time. Hopefully, as he mentioned on Twitter, we’ll see an open-source version of some of his improvements.

In the meantime, we get to sit back and enjoy something beautiful. If you still aren’t convinced that Stable Diffusion is a big deal, perhaps we can do a little more to persuade you.
