AI Image Generation Meets Virtual Dress Up

Image generators have really taken off thanks to machine learning, sparking all kinds of new ideas in the process. OOTDiffusion is one such project: it enables virtual try-ons of clothing by combining a picture of a person with an image of a garment, and doing so in a coherent way.

A model sporting a 2021 Remoticon shirt.

When it comes to AI image generators, maintaining consistency of a particular subject in a picture while changing or combining other parts of the image isn’t a trivial task. (If you’re unfamiliar with the basics of how diffusion-type AI image generators work, we have you covered.)

Virtual try-on of clothing is not a new idea, but it’s also far from being a completely solved problem. It’s easy to feed a system high-quality images of people and clothing and ask it to combine them, but the outputs rarely emerge with all their limbs intact, figuratively speaking.

OOTDiffusion addresses the two big challenges in this area: making sure the outputs look natural and realistic, and preserving as much of the garment’s appearance and qualities as possible in the process.

It seems to do a very good job, and you can try it for yourself in the online demo. Check out the research paper for more details, and the GitHub repository provides all the code if you’d like to get a little more hands-on.

A Badge For AI-Free Content – 100% Human!

These days, just about anyone with a pulse can fall on a keyboard and make an AI image generator spurt out some kind of vaguely visual content. A lot of it is crap. Some of it’s confusing. But most of all, creators hate it when their hand-crafted works are compared with these digital extrusions from mathematical slop. Enter the “not by AI” badge.

Screenshot from https://notbyai.fyi/business

Basically, it’s exactly what it sounds like. A sleek, modern badge that you slap on your artwork to tell people that you did this, not an AI. There are pre-baked versions for writers (“written by human”), visual artists (“painted by human”), and musicians (“produced by human”). The idea is that these badges would help people identify human-generated content and steer away from AI content if they’re trying to avoid it.

It’s not just intended to be added to individual artworks. Websites where “at least 90%” of the content is created by humans are invited to host the badge, and the same goes for apps. That directive reveals an immediate flaw: a visitor to a badge-wearing site could easily be misled if what they happen to read falls into the AI-generated 10%. There’s also nothing stopping someone from slapping the badge on AI-generated content and simply lying.

You might take a more cynical view if you dig deeper, though. The company is charging for various things, such as a monthly fee for businesses that want to display the badges.

We’ve talked about this before when we asked a simple question—how do you convince people your artwork was made by a human? We’re not sure we’ve yet found the answer, but this badge program is at least trying to do something about the issue. Share your human thoughts in the comments below.

Using Local AI On The Command Line To Rename Images (And More)

We all have a folder full of images whose filenames resemble line noise. How about renaming those images with the help of a local LLM (large language model) executable on the command line? All that and more is covered in [Justine Tunney]’s bash one-liners for LLMs, a showcase aimed at giving folks ideas and guidance on using a local (and private) LLM to do actual, useful work.

This builds on the recent llamafile project, which turns LLMs into single-file executables. That not only makes them more portable and easier to distribute, but the executables can be called from the command line and write to standard output like any other UNIX tool. It’s also simpler to version-control the embedded LLM weights (and therefore their behavior) when everything is part of the same file.

One such tool (the multi-modal LLaVA) is capable of interpreting image content. As an example, we can point it to a local image of the Jolly Wrencher logo using the following command:

llava-v1.5-7b-q4-main.llamafile --image logo.jpg --temp 0 -e -p '### User: The image has...\n### Assistant:'

Which produces the following response:

The image has a black background with a white skull and crossbones symbol.

With a different prompt (“What do you see?” instead of “The image has…”) the LLM even picks out the wrenches, but one can already see that the right pieces exist to do some useful work.
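That variation is nothing more than a different prompt string passed with -p, something like:

llava-v1.5-7b-q4-main.llamafile --image logo.jpg --temp 0 -e -p '### User: What do you see?\n### Assistant:'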

Check out [Justine]’s rename-pictures.sh script, which cleverly evaluates image filenames. If an image’s given filename already looks like readable English (also a job for a local LLM), the image is left alone. Otherwise, the picture is fed to an LLM whose output guides the generation of a new, short, descriptive English filename in lowercase, with underscores for spaces.
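The real script is worth reading in full, but here’s a stripped-down sketch of the idea. This is not [Justine]’s code (the prompt wording and the filename cleanup are our own), and it assumes the llamafile from above sits in the working directory:

#!/bin/sh
# Simplified sketch, not the real rename-pictures.sh
for f in *.jpg; do
  # Ask the model to describe the picture; --temp 0 keeps the answer repeatable
  desc=$(./llava-v1.5-7b-q4-main.llamafile --image "$f" --temp 0 -e \
    -p '### User: Describe the image in five words or fewer.\n### Assistant:' 2>/dev/null)
  # Squash the description into a lowercase, underscore-separated filename
  name=$(printf '%s' "$desc" | tr 'A-Z' 'a-z' | tr -cs 'a-z0-9' '_' | sed 's/^_*//;s/_*$//')
  [ -n "$name" ] && mv -n -- "$f" "${name}.jpg"
done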

What about the fact that LLM output isn’t entirely predictable? That’s easy to deal with. [Justine] suggests always calling these tools with the --temp 0 parameter. Setting the temperature to zero makes the model deterministic, ensuring that the same input always yields the same output.
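A quick sanity check is to run the exact same query twice and compare; with --temp 0 the two answers should match:

llava-v1.5-7b-q4-main.llamafile --image logo.jpg --temp 0 -e -p '### User: The image has...\n### Assistant:' > first.txt
llava-v1.5-7b-q4-main.llamafile --image logo.jpg --temp 0 -e -p '### User: The image has...\n### Assistant:' > second.txt
diff first.txt second.txt && echo identical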

There are more neat examples on the Bash One-Liners for LLMs page that demonstrate different ways to use a local LLM that lives in a single-file executable, so be sure to give it a look and see if you get any new ideas. After all, we have previously shown how automating tasks is almost always worth the time invested.

Generating 3D Scenes From Just One Image

The LucidDreamer project ties a variety of functions into a pipeline that can take a source image (or generate one from a text prompt) and “lift” its content into 3D, creating highly detailed Gaussian splats that look great and can even be navigated.

Gaussian splatting is a fast way of representing and rendering the kind of complex scenes that NeRFs (Neural Radiance Fields) reconstruct from sparse 2D sources. If that is all news to you, that’s probably because this stuff has sprung up with dizzying speed from when the original NeRF concept was thought up barely a handful of years ago.
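For the curious, the heart of splat rendering is just depth-sorted alpha compositing of the projected Gaussians along each pixel’s ray. Roughly (this is a simplification, not LucidDreamer’s specific formulation):

C = \sum_i c_i \alpha_i \prod_{j<i} (1 - \alpha_j)

where c_i and \alpha_i are the color and opacity the i-th Gaussian contributes at that pixel, with the Gaussians sorted front to back. Because that is simple blending rather than querying a neural network at many points along every ray, it renders very quickly.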

What makes LucidDreamer neat is how much it does with so little. The project page has interactive scenes to explore, but there is also a demo for those who would like to try generating scenes from scratch (some familiarity with the basic tools is expected, however).

In addition to the source code itself, the research paper is available for those with a hunger for the details. Read it quick, because at the pace this stuff is expanding, it honestly might be obsolete if you wait too long.

Multi-View Wire Art Meets Generative AI

DreamWire is a system for creating multi-view wire art, with machine learning techniques helping to generate the required patterns.

The 3-dimensional wire pattern in the center creates images of Einstein, Turing, and Newton depending on viewing angle.

What’s wire art? It’s a three-dimensional twisted mass of lines which, when viewed from a certain perspective, yields an image. Multi-view wire art produces different images from the same mass depending on the viewing angle, and as one can imagine, such things get very complex, very quickly.
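Loosely speaking (a simplification, not the paper’s exact loss), the multi-view problem boils down to finding a set of 3D curves C whose projections match every target image at once:

\min_C \sum_v d\left( P_v(C),\, I_v \right)

where P_v renders the wires from viewpoint v, I_v is the image you want to see from that angle, and d is some measure of image difference.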

A recently released paper explains how the system works, including the role generative AI plays: it is uniquely suited to creating meaningful intersections between multiple inputs. There’s also a video (embedded just under the page break) that showcases many of the results the researchers obtained.

The GitHub repository for the project doesn’t have much in it yet, but it’s a good place to keep an eye on if you’re interested in what comes next.

We’ve seen generative AI applied in a similarly novel way to help create visual anagrams, or 2D patterns that can be interpreted differently based on a variety of orientations and permutations. These sorts of systems still need to be guided by a human, but having machine learning do the heavy lifting allows just about anybody to explore their creativity.


Explore Neural Radiance Fields In Real-time, Even On A Phone

Neural Radiance Fields (NeRF) is a method of reconstructing complex 3D scenes from sparse 2D inputs, and the field has been growing by leaps and bounds. Viewing a reconstructed scene is still nontrivial, but there’s a new innovation on the block: SMERF is a browser-based method of enabling full 3D navigation of even large scenes, efficient enough to render in real time on phones and laptops.

Don’t miss the gallery of demos which will run on anything from powerful desktops to smartphones. Notable is the distinct lack of blurry, cloudy, or distorted areas which tend to appear in under-observed areas of a NeRF scene (such as indoor corners and ceilings). The technical paper explains SMERF’s approach in more detail.

NeRFs as a concept first hit the scene in 2020 and the rate of advancement has been simply astounding, especially compared to demos from just last year. Watch the short video summarizing SMERF below, and marvel at how it compares to other methods, some of which are themselves only months old.


Double-Dose Of AI Turns Daily Tasks Into Works Of Art

Not so long ago, “Magic Mirror” builds were all the rage, and we have to admit that getting daily reminders and newsfeeds on an LCD display sitting behind a partially reflective mirror is not without its charms. But styles ebb and flow, so we don’t see too many of those builds anymore. This e-ink daily calendar reminder hearkens back to those Magic Mirrors, only with a double twist of AI.

This project is the work of [Ilkka Turunen], and right up front we’ll say the results are just gorgeous. A lot of that has to do with the 10.3″ e-ink display used, but more with the creative use of not one but two machine learning systems. The first is ChatGPT, which [Ilkka] uses to parse the day’s online calendar entries, pick out the most significant events, and turn them into a prompt for DALL-E. That generated prompt includes specific instructions guiding the style of the image, which honestly is where most of the artistry lies: [Ilkka]’s aesthetic choices, like suggesting that the image look like a 19th-century lithograph or a satirical comic from a turn-of-the-(last)-century newspaper, are baked right into it. The prompt is then sent off to DALL-E for rendering, and the resulting image is displayed.
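For anyone curious what that two-step pipeline looks like in practice, here’s a rough sketch using curl and jq against the OpenAI HTTP API. This is not [Ilkka]’s actual code; the model names, file names, and prompt wording are our own placeholders:

#!/bin/sh
# Sketch only: condense today's events into an image prompt, then render it.
# Assumes OPENAI_API_KEY is set and today's calendar entries are in events.txt.
EVENTS=$(cat events.txt)

# Step 1: ask a chat model to pick the most significant event and write a styled prompt
PROMPT=$(curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" -H "Content-Type: application/json" \
  -d "$(jq -n --arg ev "$EVENTS" '{model: "gpt-4o-mini", messages: [{role: "user",
        content: ("Pick the most significant of these events and write a DALL-E prompt for it, styled like a 19th-century lithograph: " + $ev)}]}')" \
  | jq -r '.choices[0].message.content')

# Step 2: hand that prompt to the image model and download the result
curl -s https://api.openai.com/v1/images/generations \
  -H "Authorization: Bearer $OPENAI_API_KEY" -H "Content-Type: application/json" \
  -d "$(jq -n --arg p "$PROMPT" '{model: "dall-e-3", prompt: $p, size: "1024x1024"}')" \
  | jq -r '.data[0].url' | xargs curl -s -o today.png

From there it’s just a matter of pushing today.png out to the e-ink display.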

It has to be said that the prompts that ChatGPT generates based on the combination of [Ilkka]’s aesthetic preferences and the random events of the day are strikingly complex. The chatbot really seems to be showing some imagination these days; DALL-E is no slouch either in turning those words into images.

Like the idea of an e-ink daily reminder but prefer a less artistic presentation? This should help.
