WhisperFrame Depicts The Art Of Conversation

At this point, you gotta figure that you’re at least being listened to almost everywhere you go, whether it be a home assistant or your very own phone. So why not roll with the punches and turn lemons into something like a still life of lemons that’s a bit wonky? What we mean is, why not take our conversations and use AI to turn them into art? That’s the idea behind this next-generation digital photo frame created by [TheMorehavoc].
Essentially, it uses a Raspberry Pi and a ReSpeaker four-mic array to listen to conversations in the room. It records 15-20 seconds of audio at a time and sends that to OpenAI’s Whisper API to generate a transcript.
This repeats until about five minutes of audio has been collected, at which point the entire transcript is sent through GPT-4 to extract an image prompt from a single topic in the conversation. That prompt is then shipped off to Stable Diffusion to produce an image for the screen. As you can imagine, the images generated run the gamut from really weird to really awesome.
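The project's code isn't reproduced here, but the loop is simple enough to sketch with the OpenAI Python client. Everything below is illustrative: the chunk file names, the GPT-4 prompt wording, and the hand-off to Stable Diffusion are assumptions, not [TheMorehavoc]'s actual implementation.

```python
# Illustrative sketch of the transcribe-then-prompt loop (not the project's code).
# Assumes the `openai` package and an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
transcript = ""

# Placeholder paths standing in for the ~15-20 second recordings from the mic array.
chunk_files = ["chunk_00.wav", "chunk_01.wav", "chunk_02.wav"]

for wav_path in chunk_files:
    with open(wav_path, "rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    transcript += result.text + " "

# Once ~5 minutes of transcript has piled up, ask GPT-4 to boil it down to one image prompt.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Pick one topic from this conversation and write a "
                                      "short, vivid Stable Diffusion image prompt about it."},
        {"role": "user", "content": transcript},
    ],
)
prompt = chat.choices[0].message.content
# `prompt` then goes to a Stable Diffusion backend to render the frame's next image.
```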

The natural lulls in conversation presented a bit of a problem: the transcription kept generating text during silences, presumably because of ambient noise. The answer was voice activity detection software that reports the probability that a voice is present.
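The build doesn't name the specific VAD package, so take this as a sketch rather than the project's code: Silero VAD is one freely available model that reports a per-chunk speech probability, and gating the recordings on it could look roughly like this.

```python
# Rough sketch: gate a recorded chunk on voice activity using Silero VAD
# (the project's actual VAD tool isn't named; file name and thresholds are made up).
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

wav = read_audio("chunk_00.wav", sampling_rate=16000)  # a 15-20 second recording

# The model scores 512-sample windows (32 ms at 16 kHz) with a probability from 0.0 to 1.0.
probs = [model(wav[i:i + 512], 16000).item() for i in range(0, len(wav) - 512, 512)]

# Only hand the chunk to Whisper if a meaningful fraction of it actually looks like speech.
voiced_fraction = sum(p > 0.5 for p in probs) / max(len(probs), 1)
if voiced_fraction > 0.2:
    print("speech detected, worth transcribing")
```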

Naturally, people were curious about the prompts for the images, so [TheMorehavoc] made a little gallery sign with a MagTag that uses Adafruit.io as the MQTT broker. Build video is up after the break, and you can check out the images here (warning, some are NSFW).
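The publishing side of that isn't shown in the post either, but conceptually it's just pushing each new prompt to an Adafruit IO feed over MQTT for the MagTag to pick up. A minimal sketch using the Adafruit IO Python client, with placeholder credentials and a made-up feed name:

```python
# Minimal sketch of publishing the current image prompt to Adafruit IO over MQTT
# (not the project's code; username, key, and feed name are placeholders).
from Adafruit_IO import MQTTClient

ADAFRUIT_IO_USERNAME = "your_username"
ADAFRUIT_IO_KEY = "your_aio_key"
FEED_KEY = "whisperframe-prompt"  # hypothetical feed the MagTag subscribes to

client = MQTTClient(ADAFRUIT_IO_USERNAME, ADAFRUIT_IO_KEY)
client.connect()
client.loop_background()  # handle MQTT traffic in a background thread

client.publish(FEED_KEY, "a still life of lemons, slightly wonky, oil painting")
```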


AI Image Generation Gets A Drag Interface

AI image generators have gained new tools and techniques for not just creating pictures, but modifying them in consistent and sensible ways, and it seems that every week brings a fascinating new development in this area. One of the latest is Drag Your GAN, presented at SIGGRAPH 2023, and it’s pretty wild.

It provides a point-dragging interface that modifies images based on their implied structure. A picture is worth a thousand words, so this short animation shows what that means. There are plenty more where that came from at the project’s site, so take a few minutes to check it out.

GAN stands for generative adversarial network, a class of machine learning model that features prominently in image generation; the “adversarial” part refers to two networks, a generator and a discriminator, being trained against each other so that each pushes the other to improve. Drag Your GAN has a GitHub repository where code is expected to be released in June, but in the meantime, you can read the full paper or brush up on the basics of how AI image generators work, as well as see how image generation can be significantly enhanced with an understanding of a 2D image’s implied depth.
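For a concrete feel of the “adversarial” part, here's a toy PyTorch training loop on 1-D data; it has nothing to do with Drag Your GAN's actual code, but it shows a generator learning to mimic a target distribution while a discriminator learns to flag its fakes.

```python
# Toy GAN: generator vs. discriminator on a 1-D Gaussian (illustration only).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0        # "real" data drawn from N(2, 0.5)
    fake = G(torch.randn(64, 8))                 # generator's attempts

    # Discriminator update: label real samples 1, generated samples 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Generator update: try to make the discriminator call the fakes real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()
```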

3D Printed Triptych Shows Trio Of AI-Generated Images

Fascinated by art generated by deep learning systems such as DALL-E and Stable Diffusion? Then perhaps a wall installation like this phenomenal e-paper Triptych created by [Zach Archer] is in your future.

The three interlocking frames were printed out of “Walnut Wood” HTPLA from ProtoPasta, and hold a pair of 5.79-inch red/black/white displays along with a single 7.3-inch red/yellow/black/white panel from Waveshare. There are e-paper panels out there with more colors available if you wanted to go that route, but judging by the striking images [Zach] has posted, the relatively limited color palettes of these displays don’t seem to be a hindrance.

Note the clever S-shaped brackets holding in the displays.

To create the images themselves, [Zach] wrote a script that generates endless customized portraits using Stable Diffusion v1.4, then manually selected the best ones to copy over to a 32 GB microSD card. The side images were generated on the dreamstudio.ai website and dumped onto the card as well.
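[Zach]'s script isn't reproduced in the post, but the core of a generate-lots-and-cherry-pick workflow with the Hugging Face diffusers library looks roughly like this (the prompt and file names are illustrative):

```python
# Sketch of batch-generating portrait candidates with Stable Diffusion v1.4
# via diffusers (not [Zach]'s actual script; the prompt is made up).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "art nouveau portrait of a woman, bold linework, limited palette"

for i in range(50):  # generate a pile of candidates, then cherry-pick by hand
    image = pipe(prompt).images[0]
    image.save(f"portrait_{i:03d}.png")
```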

Every 12 hours a TinyPico ESP32 development board in the frame picks some images from the card, applies the necessary dithering and color adjustments to make them look good on the e-paper, and then updates the displays.
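The frame handles that dithering step in the ESP32 firmware, but you can prototype the same palette reduction on a desktop with Pillow to preview how an image will survive a limited-color panel. A sketch, assuming a black/white/red display:

```python
# Desktop preview of dithering an image down to a black/white/red e-paper palette
# (the frame itself does this on the ESP32; colors here are the panel's nominal values).
from PIL import Image

panel_colors = [0, 0, 0,  255, 255, 255,  255, 0, 0]      # black, white, red
pal_img = Image.new("P", (1, 1))
pal_img.putpalette(panel_colors + [0, 0, 0] * 253)         # pad palette out to 256 entries

src = Image.open("portrait_000.png").convert("RGB")
dithered = src.quantize(palette=pal_img, dither=Image.Dither.FLOYDSTEINBERG)
dithered.save("portrait_000_epaper.png")
```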

Giving Stable Diffusion Some Depth

You’ve likely heard quite a bit of buzz over the last few months about Stable Diffusion. The new version (v2) has come out, and in addition to the standard image-to-image and text-to-image modes, it also has a depth-image-to-image mode that can be incredibly useful. [Andrew] has a write-up that guides you on using this mode.

The basic idea is that you can feed both an image and a depth map into the model, which lets you control what goes where. Stable Diffusion is a bit confusing, but we already have some great resources to wrap your head around it. In terms of input, you can use a depth map from a camera with lidar (many recent phones include this) or have another model (like MiDaS) estimate it from a 2D picture. This becomes powerful when you want to preserve a specific composition, such as an iconic scene from a well-known movie. You can keep the characters’ poses on the screen but transform the style of the scene into whatever you wish (as seen above).
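If you'd rather poke at the mode directly than follow [Andrew]'s full setup, the diffusers library wraps it as a ready-made pipeline; roughly like this, with a placeholder image and prompt:

```python
# Rough example of Stable Diffusion 2's depth-to-image mode via diffusers
# (image path and prompt are placeholders, not [Andrew]'s workflow).
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("movie_still.png").convert("RGB")

# The pipeline estimates a depth map itself (or accepts one via depth_map=)
# and preserves the composition while restyling it to match the prompt.
result = pipe(
    prompt="the same scene as a watercolor storybook illustration",
    image=init_image,
    strength=0.7,  # how far the output may stray from the original RGB content
).images[0]
result.save("restyled.png")
```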

We have already covered a technique for generating textures right in Blender, and this new depth information has already been put to work there to improve the accuracy of those textures.

[Justin Alvey] used it to create architectural photos from dollhouse furniture. Using the MiDaS model, he estimated the depth, then discarded the RGB information entirely by setting the denoising strength to maximum so that only the depth map guided the output. The simplified dollhouse furniture was easily recognizable to the model, which helped produce great results.
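If you want to see the kind of depth map the model is working from, MiDaS is easy to grab from torch.hub; a quick sketch with a placeholder image path:

```python
# Quick sketch: estimate depth from a single photo with MiDaS and save it as grayscale
# (image path is a placeholder; this mirrors the standard MiDaS torch.hub usage).
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

img = cv2.cvtColor(cv2.imread("dollhouse.jpg"), cv2.COLOR_BGR2RGB)
batch = transforms.small_transform(img)

with torch.no_grad():
    pred = midas(batch)
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze().numpy()

# Normalize to 0-255 and write out a grayscale preview of the depth map.
depth = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("dollhouse_depth.png", depth)
```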

The only real downside is that the perspective still gives the results a rather dollhouse feel; changing the focal length and moving farther away helps. Overall, it’s a clever use of what the new AI model can do. It’s a fast-moving space, so this will likely be out of date in a few months.