MIDI isn’t just about music, as [Johannes Stelzer] shows by using dials to adjust AI-generated imagery in real time. The results are wild, with an interactivity to them that we don’t normally see in such things.
[Johannes] uses Stable Diffusion’s SDXL Turbo to create a baseline image from the prompt “photo of a red brick house, blue sky”. The hardware dials act as manual controls for applying different embeddings to this baseline, such as “coral”, “moss”, “fire”, “ice”, “sand”, “rusty steel”, and “cookie”.
By adjusting the dials, those embeddings are applied to the base image in varying strengths. The results are generated on the fly and are pretty neat to see, especially since there is no appreciable processing delay.
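For the curious, here’s a rough sketch of how the effect could be approximated with the diffusers library: encode the base prompt and a “style” prompt, then let a dial value interpolate between the two embeddings before a single-step SDXL Turbo render. This is our own illustration, not [Johannes]’s actual code, and the prompts and weights are assumptions.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Sketch only: prompts, weights, and structure are illustrative assumptions.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

def encode(prompt):
    # SDXL pipelines expose encode_prompt(); with guidance disabled we only
    # need the positive (and pooled) embeddings.
    embeds, _, pooled, _ = pipe.encode_prompt(
        prompt=prompt, device="cuda", num_images_per_prompt=1,
        do_classifier_free_guidance=False,
    )
    return embeds, pooled

base_e, base_p = encode("photo of a red brick house, blue sky")
moss_e, moss_p = encode("photo of a red brick house covered in moss, blue sky")

def render(dial):
    # dial is 0.0..1.0, e.g. a MIDI CC value divided by 127
    return pipe(
        prompt_embeds=torch.lerp(base_e, moss_e, dial),
        pooled_prompt_embeds=torch.lerp(base_p, moss_p, dial),
        num_inference_steps=1,   # SDXL Turbo is built for single-step output
        guidance_scale=0.0,
    ).images[0]

render(0.5).save("half_moss.png")
```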
The MIDI controller is integrated with the help of lunar_tools, a software toolkit on GitHub for building interactive exhibits. As for the image end of things, we’ve previously covered how AI image generators work.
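We don’t know exactly how lunar_tools wires up the controller, but reading dial positions over MIDI is simple enough that a minimal sketch with the mido library shows the idea (the controller numbers and embedding names here are hypothetical):

```python
import mido

# Hypothetical mapping of MIDI CC numbers to embedding names; yours will differ.
DIALS = {1: "coral", 2: "moss", 3: "fire", 4: "ice"}
strengths = {name: 0.0 for name in DIALS.values()}

with mido.open_input() as port:  # opens the default MIDI input
    for msg in port:
        if msg.type == "control_change" and msg.control in DIALS:
            strengths[DIALS[msg.control]] = msg.value / 127.0  # CC range is 0-127
            print(strengths)  # feed these weights into the render loop above
```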
In before people lose it over AI image generation.
Very cool project. Visually fascinating, and the minimal processing time makes it feel highly polished.
This is a hacker crowd. I’d expect they support genAI.
Nope, no support for genAI. genAI just produces crap.
I agree. genAI steals artists’ works without permission.
They have no one to blame but themselves. The devs BLATANTLY lied about how the models are trained and what they used to train them, while keeping lists of artists and entire catalogs of works. THEN the Discord leaks pretty much proved the entire team lied through their teeth and KNEW it was lies. I’ve got zero sympathy for them. They should have been transparent and taken whatever regulations may come, but they didn’t.
Real developers lie to the larger dumb audience, the politicians, and the money grabbers; it’s the only thing to do, really.
There are many ready-made MIDI controllers with dials available.
So this ‘only way he knew’ snark seems a bit out of place.
FUCKING HELL why don’t my replies go where they should??
Drives me nuts, excuse me.
A little more responsiveness and it has good potential as an EQ visualizer plugin.
Would be a cool concept, though I don’t think one’s GPU turning into a jet engine as it goes into overdrive to keep generating makes it a practical one.
The demo video missed the juiciest opportunity: using multiple dials at the same time.
Yeah, I was thinking about that too. I wonder if the outputs were pre-baked to make it more responsive, which would make it prohibitive to bake every possible combination of dial positions. Or maybe they just forgot to film the coolest part; that happens more often than you’d think.
How is it so responsive? My images with Stable Diffusion take at least 10 seconds for a 512×512 image.
They used a “distilled” model, which is a model that tries to condense what a set of bigger models does while being smaller and faster. Essentially it’s a model of other models. It’s as crazy as it sounds.
It enables stupidly fast generation (this one does it within a single step), but as seen in the demo, the accuracy and ability to deviate take a nosedive, making it more of something for experimental showcases like this.
SDXL Turbo (as mentioned in the article) can render an output in one or two steps. I can generate a 512×512 image on my 4080 Super in about 300 ms.
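Something like this (a rough sketch, not my exact script; timings will vary with hardware and resolution):

```python
import time
import torch
from diffusers import AutoPipelineForText2Image

# Rough timing sketch: single-step SDXL Turbo at its native 512x512 resolution.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

start = time.time()
image = pipe(
    "photo of a red brick house, blue sky",
    num_inference_steps=1, guidance_scale=0.0,
    height=512, width=512,
).images[0]
print(f"generated in {time.time() - start:.2f} s")
```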
Can’t deny that generative AI used for interactive showcases like this shows potential.
Just wish it didn’t involve Stability AI. They are definitely one of the more ethically dubious of the bunch.
This makes me like it a lot more
Why do you say SAI is more ethically dubious? All of the major ones are trained on scraped data, and at least SAI isn’t then putting the result in a corporate walled garden.
Very cool. I’m glad they used a self-hostable image generator and not some API a corporation could take away on a whim. I did something similar (without a physical interface) for a puzzle in a table-top RPG – the device had dials corresponding to the classical elements that changed the overall environment, and switches to toggle on or off specific elements. The players had to use it to match descriptions from an NPC’s journal.
Dang, that’s cool! Any plans for a project writeup?
Now, apply it to a photo of a face, with variables like “ear size” and “hair color”. We’ve long seen this with selections from discrete images, but it would be a lot more fun with continuous variation.
Or use it on a piece of writing like a short story or a poem. That would be interesting.
Sweet dreams are made of this…[insert sound track here]
What does MIDI have to do with this, other than being the only way he knew to read dials?
“If all you have is a hammer…”
The developer probably chose MIDI because you can buy a cheap box with a bunch of knobs on it and a CPU that encodes them and sends messages over the interface. Otherwise you have to build your own, which is expensive and takes a long time, and the developer wasn’t interested in hardware anyway.
This reminds me so much of a video I saw of a talk called “Inventing on Principle” by Bret Victor. As a means of illustrating his point, he talks about his own guiding principle of immediate feedback in creative endeavors. Sure, the relationship between AI/ML/what-have-you and human creativity is absolute flame war fodder, but I think this is a fantastic ‘fuzzy’ way to interact with the ‘fuzzy’ black-box logic of AI image generation engines. Also, would 100% recommend looking up the above video on YouTube. Well worth your 55 minutes.
I’m curious if it can sidestep prompts and just remix images without external learning data.
Interesting, but what is the use case?