WhisperFrame Depicts The Art Of Conversation

September 22, 2023

At this point, you gotta figure that you’re at least being listened to almost everywhere you go, whether it be a home assistant or your very own phone. So why not roll with the punches and turn lemons into something like a still life of lemons that’s a bit wonky? What we mean is, why not take our conversations and use AI to turn them into art? That’s the idea behind this next-generation digital photo frame created by [TheMorehavoc].

Essentially, it uses a Raspberry Pi and a Respeaker four-mic array to listen to conversations in the room. It listens and records 15-20 seconds of audio, and sends that to the OpenWhisper API to generate a transcript.

This repeats until five minutes of audio is collected, then the entire transcript is sent through GPT-4 to extract an image prompt from a single topic in the conversation. Then, that prompt is shipped off to Stable Diffusion to get an image to be displayed on the screen. As you can imagine, the images generated run the gamut from really weird to really awesome.

The natural lulls in conversation presented a bit of a problem in that the transcription was still generating during silences, presumably because of ambient noise. The answer was in voice activity detection software that gives a probability that a voice is present.

Naturally, people were curious about the prompts for the images, so [TheMorehavoc] made a little gallery sign with a MagTag that uses Adafruit.io as the MQTT broker. Build video is up after the break, and you can check out the images here (warning, some are NSFW).

12 thoughts on “WhisperFrame Depicts The Art Of Conversation”

1984George says:

September 22, 2023 at 4:36 am

wow.
What will sensors and AI lead to next?

Report comment

Reply
come3 says:

September 22, 2023 at 4:38 am

“The walls have ears.”

That’s for sure.

Report comment

Reply
The Commenter Formerly Known As Ren says:

September 22, 2023 at 4:40 am

I find it amazing how the “machine” gets some things done well, but totally blows it on others.

Report comment

Reply
1. Gravis says:
  
  September 22, 2023 at 5:45 am
  
  It all depends on how the input images are labeled. Applied concepts are what make AI blow it because because they can have so many varying depictions. You need to be a literal as possible.
  
  Report comment
  
  Reply
2. Truth says:
  
  September 22, 2023 at 7:03 am
  
  It is like all things in life, if you master something and you are asked to work on something at the very edge of your knowledge where you have had little experience, you may totally suck at that. The same is true for AI, but unlike a human, AI does not “know” that they suck, they will just try and do what they were asked to do no matter how little training data was used in that area.
  
  Report comment
  
  Reply
  1. David says:
    
    September 22, 2023 at 11:31 am
    
    Exactly that. I play around with Easy Diffusion (a canned version of Stable Diffusion) on my computer and it can be amusing to see what it comes up with when I feed it detailed text prompts for HOW something should look but leave out WHAT it should be … or I make the subject really ambiguous. The AI plods ahead anyway. I’ve gotten some really pretty images that way, some of which are really bizarre.
    
    Report comment
    
    Reply
come3 says:

September 22, 2023 at 4:48 am

“whisper frame” … it could be better named. Whisper frame does not truly capture its essence.
More like:
“ear art” “ listen art” “talk art”
“wall-E-ars”
“listen paintings” “talk paintings”
even “sound paintings” if you tuned the prompt generator to respond to music genres instead of words.

Report comment

Reply
macsimski says:

September 22, 2023 at 5:04 am

Nsfw? Are you from iran? But they are nice lucid dreams and night mares indeed..

Report comment

Reply
a_do_z says:

September 22, 2023 at 9:18 am

Hmm…live chicken vs. raw chicken, chicken sitting near a can of “Raw” beer, bustling kitchen with no apparent activity… doesn’t really seem at all semantically sensitive or intelligent.

I’m not quite sold on AI. That is, other than the fact that it can come up with anything at all.

Report comment

Reply
echodelta says:

September 22, 2023 at 9:28 am

We used to do this in the late 60’s with orange barrel and purple micro dot. Shared visions were experienced. The fathers of electronic music the Berlin School, Tangerine Dream named suitably.
In the next generation will they will be on a “trip”, VR avatars games and all. But whose trip, that’s the future.

Report comment

Reply
illusionmanager says:

September 24, 2023 at 10:33 am

love this!

Report comment

Reply
Jim Schrempp says:

October 30, 2023 at 12:59 pm

I love this!! You inspired me to make my own and my friends and I are having a great time with it. I put my code on GitHub so anyone can try it. Laptop or Raspberry Pi. https://www.jimschrempp.com/features/computer/speech_to_picture.htm

Report comment

Reply