Creepy Speaking Neural Networks

March 22, 2017

Tech artist [Alexander Reben] has shared some work in progress with us: a neural network trained on the speech of various famous people (YouTube, embedded below). [Alexander]’s artistic goal is to capture the “soul” of a person’s voice, in much the same way as the death masks of centuries past. Of course, listening to [Alexander]’s Rob Boss is no substitute for actually watching an old Bob Ross tape (indeed, it never even manages to say “happy little trees”), but it is certainly recognizable as the man himself, and now we can generate an infinite amount of his patter.

Behind the scenes, he’s using WaveNet to train the networks. Basically, the algorithm treats an audio stream as a long sequence of individual samples and tries to predict each next sample from the ones that came before it. Some pre-editing of the training audio data was necessary (removing the laughter and applause from the Colbert track, for instance), but it was basically just plugged right in.
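WaveNet actually works on individual audio samples. Here is a minimal Python sketch of two pieces described in the original WaveNet paper: 8-bit μ-law companding, which maps each sample to one of 256 classes so the network can output a probability for every possible next value, and the autoregressive loop that feeds each generated sample back in as input. The `predict_next` callback and the default `receptive_field` are hypothetical stand-ins for a trained network; this isn’t [Alexander]’s code, and we don’t know which WaveNet implementation or settings he used.

```python
import math
import random

MU = 255  # 8-bit mu-law: 256 amplitude classes, as in the WaveNet paper


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] and quantize it to a class in 0..mu."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1) / 2 * mu))  # map [-1, 1] onto 0..mu


def mu_law_decode(q: int, mu: int = MU) -> float:
    """Expand a class index back to an approximate sample in [-1, 1]."""
    y = 2 * q / mu - 1
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def generate(predict_next, seed, n_samples, receptive_field=1024):
    """Autoregressive sampling loop.

    predict_next(context) is a HYPOTHETICAL model: it takes the most recent
    class indices and returns mu + 1 probabilities for the next one.
    receptive_field=1024 is an illustrative assumption, not a known setting.
    """
    classes = [mu_law_encode(s) for s in seed]
    for _ in range(n_samples):
        context = classes[-receptive_field:]  # the model only sees a finite window
        probs = predict_next(context)
        nxt = random.choices(range(len(probs)), weights=probs)[0]
        classes.append(nxt)  # the new sample becomes input for the next step
    return [mu_law_decode(q) for q in classes]
```

μ-law companding spends more of the 256 levels on quiet amplitudes, and the WaveNet paper reports that it reconstructs audio noticeably better than plain linear 8-bit quantization. Since generation runs one sample at a time, thousands of steps for every second of audio, it is slow, and any odd prediction becomes part of the input for everything that follows.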

The network seems to over-emphasize sibilants; we’ve never heard Barack Obama hiss quite like that in real life. Feeding noise into machines that are set up as pattern-recognizers tends to push them to the limits. But in keeping with the name of this series of projects, the “unreasonable humanity of algorithms”, it does pretty well.

He’s also done the same thing with multiple speakers (also YouTube), in this case 110 people with different genders and accents. The variation across people leads to a smoother, more human sound, but it’s also not clearly anyone in particular. It’s meant to be continuously running out of a speaker inside a sculpture’s mouth. We’re a bit creeped out, in a good way.
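The post doesn’t say whether the 110-speaker model was told who was talking. The WaveNet paper describes “global conditioning”, where a speaker ID is added inside every gated activation unit: z = tanh(W_f ∗ x + V_fᵀh) ⊙ σ(W_g ∗ x + V_gᵀh). With h a one-hot speaker vector, Vᵀh simply selects one learned bias vector per speaker. A model trained on a pool of voices without that input has to learn a single blended distribution instead, which would be consistent with output that isn’t clearly anyone in particular. Below is a minimal Python sketch of the conditioned unit at a single timestep, assuming the dilated-convolution outputs `conv_f` and `conv_g` are already computed; the function name and the per-speaker tables `speaker_f` and `speaker_g` are hypothetical.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gated_unit(conv_f, conv_g, speaker_id, speaker_f, speaker_g):
    """One WaveNet-style gated activation with global (speaker) conditioning.

    conv_f, conv_g: per-channel outputs of the filter and gate convolutions
        (W_f * x and W_g * x) at one timestep.
    speaker_f, speaker_g: HYPOTHETICAL learned tables with one row of
        per-channel biases per speaker; picking a row equals V^T h for a
        one-hot speaker vector h.
    """
    bias_f = speaker_f[speaker_id]
    bias_g = speaker_g[speaker_id]
    return [
        math.tanh(f + bf) * sigmoid(g + bg)  # filter output scaled by a 0..1 gate
        for f, g, bf, bg in zip(conv_f, conv_g, bias_f, bias_g)
    ]
```

Feeding the same `conv_f` and `conv_g` with a different `speaker_id` changes the activations; in this design that bias lookup is the only place speaker identity enters the network.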

We’ve covered some of [Alexander]’s work before, from the wince-inducing “Robot Bites Man” to the intellectual-conceptual “All Prior Art”. Keep it coming, [Alexander]!

37 thoughts on “Creepy Speaking Neural Networks”

--A says:

March 22, 2017 at 4:42 am

Elliot, it seems a [Name] failure occurred in the last PP.
–A

1. Mike Szczys says:
  
  March 22, 2017 at 7:16 am
  
  Quite right, should be Alexander. I’ve made the changes, thanks!
  
wartoaster says:

March 22, 2017 at 5:00 am

Colbear

Also, I’m glad they upgraded the voice effects for the new season of Twin Peaks

1. Mike Szczys says:
  
  March 22, 2017 at 7:24 am
  
  Ha! That reference takes me back. I’m surprised I didn’t hear it the first time I watched the demo.
  
2. dizot says:
  
  March 22, 2017 at 9:48 pm
  
  I knew that I heard “this is a Formica table” during the Rob Boss whisper segment.
  
mime says:

March 22, 2017 at 5:16 am

Recently they came up with a way to adjust video of people speaking so that a third person could ‘move their mouth’ (look at the video and you’ll understand: https://youtu.be/ohmajJTcpNk).

The voice emulation in this article may sound creepy and artificial now, but once the neural networks get more training data, one day they will produce voices that are indistinguishable from reality.

I bet that in “certain countries” foreign heads of state will suddenly start saying really interesting things…

1. Truth says:
  
  March 22, 2017 at 5:48 am
  
  I wonder whether royalties will go to the currently living families of dead actors who are recycled by a merger of the two technologies. I’m sure SAG-AFTRA (Screen Actors Guild‐American Federation of Television and Radio Artists) will block it until the royalties are sorted out.
  
  1. Echo_Hotel (@Echo_Hotel) says:
    
    March 22, 2017 at 6:03 pm
    
    Presumably this is going to be something the actor owns rather than the studio, with actors wanting the cutting-room-floor scraps to train their doppelgänger and taking home slightly less in the short term for the promise of eternal CG youth.
    
2. Sheldon says:
  
  March 22, 2017 at 6:24 am
  
  Now that was impressive – I can see it being really good on films that need a re-dub for voices (say for foreign language, script changes or even, er, ‘PG-13’ classification reasons). That way one wouldn’t get quite so distracted by the visuals not matching the audio, and it wouldn’t force them to pick bad re-dub words (see all the oddball variants of Bruce Willis’ infamous “yippee-ki-yay…” line in Die Hard).
  
  1. Mike says:
    
    March 22, 2017 at 9:51 pm
    
    And… dubbing in different languages in the actors’ “real” voices for foreign distribution.
    
CRJEEA says:

March 22, 2017 at 6:08 am

Next stop, Max Headroom.

Mike says:

March 22, 2017 at 6:20 am

who is Obamer?

1. guest says:
  
  March 22, 2017 at 7:16 am
  
  The Russian sounding one.
  
RW ver 0.0.2 says:

March 22, 2017 at 6:21 am

Pretty much sounds like sweeping the shortwave dial during high magnetosphere activity: hmmm, that might be Voice of America, that might be BBC World Service, that might be something French…

rasz_pl says:

March 22, 2017 at 6:31 am

garbage in, garbage out

markscudder says:

March 22, 2017 at 6:43 am

You’ve never heard Barack Obama hiss like that, huh?

Besides the obvious joke, that bigot has one of the most pronounced sibilance problems of any modern American celebrity. Only Paul Harvey had it worse. Perhaps we rationalize away the faults of those we adore. It was unsurprising to me the neural net picked it up.

1. sneftel says:
  
  March 22, 2017 at 8:29 am
  
  yaaaaay let’s bring your political views into a blog post on neural networks
  
  1. Dave Davidson says:
    
    March 22, 2017 at 9:34 am
    
    Train a Neural net with Political decisions and outcomes world wide for the last 2000 years and see what it produces.
    
    1. sneftel says:
      
      March 22, 2017 at 10:23 am
      
      cerberarchy
      
  2. Valentin says:
    
    March 22, 2017 at 1:44 pm
    
    Attempting to derail a point with an emotionally triggered response. That is all.
    
    This is how religious figures justify their heinous teachings, i.e. “omgg think of the CHILDREN and their precious little Sunday school.”
    
    1. Quin says:
      
      March 23, 2017 at 6:54 pm
      
      Look, a pure example of conservative virtue signaling. I bet the precious snowflake thinks that’s something only “those nasty lie-beral sjw” do.
      
2. Dan says:
  
  March 22, 2017 at 4:06 pm
  
  http://arcturi.com/sitebuilder/images/Obama_Reptilian-270×270.jpg
  
drew says:

March 22, 2017 at 8:09 am

Rob Boss whisper is what you hear coming from the shadowy corners of your cabin in the woods when you think you are alone and slowly going insane. This is both creepy and hilarious. Imagining that one coming continuously out of a bust of Bob Ross is freaking me the hell out.

1. thatfatninja says:
  
  March 22, 2017 at 9:19 am
  
  The random whispers was one thing, the whispering along with the dog mountain google dream interpretation put it over the top.
  
  1. thatfatninja says:
    
    March 22, 2017 at 9:20 am
    
    were^^
    
2. chango says:
  
  March 22, 2017 at 9:33 am
  
  Make it animatronic and you’re guaranteed to win the HaD prize.
  
notarealemail says:

March 22, 2017 at 9:53 am

Rob Boss is reminiscent of garbled cell phone noise.

Joe says:

March 22, 2017 at 10:03 am

Just sounds like an old cassette played backwards. If I wasn’t too lazy I’d reverse it just to make sure.

Keith says:

March 22, 2017 at 10:12 am

A new weapon in our war against automated robocallers!

localhost says:

March 22, 2017 at 11:09 am

Train this with Rick Astley and Justin Bieber. But don’t listen to the output if you do that.

echodelta says:

March 22, 2017 at 12:28 pm

A few decades ago I saw it coming: Howard Cosell or Walter Cronkite reading you the news. Sadly, like TV-movies-CGI, it will be the reason for turning it all off.
Like laptop Data Jockeys turning someone’s work into “music” that is just worthless noise, end run. Power off.

mabarnett0 says:

March 22, 2017 at 12:36 pm

Not quite unsettling enough. I know! Let’s train it on anime characters.
https://youtu.be/FsVSZpoUdSU

1. Dan says:
  
  March 22, 2017 at 4:08 pm
  
  Ah that is the video I wanted to point out, much better, and informative.
  
  And without creepy subliminal “lizard people” references.
  
2. Crazy Cat Man says:
  
  March 23, 2017 at 11:39 am
  
  9000 is pretty funny: “AAAAAAAAH! AAAAAAAAAAAAAAH! AAAH! AAAAAAAAAAAAAAAAAAAAAH!”
  
gregkennedy says:

March 22, 2017 at 6:42 pm

Is this really an NN? It seems more like a Markov chain using sound instead of text.

Tinker Duck says:

March 23, 2017 at 1:58 pm

Never thought about applying the term uncanny valley to audio, until now…

Aprilia says:

April 3, 2018 at 1:29 am

That’s good, how can you do it?


Hackaday
