Tech artist [Alexander Reben] has shared some work in progress with us. It’s a neural network trained on the speech of various famous people (YouTube, embedded below). [Alexander]’s artistic goal is to capture the “soul” of a person’s voice, in much the same way as the death masks of centuries past captured a face. Of course, listening to [Alexander]’s Rob Boss is no substitute for actually watching an old Bob Ross tape (indeed, it never even manages to say “happy little trees”), but it is certainly recognizable as the man himself, and now we can generate an infinite amount of his patter.

Behind the scenes, he’s using WaveNet to train the networks. Basically, the algorithm treats an audio stream as a long sequence of samples and tries to predict the next sample from the ones that came before it. Some pre-editing of the training audio data was necessary (removing the laughter and applause from the Colbert track, for instance), but otherwise the audio was basically just plugged right in.
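To make the "predict the next bit of audio from what came before" idea concrete, here is a minimal sketch of how training examples for that kind of model can be laid out. It is not [Alexander]'s code and not a WaveNet implementation: the function names, the four-sample context, and the toy sine tone are all made up for illustration. The 8-bit mu-law quantization step follows the published WaveNet description; whether this project used the same settings is an assumption.

```python
import math

MU = 255  # 8-bit mu-law companding (256 classes); assumed, not confirmed for this project


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1) / 2 * mu + 0.5)  # rescale to [0, mu] and round


def next_sample_pairs(samples, context=4):
    """Yield (previous `context` samples, next sample) training pairs.

    Hypothetical helper: a real model sees a far longer history than four samples.
    """
    codes = [mu_law_encode(s) for s in samples]
    for i in range(context, len(codes)):
        yield codes[i - context:i], codes[i]


# A toy 440 Hz tone sampled at 16 kHz stands in for real speech.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(8)]
pairs = list(next_sample_pairs(tone, context=4))
```

Each pair is one question for the network: given these previous samples, which of the 256 possible values comes next? Generating audio is then a matter of sampling an answer, appending it to the history, and asking again.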

The network seems to over-emphasize sibilants; we’ve never heard Barack Obama hiss quite like that in real life. Feeding noise into machines that are set up as pattern-recognizers tends to push them to the limits. But in keeping with the name of this series of projects, the “unreasonable humanity of algorithms”, it does pretty well.

He’s also done the same thing with multiple speakers (also YouTube), in this case 110 people with different genders and accents. The variation across people leads to a smoother, more human sound, but it’s also not clearly anyone in particular. It’s meant to be continuously running out of a speaker inside a sculpture’s mouth. We’re a bit creeped out, in a good way.

We’ve covered some of [Alexander]’s work before, from the wince-inducing “Robot Bites Man” to the intellectual-conceptual “All Prior Art”. Keep it coming, [Alexander]!

  1. Recently they came up with a way to adjust video of people speaking so that a third person could ‘move their mouth’ (look at the video and you’ll understand).

    The voice emulation in this article may sound creepy and artificial now, but once the neural networks get more training data, they will one day produce voices that are indistinguishable from the real thing.

    I bet that in “certain countries” suddenly foreign heads of state will start saying really interesting things..

    1. I wonder whether royalties will go to the living families of dead actors who are recycled by a merger of the two technologies. I’m sure SAG-AFTRA (Screen Actors Guild‐American Federation of Television and Radio Artists) will block it until the royalties are sorted out.

    2. Now that was impressive – I can see it being really good on films that need a re-dub for voices (say for foreign language, script changes or even, er, ‘PG-13’ classification reasons). That way one wouldn’t get quite so distracted by the visuals not matching the audio, and the studio wouldn’t be forced to pick bad re-dub words (see all the oddball variants of Bruce Willis’ infamous “yippee-ki-yay..” line in Die Hard).

  2. Pretty much sounds like sweeping the shortwave dial during high magnetosphere activity: hmmm, that might be Voice of America, that might be BBC World Service, that might be something French…

  4. The Rob Boss whisper is what you hear coming from the shadowy corners of your cabin in the woods when you think you are alone and slowly going insane. This is both creepy and hilarious. I’m imagining that one coming continuously out of a bust of Bob Ross now, and it’s freaking me the hell out.

