Vocoding with a piano


This really cool project allows a grand piano to “speak”. We don’t know any details about its construction, but we had to share. The keys are struck by solenoids in a way that replicates human speech. Click through to the video; it’s worth it. You may have to allow the popup to see the video, and it is in German, but the piano is clearly speaking English. We want one to keep around the office. It could read our emails to us.

[via matrixsynth]

Comments

  1. Ben Ryves says:

    I think the link should be http://www.3sat.de/kulturzeit/tips/138237/index.html ?
    Very interesting stuff, though I don’t think I’d have understood what it was “saying” without the subtitles..!

  2. arthur92710 says:

    That link does not work. Use the one in comment 1
    That sounds weird. They should do the reverse: play a song and translate it into text. Curious about the results.

  3. nate says:
  4. Alan says:

    Actually unfortunately it is replicating an audio file of the document read by a kid. Think of it as making a speaker out of a piano.

  5. Stromlo says:

    Awesome, but scary!

  6. Hackius says:

    They should have gotten someone without a thick accent.

  7. heegemcgee says:

    @alan:
    Isn’t that splitting hairs? Are you any less amazed?

  8. pony says:

    @ alan
    Not quite. It’s almost as though they are doing Fourier synthesis by combining keystrokes instead of sine waves.

  9. Kyle says:

    Nice! As pony said, considering that it’s a percussive instrument and not just a sine wave, that’s quite a feat!

  10. Bryan says:

    For some reason, that piano creeps me the hell out…..

  11. Would be awesome to have a whole band of instruments talking to each other.

  12. Ian says:

    @Bryan
    I agree. It sounds creepy as hell.

    If I had the time and money I’d pop the solenoids directly on the strings and go for more of a Kraftwerk “Man-Machine” sound. Maybe modulate the carrier by adjusting the damping on the string.

  13. Ray says:

    @Alan
    The piano makes the noise, yes like a speaker. What else exactly should it do? Not sure why it matters if the audio is coming from a recording or a voice synth program?

  14. will d. says:

    if that’s really just replicating an audio file, i wonder how it would sound replicating other instruments, or even a whole orchestra. this seems like something that could be done easily with a software piano synth.

  15. Ray says:

    And by like a speaker I mean in the abstract sense that it’s producing sound, regardless of the method.

  16. Josh says:

    This would make one hell of a haunted house prop. Simply amazing.

  17. nate says:

    I’m not sure how he’s doing it, but if it really is just converting frequency ranges to keystrokes, it should be dead simple to write a program to convert audio files to MIDI. I might play around with the idea some when I get the time.
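
A minimal sketch of the frequency-to-keystroke step nate describes, assuming equal temperament with A4 = 440 Hz and a standard 88-key range (MIDI notes 21 to 108). The function name and constants are illustrative and not taken from the project.

```python
import math

A4_HZ = 440.0                    # reference pitch (assumed tuning)
A4_MIDI = 69                     # MIDI note number of A4
PIANO_LOW, PIANO_HIGH = 21, 108  # A0 to C8 on an 88-key piano


def freq_to_midi_note(freq_hz):
    """Return the nearest MIDI note for a frequency, or None if it is off the keyboard."""
    if freq_hz <= 0:
        return None
    # Twelve equal semitones per octave, counted from A4.
    note = round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))
    if PIANO_LOW <= note <= PIANO_HIGH:
        return note
    return None
```

Each analysed frequency then snaps to the closest key. Anything between two keys is rounded, which is one reason the result sounds like a piano approximating speech and not like a recording.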

  18. sly says:

    GLaDOS cometh

  19. spacecoyote says:

    There’s a shareware program called TS Audiotomidi that does this.

  20. Alex says:

    I uploaded this to youtube for those with slower connections.

  21. Cynyr says:

    A better direct link, since mplayer didn’t like the .asx one. mms://ondemand.msmedia.zdf.newmedia.nacamar.net/zdf/data/msmedia/3sat/09/10/091002_klavier_kuz_vh.wmv

    Which is really just the contents of the .asx one. Why we need a link to a file that has a link to the media in it, I have no idea.

  22. EdZ says:

    @Pony
    That’s EXACTLY what they’re doing. You could probably increase the output quality by varying the strike velocity (the current implementation appears to be bang-bang). This could also be done to create a midi output, and drive any instrument(s) that can produce enough separate and relatively pure tones. Several guitars tuned to be slightly out of phase, for example, could work.
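
EdZ's suggestion of varying the strike velocity could look something like the sketch below. The logarithmic mapping and the -48 dB floor are assumptions made for illustration; a real solenoid rig would need to be calibrated against the instrument.

```python
import math


def amplitude_to_velocity(amplitude, floor_db=-48.0):
    """Map a linear amplitude in (0, 1] to a MIDI velocity in 1..127.

    Returns 0 (do not strike) for anything at or below floor_db.
    """
    if amplitude <= 0:
        return 0
    level_db = 20 * math.log10(min(amplitude, 1.0))
    if level_db <= floor_db:
        return 0
    fraction = 1 - level_db / floor_db  # 0 at the floor, 1 at full scale
    return max(1, round(127 * fraction))
```

A bang-bang implementation corresponds to replacing the return value with a fixed velocity whenever the amplitude clears the floor.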
    It works using existing speech, so is not a speech synthesiser. You could feed the output of a speech synthesiser into it though (as well as any other sort of sound file).

  23. razor says:

    Dude! Anybody here StrongBad fans? (www.homestarrunner.com) That sounds just like his Lappy 486 :P hehehe

  24. macegr says:

    It almost seems that this is less about combining frequencies to get a specific waveform, and more about hitting a lot of keys in rapid succession to get low-frequency 1-bit audio. It sounds a lot like the speech samples you could get from the old computers that simply had on-off buzzers. By having all these keys in parallel, you can get a lot of plinks per second and overcome the mechanical limitations of a single key. Then you randomize the keypresses around a central frequency to color the overall sound impression with overtones that appear to follow the sound sample.

    Maybe they’re NOT doing it this way, but I can’t read German. :)

  25. Colin says:

    I’ve been thinking about something like this for a while but in reverse. Using human voice to accurately recreate the sound of other instruments (think a cappella but with a computer automatically creating the sheet music based on a sound recording as the input.)

    If someone knows how they did this more precisely, it might help. The problem with just simply using a Fourier breakdown is the assumption of pure waves, and I haven’t thought of a good method of taking into account the overtones.

    Any ideas?

  26. vic says:

    @Colin: Fourier’s transform is a particular case of Schmidt’s orthogonal projection from the space of periodic functions to the subspace of sinusoidal functions … I’m pretty sure it is possible to project to any other subspace. Now my years of study are a bit far away so I’ll leave it to you to go on from this point ;)
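
vic's point can be written out as a short derivation. This is a sketch under assumptions: the symbols $g_1,\dots,g_n$ (the available tones, for example recorded piano notes with their overtones), the target signal $f$, and the inner product over a window of length $T$ are all introduced here for illustration.

```latex
\langle u, v \rangle = \int_0^T u(t)\, v(t)\, dt,
\qquad
\hat f = \sum_{j=1}^{n} a_j\, g_j .

% Minimising \| f - \hat f \|^2 over the coefficients a_j gives the normal equations
\sum_{j=1}^{n} \langle g_i, g_j \rangle\, a_j = \langle g_i, f \rangle,
\qquad i = 1, \dots, n .

% If the g_i are orthonormal, \langle g_i, g_j \rangle = \delta_{ij}, and this reduces to
a_i = \langle g_i, f \rangle .
```

The last line is the familiar Fourier coefficient formula. With tones that overlap in their overtones, the matrix of inner products $\langle g_i, g_j \rangle$ is no longer the identity, so the coefficients have to be solved for jointly instead of read off one at a time.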

  27. Pilotgeek says:

    @sly
    Yes… this is very glados-y and quite creepy. If there’s ever some sort of AI that gains consciousness and becomes crazy, it damn well better use one of these to vocalize.

  28. KI4MCW says:

    As an experiment a couple of months back, I tried to use MIDI to imitate an SSTV waveform (similar to the audio of a fax transmission, but slower). I used a Perl script to write individual key events to a MID file, then played the file back through Media Player (or whatever). I could not find any waveforms in the Windows General MIDI palette that had a fast enough attack time to render “notes” that were less than a millisecond in duration. Even if it had worked, the output would have needed to be phase-correct across notes, which I don’t think is even possible with MIDI. The whole thing was ridiculous, but the “song” files sure are funny to listen to.
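
Writing key events straight into a MIDI file, as KI4MCW did in Perl, can be sketched in Python as below. It assumes a single-track format 0 file on channel 1; the function names and the event tuple layout are made up for this example.

```python
import struct


def var_len(value):
    """Encode a non-negative int as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def build_midi(events, ticks_per_beat=480):
    """Build a format 0 MIDI file from (start_tick, duration_ticks, note, velocity) tuples."""
    timeline = []
    for start, duration, note, velocity in events:
        timeline.append((start, 1, bytes([0x90, note, velocity])))  # note on
        timeline.append((start + duration, 0, bytes([0x80, note, 0])))  # note off
    # Sort by time; at equal ticks, note-offs come before note-ons.
    timeline.sort(key=lambda event: (event[0], event[1]))

    track = bytearray()
    now = 0
    for tick, _, message in timeline:
        track += var_len(tick - now) + message  # delta time, then the event
        now = tick
    track += b"\x00\xff\x2f\x00"  # end-of-track meta event

    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)
```

The returned bytes can be written to disk in binary mode and opened in any MIDI player. With no tempo event the file plays at the default 120 beats per minute, so at 480 ticks per beat one tick lasts roughly a millisecond, which gives a feel for how coarse the timing grid is for sub-millisecond "notes".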

  29. astera says:

    I made a transcription and tried to translate it as well as possible (yes, some parts *are* weird – even in German):

    Alles klar? Wohl kaum – das lässt sich aber ganz einfach ändern.

    Schon erstaunlich, wie genau plötzlich die Worte der Deklaration für einen Internationalen Gerichtshof gegen Umweltverbrechen verständlich werden. ‘Wien Modern’ war eine von zehn kulturellen Institutionen, die um einen künstlerischen Beitrag für die Veranstaltung im Dogenpalast in Venedig gebeten wurde.
    Diese Botschaft mit musikalischen Mitteln hörbar zu machen ohne auf eine simple Vertonung zurückzugreifen, das war das ehrgeizige Ziel.

    Berno Polzer: Ich glaube, es ist teilweise verständlich, teilweise unverständlich. Und es spielt genau mit der Grenze unserer Konstruktionsleistung. Das heißt, wir hören Klänge, die offensichtlich keine normale Musik sind, aber auch keine Sprache, und manchmal findet sozusagen so eine kleine Überbrückung statt. Ich finde, man hört auch ohne den Text zu kennen einzelne Worte, und das Aha-Erlebnis passiert eigentlich dann, wenn man den Text sieht und dann plötzlich die Sprache da ist.

    Ein weiterer Brückenschlag: Miro Markus, ein neunjähriger Schüler aus Berlin, hat den Text für die Performance aufgenommen: Jugend als Hoffnungsträger der älteren Generation.

    Der österreichische Komponist Peter Ablinger hat das Frequenzspektrum der Kinderstimme auf sein computergesteuertes mechanisches Klavier übertragen.

    Peter Ablinger: Ich löse die eine Phonographie, das bedeutet also eine Aufnahme von irgendetwas – in diesem Falle der Stimme -, in einzelne ‘Pixel’ auf. So könnte man im übertragenen Sinne durchaus sprechen. Und wenn ich die Möglichkeit der Wiedergabe in einer sehr hohen Pixelauflösung habe, und diese habe ich nur mit einem mechanischen Klavier, dann kann ich tatsächlich eine Art von Kontinuität wiederherstellen. Wir können also in einem Klavierklang tatsächlich mit etwas Übung oder Unterstützung oder Untertitelung eine menschliche Stimme hören.

    Got it? Probably not – but we can easily change that.

    Pretty amazing how precisely the words of the Declaration for an International Court for Environmental Crimes suddenly become understandable. ‘Wien Modern’ was one of ten cultural institutions asked for an artistic contribution to the event in the Palazzo Ducale in Venice.
    The ambitious goal was to make this message audible by musical means, without falling back on a simple musical setting.

    Berno Polzer: I think it’s partially understandable, partially not. And it plays precisely with the limits of our construction abilities. That is, we hear sounds that obviously aren’t normal music, but aren’t language either, and sometimes a small bridging happens, so to speak. I think you can pick out individual words even without knowing the text, and the Eureka moment actually happens when you see the text and suddenly the language is there.

    Yet another bridge: Miro Markus, a nine-year-old student from Berlin, recorded the text for the performance: youth as a bearer of hope for the older generation.

    The Austrian composer Peter Ablinger transferred the frequency spectrum of the child’s voice to his computer-controlled mechanical piano.

    Peter Ablinger: I break down this phonography, meaning a recording of something (the voice, in this case), into individual ‘pixels’, figuratively speaking. And if I can play it back at a very high pixel resolution, which I only get with a mechanical piano, then I can in fact restore a kind of continuity. So with a little practice, help, or subtitling, we can actually hear a human voice in a piano sound.

  30. scholli says:
  31. Patrick Flanagan says:

    I did nearly the exact same thing for my Master’s Recital in spring 2007.

    http://rocketsurgeon.s3.amazonaws.com/PWAP_End.mp4?AWSAccessKeyId=0JC3J24V0Q2JT4S9FR02&Expires=1255212586&Signature=A8bjSj28i4OD8mr8Aoh9Fghghk0%3D

    (1.7 MB)

    I saved myself a lot of time and money by renting a Disklavier, but hats off to Peter for building his own player piano.

    One conceptual difference between my work and Peter’s is that my piece begins at a very slow tempo and gradually accelerates to slightly more than normal speed, at which point the text becomes quasi-understandable. It’s a kind of acoustic time-stretching, DSP without the digital signal, that foregrounds the threshold between music and speech.

    Synthesizing phonemes with a noise component is difficult, so I limited the text to words with only vowel sounds and l, m, n, r, w, and y. As it happens, many of the roughly 400 English words that meet that criterion have to do with sex, drugs, or Islam, which made for a politically volatile text, but that was really just a byproduct of the process.

    Once the text was prepared, I recorded myself reciting it and did a Fourier analysis in Max/MSP. I wrote my own partial-tracking software in Max, and used that to extract prominent partials, which were converted to notes and saved in a MIDI file. I retouched the MIDI file in Cubase to make the speech more understandable. The final MIDI “score” of the piece resulted from looping the retouched MIDI file while accelerating from a fraction of the original tempo to slightly faster than real-time.
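
The "extract prominent partials" step can be illustrated for a single analysis frame. This is a Python sketch and not Patrick's Max/MSP patch: it uses a naive DFT with simple peak picking, and it leaves out windowing and the tracking of partials from one frame to the next. All function names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each DFT bin up to Nyquist (naive O(n^2), fine for a sketch)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags


def prominent_partials(frame, sample_rate, count=3):
    """Return up to `count` (frequency_hz, magnitude) pairs, strongest first."""
    mags = dft_magnitudes(frame)
    # A bin is a peak if it is higher than its left neighbour
    # and at least as high as its right neighbour.
    peaks = [
        (mags[k], k)
        for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]
    peaks.sort(reverse=True)
    return [(k * sample_rate / len(frame), m) for m, k in peaks[:count]]
```

Running this on successive short frames of a recording gives a list of strong frequencies per frame; each one can then be rounded to the nearest piano key to produce note events.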

    I’m not staking any claims to originality here; I stole the idea of instrumental speech synthesis from the Indian/English/German composer Clarence Barlow, who I studied with in Cologne in 2002/2003.

  32. draeath says:

    Thank you for the youtube upload. ASX? Really? What were you thinking, there?

  33. dunk says:

    Spar-ky!

  34. dunk says:

    sorry. 3m17s for the talking piano

  35. Alex says:

    @draeath
    asf was pretty much the only option since I had to rip the video off a media player stream. Trust me, it’s not the format I would’ve preferred.

  36. Tim Anderson says:

    I would like to see the sheet music of the speech.

  37. Stoph Long says:

    Patrick,

    I wasn’t able to view the document you mentioned. I have access to a disklavier and would be fascinated to play the midi file you mentioned if it is available. Also, the auditory research community email list has been discussing the “Talking Piano” project and I’m sure would be interested in hearing about your work too.
    It sounds like a very interesting project.

    StophLong at
    yahoo.co.uk
