Vocoding With A Piano

This really cool project allows a grand piano to “speak”. We don’t know any details about its construction, but we had to share. The keys are struck by solenoids in a pattern that replicates human speech. Click through to the video; it’s worth it. You may have to allow the popup to see the video, and it’s in German, but the piano is clearly speaking English. We want one to keep around the office. It could read our emails to us.

(Edit from 2015: The link went bad, but it can be found elsewhere on YouTube.)

[via matrixsynth]

43 thoughts on “Vocoding With A Piano”

    1. Yep, but if the computer were using a speech synthesizer, the output would still be a waveform, just the same as a recording of a kid would give. I’m sure any old speech-synth software would do, although apparently the guy with the very German hairdo put some work into tweaking it manually. I can’t think of any other way of making a piano speak than this.

      1. They just quantized a spectrogram at the frequencies of a normally tuned piano, then fed the result to a piano with solenoids on each key. There are a lot more compositions, not just this one… by the way, this one is called “Deus cantando”.
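
      If that description is right, the core of such a pipeline is only a few lines. Here is a minimal sketch, assuming a mono WAV input and standard 88-key equal temperament (A4 = 440 Hz on key 49); none of this is from the actual project:

      ```python
      # Hypothetical sketch: quantize a spectrogram to piano keys.
      # Assumes a mono WAV file; the real system's details are not public.
      import numpy as np
      from scipy.io import wavfile
      from scipy.signal import stft

      def audio_to_key_events(path, n_keys=8, frame_ms=20):
          """Return (time_s, piano_key, strength) events: the loudest bins
          per frame, snapped to the nearest equal-tempered key."""
          rate, audio = wavfile.read(path)
          if audio.ndim > 1:
              audio = audio.mean(axis=1)                  # mix down to mono
          freqs, times, spec = stft(audio, fs=rate,
                                    nperseg=int(rate * frame_ms / 1000))
          mags = np.abs(spec)
          events = []
          for i, t in enumerate(times):
              frame = mags[:, i]
              for b in np.argsort(frame)[-n_keys:]:       # strongest bins this frame
                  f = freqs[b]
                  if 27.5 <= f <= 4186.0:                 # A0..C8, the piano's range
                      key = int(round(49 + 12 * np.log2(f / 440.0)))
                      events.append((float(t), key, float(frame[b])))
          return events
      ```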

  1. @Bryan
    I agree. It sounds creepy as hell.

    If I had the time and money I’d pop the solenoids directly on the strings and go for more of a Kraftwerk “Man-Machine” sound. Maybe modulate the carrier by adjusting the damping on the string.

  2. If that’s really just replicating an audio file, I wonder how it would sound replicating other instruments, or even a whole orchestra. This seems like something that could be done easily with a software piano synth.

  3. I’m not sure how he’s doing it, but if it really is just converting frequency ranges to keystrokes, it should be dead simple to write a program to convert audio files to MIDI. I might play around with the idea some when I get the time.

  4. @Pony
    That’s EXACTLY what they’re doing. You could probably increase the output quality by varying the strike velocity (the current implementation appears to be bang-bang). This could also be used to create a MIDI output and drive any instrument(s) that can produce enough separate and relatively pure tones. Several guitars tuned slightly out of phase with each other, for example, could work.
    It works from existing speech, so it is not a speech synthesiser itself. You could feed the output of a speech synthesiser into it, though (as well as any other sort of sound file).
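
    As a toy illustration of the velocity point (my own sketch, reusing the hypothetical (time, key, strength) events from the code under the first comment’s reply above, not anything from the project): log-compress each partial’s magnitude and map it onto MIDI velocity instead of a fixed bang-bang strike.

    ```python
    # Hypothetical velocity mapping: scale spectral magnitude (in dB)
    # onto MIDI velocity instead of a fixed bang-bang strike.
    import numpy as np

    def events_to_midi_notes(events, vmin=20, vmax=127):
        """events: (time_s, piano_key, strength) -> (time_s, midi_note, velocity)."""
        strengths = np.array([s for _, _, s in events])
        db = 20 * np.log10(strengths / strengths.max() + 1e-9)
        norm = np.clip((db + 60) / 60, 0.0, 1.0)        # map -60..0 dB to 0..1
        return [(t, key + 20, int(vmin + n * (vmax - vmin)))   # piano key 1 = MIDI 21
                for (t, key, _), n in zip(events, norm)]
    ```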

  5. It almost seems that this is less about combining frequencies to get a specific waveform, and more about hitting a lot of keys in rapid succession to get low-frequency 1-bit audio. It sounds a lot like the speech samples you could get from old computers that only had on-off buzzers. By having all these keys in parallel, you can get a lot of plinks per second and overcome the mechanical limitations of a single key. Then you randomize the keypresses around a central frequency to color the overall sound with an overtone that appears to follow the sound sample.

    Maybe they’re NOT doing it this way, but I can’t read German. :)
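
    If it were done that way, the core loop would look something like a first-order pulse-density modulator. This is a purely speculative toy (and comment 10 below suggests the real approach is spectral):

    ```python
    # Speculative sketch of the "lots of plinks" idea: a first-order
    # pulse-density modulator that emits key strikes scattered around a
    # center key. NOT how the actual piece works.
    import numpy as np

    def one_bit_strikes(signal, center_key=44, spread=3, seed=0):
        """signal: samples normalized to [0, 1] -> (sample_index, piano_key) strikes."""
        rng = np.random.default_rng(seed)
        acc, events = 0.0, []
        for i, x in enumerate(signal):
            acc += x                       # accumulate "owed" energy
            if acc >= 1.0:                 # strike once a full unit is owed
                acc -= 1.0
                events.append((i, int(center_key + rng.integers(-spread, spread + 1))))
        return events
    ```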

  6. I’ve been thinking about something like this for a while, but in reverse: using the human voice to accurately recreate the sound of other instruments (think a cappella, but with a computer automatically creating the sheet music from a sound recording as the input).

    If someone knows more precisely how they did this, it might help. The problem with simply using a Fourier breakdown is that it assumes pure sine waves, and I haven’t thought of a good method of taking the overtones into account.

    Any ideas?

  7. @Colin: The Fourier transform is a particular case of (Gram–)Schmidt orthogonal projection from the space of periodic functions onto the subspace of sinusoidal functions … I’m pretty sure it is possible to project onto any other subspace. My years of study are a bit far behind me now, so I’ll leave it to you to go on from this point ;)
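
    Spelled out (my notation, not the commenter’s): the projection onto a family of basis functions, with the Fourier series as the sinusoidal special case, plus a least-squares variant for non-orthogonal note templates that carry their overtones:

    ```latex
    % Orthogonal projection of x onto span{b_k}; the Fourier series is the
    % special case b_k(t) = e^{i k \omega_0 t}.
    \[
      \hat{x}(t) = \sum_k \frac{\langle x, b_k \rangle}{\langle b_k, b_k \rangle}\, b_k(t),
      \qquad
      \langle f, g \rangle = \int_0^T f(t)\, \overline{g(t)}\, dt .
    \]
    % For a piano, project instead onto per-note templates (fundamental plus
    % overtones); these are generally not orthogonal, so solve a small
    % least-squares problem frame by frame. The templates are my assumption.
    \[
      \min_{c \ge 0} \Big\| \, x - \sum_k c_k\, b_k \Big\|^2 .
    \]
    ```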

  8. @sly
    Yes… this is very GLaDOS-y and quite creepy. If some sort of AI ever gains consciousness and goes crazy, it damn well better use one of these to vocalize.

  9. As an experiment a couple of months back, I tried to use MIDI to imitate an SSTV waveform (similar to the audio of a fax transmission, but slower). I used a Perl script to write individual key events to a MID file, then played the file back through Media Player (or whatever). I could not find any waveforms in the Windows General MIDI palette that had a fast enough attack time to render “notes” that were less than a millisecond in duration. Even if it had worked, the output would have needed to be phase-correct across notes, which I don’t think is even possible with MIDI. The whole thing was ridiculous, but the “song” files sure are funny to listen to.
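
    For anyone repeating the experiment, the file-writing half is only a few lines in Python with the mido library (my substitution; the original used a hand-rolled Perl script). It won’t fix the GM attack-time problem, which lives in the synth, but it will produce the same funny “song” files:

    ```python
    # Sketch: write sub-millisecond note events to a .mid file with mido
    # (assumed library choice; the commenter used a hand-rolled Perl script).
    import mido

    def write_midi(notes, path="out.mid", ticks_per_beat=480):
        """notes: list of (start_s, midi_note, velocity, duration_s)."""
        TEMPO = 500000                     # microseconds per beat (120 BPM)
        mid = mido.MidiFile(ticks_per_beat=ticks_per_beat)
        track = mido.MidiTrack()
        mid.tracks.append(track)
        track.append(mido.MetaMessage('set_tempo', tempo=TEMPO))
        msgs = []
        for start, note, vel, dur in notes:
            msgs.append((start, mido.Message('note_on', note=note, velocity=vel)))
            msgs.append((start + dur, mido.Message('note_off', note=note)))
        msgs.sort(key=lambda m: m[0])      # events in absolute-time order
        last = 0.0
        for t, msg in msgs:
            msg.time = int(mido.second2tick(t - last, ticks_per_beat, TEMPO))
            last = t
            track.append(msg)
        mid.save(path)
    ```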

  10. I made a transcription and tried to translate it as well as possible (yes, some parts *are* weird – even in German):

    Alles klar? Wohl kaum – das lässt sich aber ganz einfach ändern.

    Schon erstaunlich, wie genau plötzlich die Worte der Deklaration für einen Internationalen Gerichtshof gegen Umweltverbrechen verständlich werden. ‘Wien Modern’ war eine von zehn kulturellen Institutionen, die um einen künstlerischen Beitrag für die Veranstaltung im Dogenpalast in Venedig gebeten wurde.
    Diese Botschaft mit musikalischen Mitteln hörbar zu machen ohne auf eine simple Vertonung zurückzugreifen, das war das ehrgeizige Ziel.

    Berno Polzer: Ich glaube, es ist teilweise verständlich, teilweise unverständlich. Und es spielt genau mit der Grenze unserer Konstruktionsleistung. Das heißt, wir hören Klänge, die offensichtlich keine normale Musik sind, aber auch keine Sprache, und manchmal findet sozusagen so eine kleine Überbrückung statt. Ich finde, man hört auch ohne den Text zu kennen einzelne Worte, und das Aha-Erlebnis passiert eigentlich dann, wenn man den Text sieht und dann plötzlich die Sprache da ist.

    Ein weiterer Brückenschlag: Miro Markus, ein neunjähriger Schüler aus Berlin, hat den Text für die Performance aufgenommen: Jugend als Hoffnungsträger der älteren Generation.

    Der österreichische Komponist Peter Ablinger hat das Frequenzspektrum der Kinderstimme auf sein computergesteuertes mechanisches Klavier übertragen.

    Peter Ablinger: Ich löse die eine Phonographie, das bedeutet also eine Aufnahme von irgendetwas – in diesem Falle der Stimme -, in einzelne ‘Pixel’ auf. So könnte man im übertragenen Sinne durchaus sprechen. Und wenn ich die Möglichkeit der Wiedergabe in einer sehr hohen Pixelauflösung habe, und diese habe ich nur mit einem mechanischen Klavier, dann kann ich tatsächlich eine Art von Kontinuität wiederherstellen. Wir können also in einem Klavierklang tatsächlich mit etwas Übung oder Unterstützung oder Untertitelung eine menschliche Stimme hören.

    Got it? Probably not – but we can easily change that.

    Pretty amazing how, all of a sudden, the words of the Declaration for an International Court against Environmental Crimes become understandable. ‘Wien Modern’ was one of ten cultural institutions asked for an artistic contribution to the event in the Palazzo Ducale in Venice.
    The ambitious goal was to make this message audible by musical means, without falling back on a simple musical setting.

    Berno Polzer: I think it’s partially understandable, partially not. And it plays precisely with the limit of our powers of perceptual construction. That is, we hear sounds that obviously aren’t normal music, but aren’t language either, and sometimes, so to speak, a little bridging takes place. Personally, I think you can make out individual words even without knowing the text, and the aha moment really happens when you see the text and, suddenly, the language is there.

    Yet another bridge: Miro Markus, a nine-year-old schoolboy from Berlin, narrated the text for the performance: youth as the hope of the older generation.

    The Austrian composer Peter Ablinger transferred the frequency spectrum of the child’s voice to his computer-controlled mechanical piano.

    Peter Ablinger: I dissolve a phonography – that is, a recording of something, in this case the voice – into individual ‘pixels’, as one could say in a figurative sense. And if I have the possibility of playback at a very high pixel resolution (and I only get that with a mechanical piano), then I can in fact restore a kind of continuity. So, with a little practice, or help, or subtitles, we really can hear a human voice in a piano sound.

  11. I did nearly the exact same thing for my Master’s Recital in spring 2007.

    http://rocketsurgeon.s3.amazonaws.com/PWAP_End.mp4?AWSAccessKeyId=0JC3J24V0Q2JT4S9FR02&Expires=1255212586&Signature=A8bjSj28i4OD8mr8Aoh9Fghghk0%3D

    (1.7 MB)

    I saved myself a lot of time and money by renting a Disklavier, but hats off to Peter for building his own player piano.

    One conceptual difference between my work and Peter’s is that my piece begins at a very slow tempo and gradually accelerates to slightly more than normal speed, at which point the text becomes quasi-understandable. It’s a kind of acoustic time-stretching, DSP without the digital signal, that foregrounds the threshold between music and speech.

    Synthesizing phonemes with a noise component is difficult, so I limited the text to words containing only vowel sounds and l, m, n, r, w, and y. As it happens, many of the roughly 400 English words that meet that criterion have to do with sex, drugs, or Islam, which made for a politically volatile text, but that was really just a byproduct of the process.

    Once the text was prepared, I recorded myself reciting it and did a Fourier analysis in Max/MSP. I wrote my own partial-tracking software in Max and used it to extract prominent partials, which were converted to notes and saved to a MIDI file. I retouched the MIDI file in Cubase to make the speech more understandable. The final MIDI “score” of the piece resulted from looping the retouched MIDI file while accelerating from a fraction of the original tempo to slightly faster than real time (a rough sketch of that tempo ramp appears below).

    I’m not staking any claims to originality here; I stole the idea of instrumental speech synthesis from the Indian/English/German composer Clarence Barlow, who I studied with in Cologne in 2002/2003.
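
    Purely as an illustration of that acceleration step (the names and numbers are mine, not from the piece): replay one loop of note events while the playback rate steps from a crawl up past real time.

    ```python
    # Illustrative tempo ramp: repeat one loop of (time_s, payload) events,
    # stepping the playback rate from r0 (very slow) to r1 (slightly fast).
    def accelerate(events, loops=8, r0=0.1, r1=1.2):
        T = max(t for t, _ in events)                    # loop length in seconds
        out, clock = [], 0.0
        for i in range(loops):
            r = r0 + (r1 - r0) * i / max(loops - 1, 1)   # this pass's rate
            for t, payload in sorted(events, key=lambda e: e[0]):
                out.append((clock + t / r, payload))
            clock += T / r                               # slow passes take longer
        return out
    ```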

  12. Patrick,

    I wasn’t able to view the document you mentioned. I have access to a Disklavier and would be fascinated to play the MIDI file you mentioned, if it is available. Also, the auditory-research community mailing list has been discussing the “Talking Piano” project, and I’m sure it would be interested in hearing about your work too.
    It sounds like a very interesting project.

    StophLong at
    yahoo.co.uk
