Morse Code Catches Google Swiping Lyrics

We think of Morse code in terms of dots and dashes, but really it’s a kind of binary code. Those symbols might as well be 0s and 1s or any other pair of characters. That attribute is exactly what led to a sting operation a music lyric site called Genius.com pulled on Google. At issue was a case of song lyrics that had allegedly been stolen by the search giant.

Song lyric sites — just like Google — depend on page views to make revenue. The problem is that in a Google search the lyrics appear on the search page, so there is no longer much incentive to continue to the song lyric site. That’s free enterprise for you, right? It is, but there was a problem. It appears that Google — or, according to Google, one of their partners — was simply copying Genius.com’s lyrics. How does Genius know the song lyrics were copied? According to news reports in the Wall Street Journal and other sources, they used Morse code.

The company first became suspicious when they approached an artist for lyrics that are apparently difficult to understand, and once they had published them they found Google also had the correct lyrics. That’s not proof, of course, but the next step is where they got tricky. They used straight and curly quotes as dots and dashes to embed a Morse code message in several lyrics. The message? REDHANDED.
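
The quote trick can be sketched in a few lines of Python. This is only an illustration, not Genius's actual tooling: the mapping (straight apostrophe for a dot, curly apostrophe for a dash), the function names, and the placeholder lyric are all assumptions made for the example. Letter gaps are dropped, so the watermark is simply a sequence of dots and dashes.

```python
# Hypothetical sketch: hide a Morse pattern in the apostrophes of a text.
# Assumed mapping (not confirmed by the article): straight ' = dot, curly ’ = dash.
MORSE = {"R": ".-.", "E": ".", "D": "-..", "H": "....", "A": ".-", "N": "-."}

STRAIGHT, CURLY = "'", "\u2019"


def to_pattern(message):
    """Concatenate the Morse symbols of each letter, with no letter gaps."""
    return "".join(MORSE[ch] for ch in message)


def embed(text, pattern):
    """Rewrite successive apostrophes in text to follow the dot/dash pattern."""
    symbols = iter(pattern)
    out = []
    for ch in text:
        if ch in (STRAIGHT, CURLY):
            sym = next(symbols, None)
            if sym is not None:  # apostrophes past the end of the pattern are left alone
                ch = STRAIGHT if sym == "." else CURLY
        out.append(ch)
    return "".join(out)


def extract(text):
    """Read every apostrophe back as a dot (straight) or a dash (curly)."""
    return "".join(
        "." if ch == STRAIGHT else "-" for ch in text if ch in (STRAIGHT, CURLY)
    )


pattern = to_pattern("REDHANDED")
lyric = "don't " * len(pattern)  # placeholder text with enough apostrophes
marked = embed(lyric, pattern)
print(extract(marked) == pattern)  # True
```

To a reader the marked text looks the same as the original, but anyone who copies it character for character carries the pattern along.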

The quote patterns then reportedly started appearing in Google search results as well. Legally, though, the picture is a little confusing; after all, Genius doesn’t own the lyrics in question. It does remain pretty bad form, though, to take content from another web site and use it to starve that same site of traffic.

Google’s statements claim that the lyrics were sourced from a third party called LyricFind and that they would act against any supplier violating their agreements with the company. LyricFind responded to the Wall Street Journal article saying that Genius gets user-generated content which may originate elsewhere and that others may be scraping Genius data into sources that LyricFind then uses.

We aren’t lawyers, so we won’t really comment on the validity of either side’s case. But we did think it was interesting that the sting put Morse code and steganography to practical use.

Photo credit:  Cassi Stewart

44 thoughts on “Morse Code Catches Google Swiping Lyrics”

        1. While the case looks like a slam dunk, with Google caught with their hands in the cookie jar, it may be more complex than that. Google claims that they get the lyrics from a valid resource, LyricFind, which is used by the music industry. Sometimes the songwriter does not actually provide the lyrics to the music company, and therefore the music company may have gone to Genius.com to get the lyrics and actually posted them onto LyricFind, where they were legitimately picked up by Google. I know it is not as sexy as Google conspiracies, but it just goes to show how easy it is to contaminate the data stream out there.

    1. Binary in digital has time. Binary in text has character separation. What’s the difference? It’s just a way to tell what’s a distinct data point. Same for the long pause in Morse, that’s the equivalent of a space character.

      1. Maybe so, and I can see why you’d think of it as binary, since Morse is most often sent across media that are inherently binary, such as current loops and on-off carrier waves.

        However, if this were the case, Morse code would be very inefficient, because of the way gaps are used. The basic time unit is the “dit”; a “dah” is three dits long, and you need a one-dit space between each. In addition, the inter-character gap is three dits. This means that the shortest letter, ‘e’, takes four dit times to represent (dit, space, space, space), which doesn’t seem bad. But to transmit a zero, that’s five dahs plus four inter-unit dits, for 19 dit times, plus the trailing 3 dits, for a total of 22 dit times. That’s just the physical layer, though. To humans, who serve as both the sources and sinks of Morse streams, the code is inherently ternary. A Morse operator does not think of the letter ‘a’ as “1 unit on, 1 unit off, 3 units on, 3 units off”, which would be binary 10111000; he thinks di-dah, which is two ternary digits. With an average of about 13 units per character, which is the length of a character with two dahs and one dit, Morse code is actually less efficient than 7 data, 2 stop, 1 parity ASCII, which takes only 10 units per character.

        So a more literal way of thinking of Morse code as it is commonly transmitted, is as a ternary code, pulse-width modulated.
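
The unit counting in this comment is easy to check with a short script. The sketch below assumes the timing rules as stated there (dit = 1 unit, dah = 3, one unit between elements, three units after each character); the `units` helper and the small table are made up for the illustration.

```python
# Count how many dit-length time units a character occupies, using the
# timing rules quoted in the comment: dit = 1, dah = 3, gap between
# elements = 1, gap after a character = 3.
MORSE = {"E": ".", "A": ".-", "0": "-----"}


def units(code, trailing_gap=3):
    """Return the on-air length of one Morse character in dit units."""
    marks = sum(1 if sym == "." else 3 for sym in code)  # key-down time
    gaps = len(code) - 1                                 # one unit between elements
    return marks + gaps + trailing_gap


for ch, code in MORSE.items():
    print(ch, units(code))
```

Under these rules the script gives 4 units for ‘e’, 8 for ‘a’ (matching the eight bits of 10111000), and 22 for zero: 15 units of dah, 4 of inter-element gap, and 3 of trailing gap.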

        1. As far as I can tell though, they didn’t encode space in this case. They didn’t care so much about making it human readable, but rather about making a unique pattern that wouldn’t happen by chance. It would be like encoding text into code groups: you destroy the structure of the words to make decoding more difficult, and you assume people can figure out the message from the sequence of letters without the spacing. So while all the comments about it not being binary may be true in the arbitrary sense, in this particular case I think it really is a binary pattern being encoded.

          1. Actually, the code itself probably isn’t at all important, so the fact that it somehow spelled out “redhanded” was a clever but unnecessary quirk. All they needed was to encode SOME randomish sequence in their representation of the lyrics, to prove that Google’s lyrics were copied from theirs. It was really just a form of watermarking. But you just can’t get away with calling a pulse-width code “binary”, just because it’s digital.

    2. With the old sounders, not only was the silence between the individual dots and dashes part of it, so was the silence inherent within the actual Morse code dashes. There was one character that used a long dash. As I understand it, recognizing it being sent depended on the ability to note the beginning of the dash as well as the end of the dash. That was told to me by a long-gone old-timer who used American Morse code as a railroad telegrapher, and used International Morse code as a radioman during WWII.

      1. On Teletypes, which in their basic current loop interface are just extensions of telegraphy, the “Break” key sends a “space” (i.e., it opens the loop rather than sending space characters) for as long as you hold the key down. This is to get the attention of the operator at the other end, as the motor will start and the clutch will engage, and the mechanism will cycle continuously without printing anything. I’m guessing this was the direct equivalent to a “long dash”.

        1. Teletypes, which I maintained for many years, were strictly binary. 5 bit (Baudot or Murray) and then 7 bit (ASCII) and 8 bit (Extended ASCII) and even marking and spacing had specific sequences, not just an open or closed circuit. Teletypes were synchronized to each other and a continuous open or closed circuit would cause them to get out of time.

    3. 00011100010000001010001110000001010001010100000010111000111010111010001110001010111000101110001011101010001011101010001110101110111000000111010101000101000111010001011100010111010001110101110111000000000101000111000000101000101010000001110001010001010100011101011101000101110100010001110001000000111000101000111011100010000001011100011101000111010100000010111000111011100010111011101000101110101000101000111000101011100011101010001000

      1. vk4msl: Yes, of course the TRANSMISSION mode is “discrete” in time and amplitude, but the underlying code is not. Take for example wigwag signaling, where Morse code is represented by waving a flag to the left for dit, and to the right for dah. It’s still Morse, and it’s only the ternary nature of the transmission mode (center, left, and right) that allows Morse to be transmitted without adding another variable, like pulse width.

      1. There’s a story about the 68000 microprocessor, which was being cloned in Asia. Motorola, irked by the fact that these were direct copies, down to the mask level, added some text to their masks: “Motorola – when you care enough to steal the very best”. Which of course showed up on the next generation of clones.

        1. I was leaving plagiarism audit marks in code back in the 70s.

          I also had a friend who designed circuit boards and would purposely make “mistakes” that had no effect on the operation of the circuit, but were recognizable.

  1. Genius did not write the lyrics. It has a license to publish them, and so does Google. The most important thing in this, though, is that writing down the lyrics of a song you hear should be fair use. This is an age-old practice that promotes a song’s popularity. This story is so messed up.

    1. binary
      adj 1: of or pertaining to a number system having 2 as its base; “a
      binary digit”
      2: consisting of two (units or components or elements or terms)
      or based on two; “a binary star is a system in which two
      stars revolve around each other”; “a binary compound”;
      “the binary number system has two as its base”
      n : a system of two stars that revolve around each other under
      their mutual gravitation [syn: binary star, {double
      star}]

      So, yeah, Morse code is indeed binary.

      1. But Morse has essentially five. The dot is the basic unit, the dash is three dots long, the time between any two dots or dashes is one dot long, the time between letters is three dots long, and the time between words is seven dots long.

        Not binary.

          1. What about frequency shift keyed Morse?

            Sorry, it’s short tone, long tone, short no tone, long no tone, and real long no tone, not binary.

  2. Wikipedia makes it quite clear that Morse is not truly binary, by using binary to show what Morse would look like in binary.

    Morse code is a variable length telegraphy code, which traditionally uses a series of long and short pulses to encode characters. It relies on gaps between the pulses to provide separation between letters and words, as the letter codes do not have the “prefix property”. Morse code can be represented as a binary stream by allowing each bit to represent one unit of time. Thus a “dit” or “dot” is represented as a single 1 bit, while a “dah” or “dash” is represented as three consecutive 1 bits. Spaces between symbols, letters and words are represented as one, three, or seven consecutive 0 bits. For example, “UP” in Morse Code is “..- .--.”, which could be represented in binary as “101011100010111011101”.
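
The unit-time representation quoted above can be reproduced with a short Python sketch. The `to_bits` function and the two-letter table are assumptions made for this example; a real encoder would need the full Morse alphabet.

```python
# Sketch of the unit-time binary representation described above:
# dit -> "1", dah -> "111", with 1, 3, or 7 zero bits between
# elements, letters, and words. Only the letters needed here are mapped.
MORSE = {"U": "..-", "P": ".--."}


def to_bits(text):
    """Encode space-separated words of mapped letters as a bit string."""
    words = []
    for word in text.split():
        letters = []
        for ch in word:
            elements = ["1" if sym == "." else "111" for sym in MORSE[ch]]
            letters.append("0".join(elements))   # one-unit gap inside a letter
        words.append("000".join(letters))        # three-unit gap between letters
    return "0000000".join(words)                 # seven-unit gap between words


print(to_bits("UP"))  # 101011100010111011101
```

The output matches the string in the quoted passage. The runs of zeros carry as much information as the runs of ones, which is the point the commenters are arguing about.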
