Morse Code Catches Google Swiping Lyrics

We think of Morse code in terms of dots and dashes, but really it’s a kind of binary code. Those symbols might as well be 0s and 1s or any other pair of characters. That attribute is exactly what made possible a sting operation that a music lyric site called Genius.com pulled on Google. At issue were song lyrics that had allegedly been stolen by the search giant.

Song lyric sites — just like Google — depend on page views to make revenue. The problem is that in a Google search the lyrics appear on the search page, so there is no longer much incentive to continue to the song lyric site. That’s free enterprise for you, right? It is, but there was a problem. It appears that Google — or, according to Google, one of their partners — was simply copying Genius.com’s lyrics. How does Genius know the song lyrics were copied? According to news reports in the Wall Street Journal and other sources, they used Morse code.

The company first became suspicious when they approached an artist for lyrics that are apparently difficult to understand, and once they had published them they found Google also had the correct lyrics. That’s not proof, of course, but the next step is where they got tricky. They used straight and curly quotes as dots and dashes to embed a Morse code message in several lyrics. The message? REDHANDED.
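As a rough illustration of how such a watermark could work, here is a minimal Python sketch that hides a Morse pattern in a lyric's apostrophes and reads it back. The mapping (straight apostrophe for dot, curly for dash), the small `MORSE` table, and the function names are assumptions made for this example; the reports don't spell out Genius's exact scheme.

```python
# Sketch of an apostrophe watermark.
# Assumption: a straight apostrophe stands for a dot, a curly one for a dash.
MORSE = {"R": ".-.", "E": ".", "D": "-..", "H": "....", "A": ".-", "N": "-."}
STRAIGHT, CURLY = "'", "\u2019"

def to_pattern(message):
    """Concatenate the dots and dashes of each letter (letter gaps are dropped)."""
    return "".join(MORSE[ch] for ch in message.upper())

def embed(lyrics, message):
    """Rewrite successive apostrophes in the lyrics so they carry the pattern."""
    pattern = iter(to_pattern(message))
    out = []
    for ch in lyrics:
        if ch in (STRAIGHT, CURLY):
            mark = next(pattern, None)  # None once the pattern is used up
            if mark is not None:
                ch = STRAIGHT if mark == "." else CURLY
        out.append(ch)
    return "".join(out)

def extract(lyrics):
    """Read every apostrophe back as a dot or a dash."""
    return "".join(
        "." if ch == STRAIGHT else "-"
        for ch in lyrics
        if ch in (STRAIGHT, CURLY)
    )
```

REDHANDED needs 22 dots and dashes, so a lyric needs at least 22 apostrophes to carry the whole pattern. Because the gaps between letters are dropped, the result works as a fingerprint more than as readable text: a copy that reproduces the same run of straight and curly apostrophes is very unlikely to have done so by chance.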

The quote patterns then reportedly started appearing in Google search results as well. Legally, though, the picture is a little confusing; after all, Genius doesn’t own the lyrics in question. It does remain pretty bad form, though, to take content from another web site and use it to starve that same site of traffic.

Google’s statements claim that the lyrics were sourced from a third party called LyricFind and that they would act against any supplier violating their agreements with the company. LyricFind responded to the Wall Street Journal article saying that Genius gets user-generated content which may originate elsewhere and that others may be scraping Genius data into sources that LyricFind then uses.

We aren’t lawyers, so we won’t really comment on the validity of either side’s case. But we did think it was interesting that the sting put Morse code and steganography to practical use.

Photo credit:  Cassi Stewart

60 thoughts on “Morse Code Catches Google Swiping Lyrics”

        1. While this looks like a slam-dunk case of Google being caught with their hands in the cookie jar, it may be more complex than that. Google claims that they get the lyrics from a valid resource, LyricFind, which is used by the music industry. Sometimes the songwriter does not actually provide the lyrics to the music company, and therefore the music company may have gone to Genius.com to get the lyrics and actually posted them onto LyricFind, where they were legitimately picked up by Google. I know it is not as sexy as Google conspiracies, but it just goes to show how easy it is to contaminate the data stream out there.

    1. Binary in digital has time. Binary in text has character separation. What’s the difference? It’s just a way to tell what’s a distinct data point. Same for the long pause in Morse, that’s the equivalent of a space character.

      1. Maybe so, and I can see why you’d think of it as binary, since Morse is most often sent across media that are inherently binary, such as current loops and on-off carrier waves.

        However, if this were the case, Morse code would be very inefficient, because of the way gaps are used. The basic time unit is the “dit”, and a “dah” is three dits long, and you need a one-dit space between each. In addition, the inter-character gap is three dits. This means that the shortest letter, ‘e’, takes four dit times to represent (dit, space, space, space). Which doesn’t seem bad. But to transmit a zero, that’s five dahs plus four inter-unit dits, for 19 dit times, plus the trailing 3 dits, for a total of 22 dit times. That’s just the physical layer, though. To humans, who serve as both the sources and sinks of Morse streams, the code is inherently ternary. A Morse operator does not think of the letter ‘a’ as “1 unit on, 1 unit off, 3 units on, 3 units off”, which would be binary 10111000; he thinks di-dah, which is two ternary digits. With an average of about 13 units per character, which is roughly the length of a character with two dahs and one dit, Morse code is actually less efficient than 7 data, 2 stop, 1 parity ASCII, which takes only 10 units per character.

        So a more literal way of thinking of Morse code as it is commonly transmitted, is as a ternary code, pulse-width modulated.
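The timing arithmetic in this comment can be checked with a short Python sketch. It assumes the usual unit lengths described above (dit = 1, dah = 3, gap between elements = 1, gap after a character = 3); the `MORSE` subset and the `units` function are names made up for this example.

```python
# Count how many dit-length time units one Morse character occupies,
# including the 3-unit gap that follows it.
MORSE = {"e": ".", "a": ".-", "w": ".--", "0": "-----"}  # small illustrative subset

def units(char):
    code = MORSE[char]
    marks = sum(1 if element == "." else 3 for element in code)  # key-down time
    gaps = len(code) - 1                                         # 1 unit between elements
    return marks + gaps + 3                                      # plus the trailing gap
```

Under these rules `units("e")` is 4 and `units("a")` is 8, matching the 10111000 pattern quoted above, while `units("0")` comes to 22: 15 units of dahs, 4 units of gaps between them, and the 3-unit trailing gap.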

        1. As far as I can tell though, they didn’t encode spaces in this case. They didn’t care so much about making it human readable, but rather about making a unique pattern that wouldn’t happen by chance. It would be like encoding text into code groups: you destroy the structure of the words to make decoding more difficult, and you assume people can figure out the message from the sequence of letters without the spacing. So while all the comments about it not being binary may be true in the general sense, in this particular case I think it really is a binary pattern being encoded.

          1. Actually, the code itself probably isn’t at all important, so the fact that it somehow spelled out “redhanded” was a clever but unnecessary quirk. All they needed to do was encode SOME randomish sequence in their representation of the lyrics, to prove that Google’s lyrics were copied from theirs. It was really just a form of watermarking. But you just can’t get away with calling a pulse-width code “binary”, just because it’s digital.

    2. With the old sounders, not only was the silence between the individual dots and dashes part of it, so was the silence inherent within actual Morse code dashes. There was one character that used a long dash. As I understand it, recognizing it being sent took the ability to note the beginning of the dash as well as the end of the dash. That was told to me by a long-gone old-timer who used American Morse code as a railroad telegrapher, and used International Morse code as a radioman during WWII.

      1. On Teletypes, which in their basic current loop interface are just extensions of telegraphy, the “Break” key sends a “space” (i.e., it opens the loop rather than sending space characters) for as long as you hold the key down. This is to get the attention of the operator at the other end, as the motor will start and the clutch will engage, and the mechanism will cycle continuously without printing anything. I’m guessing this was the direct equivalent to a “long dash”.

        1. Teletypes, which I maintained for many years, were strictly binary. 5 bit (Baudot or Murray) and then 7 bit (ASCII) and 8 bit (Extended ASCII) and even marking and spacing had specific sequences, not just an open or closed circuit. Teletypes were synchronized to each other and a continuous open or closed circuit would cause them to get out of time.

    3. 00011100010000001010001110000001010001010100000010111000111010111010001110001010111000101110001011101010001011101010001110101110111000000111010101000101000111010001011100010111010001110101110111000000000101000111000000101000101010000001110001010001010100011101011101000101110100010001110001000000111000101000111011100010000001011100011101000111010100000010111000111011100010111011101000101110101000101000111000101011100011101010001000

      1. vk4msl: Yes, of course the TRANSMISSION mode is “discrete” in time and amplitude, but the underlying code is not. Take for example wigwag signaling, where Morse code is represented by waving a flag to the left for dit, and to the right for dah. It’s still Morse, and it’s only the ternary nature of the transmission mode (center, left, and right) that allows Morse to be transmitted without adding another variable, like pulse width.

      1. There’s a story about the 68000 microprocessor, which was being cloned in Asia. Motorola, irked by the fact that these were direct copies, down to the mask level, added some text to their masks: “Motorola – when you care enough to steal the very best”. Which of course showed up on the next generation of clones.

        1. I was leaving plagiarism audit marks in code back in the 70s.

          I also had a friend who designed circuit boards and would purposely make “mistakes” that had no effect on the operation of the circuit, but were recognizable.

  1. Genius did not write the lyrics. It has a license to publish them, and so does Google. The most important thing in this, though, is that writing down the lyrics of a song you hear should be fair use. This is an age-old practice that promotes a song’s popularity. This story is so messed up.

    1. binary
      adj 1: of or pertaining to a number system having 2 as its base; “a
      binary digit”
      2: consisting of two (units or components or elements or terms)
      or based on two; “a binary star is a system in which two
      stars revolve around each other”; “a binary compound”;
      “the binary number system has two as its base”
      n : a system of two stars that revolve around each other under
      their mutual gravitation [syn: binary star, {double
      star}]

      So, yeah, Morse code is indeed binary.

      1. But Morse has essentially five elements. The dot is the basic unit, the dash is three dots long, the time between any two dots or dashes is one dot long, the time between letters is three dots long, and the time between words is 7 dots long.

        Not binary.

          1. What about frequency shift keyed Morse?

            Sorry, it’s short tone, long tone, short no tone, long no tone, and real long no tone, not binary.

          2. Nuclear: Hmm. Did you read that article you linked to? What the author said was, “In Morse code, letters of the alphabet and other characters like numbers CAN BE REPRESENTED as sequences of binary numbers, although each bit in the sequence is sometimes called a “dit” or a “dah”. In writing, a “dit” is represented by a dot (.), and a “dah” is represented by a dash (-).” (Emphasis mine.) Note that the author continues with “although each bit in the sequence is sometimes called a “dit” or a “dah”.” So the “bits” he is referring to have three states: “dit”, “dah”, or space. The code does not work if you don’t have spaces. That’s three states by my count. The author is correct in saying that the letters and other characters can be represented as sequences of binary numbers, but that is true of just about anything. That does NOT make the code itself binary. Just try writing a Morse code generation or recognition program using only binary to describe the codes, and you will find yourself in a mess of trouble, because you have to also account for the spaces between letters and between words.

          3. Exactly

            “In Morse code, letters of the alphabet and other characters like numbers can be represented as sequences of binary numbers”

            Sequences of binary numbers being the key phrase. In other words (and note that they use a lookup table with a limited length field of 5 bits, so it couldn’t even produce prosigns or punctuation), they are not even trying to portray any one dot or dash as a binary digit, but using multiple binary digits to portray Morse characters.

            Making the article itself definitive proof that Morse is not binary.

          4. NYU has been known to be wrong before, such as when they claimed that serial ports and RS-232 were one and the same; one would hope that they have learned the error of their ways since 2009. It is obvious that they never attended any of my classes at Bellcore TEC. They are wrong about there being only two signal elements: “In the process, students learn about how only two signaling elements – the “dit” and the “dah”, or 0 and 1, can be used to encode arbitrarily complex messages.”

            In their program they only code individual letters, not words, or sentences. Once they started to attempt to add the necessary spacing between “signal elements”, characters, and words, their simple little program would fail miserably.

            I have used binary programming to send Morse code, including provisions for the proper spacing, and can assure you that representing a dit as a 0 and a dah as a 1 does not provide for the necessary spacing, at which point the spacing adds a minimum of two more signal elements.

            Their little lesson is an interesting programming problem, using a basic stamp, but it does not prove that Morse is in any way binary.

          5. But it is not as simple as tone or no tone, it’s short tone, long tone, short no tone, long no tone, and longer no tone, distinctly not binary, which would be equal length tone or no tone.

          6. I think you are confusing the thing itself with its physical implementation. “Morse code” is a sequence of symbols, dit, dah, and space, to represent letters, numerals, and punctuation in an efficient manner. As implemented in wired telegraphy and in CW communications, it is expressed as either pulse width or pulse code modulation, depending on the exact implementation (the definition gets blurred a little because the widths of both states are variable). At this level, the implementation can be considered binary. But the information with which a carrier is being modulated is not binary. An alternate physical implementation would be bipolar keying over a wire (+v for dit, -v for dah, 0 for space), which would make the time differences between dit and dah superfluous, and another for radio would be frequency shift keying using three frequencies. And before you counter that neither of these methods has found widespread use, I remember from my Boy Scout days, learning to send Morse code using a single flag, in the “wig-wag” fashion. This involved waving a flag to one’s left for dit, to one’s right for dah, and holding it straight overhead for space. This method of communication was taught to and used by many thousands of people, so this was an actual, practical implementation. Note that with all of these implementation examples, dit and dah take the same amount of time, yet they are all still Morse code. And there were three symbols, making them ternary implementations. The fact that Morse himself used a binary implementation changes none of this – it was just his choice of the physical layer implementation, which was favored by contemporary technology.
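The distinction drawn in this comment, between the symbol sequence and its physical rendering, can be sketched in a few lines of Python. The `MORSE` subset, the one-letter codes for flag positions and line voltages, and the function names are assumptions made for this example, not an established notation.

```python
# Symbol layer: dit, dah, and space. Physical layer: any mapping of those three.
MORSE = {"A": ".-", "N": "-.", "E": ".", "T": "-"}  # small illustrative subset

def symbols(word):
    """Three-symbol form: each letter's dits and dahs followed by one space symbol."""
    return "".join(MORSE[ch] + " " for ch in word.upper())

# Two three-state renderings mentioned in the comment (codes are hypothetical):
WIGWAG = {".": "L", "-": "R", " ": "U"}   # flag left, flag right, flag overhead
BIPOLAR = {".": "+", "-": "-", " ": "0"}  # +v, -v, 0 volts

def render(sequence, table):
    """Map each symbol to one equal-length signal state."""
    return "".join(table[s] for s in sequence)
```

For example, `render(symbols("ANT"), WIGWAG)` gives `LRURLURU`, and the same symbols rendered with `BIPOLAR` give `+-0-+0-0`. In both cases dit and dah take the same amount of time, so the third state carries the spacing.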

          7. Well, then. . . using the same logic, then binary is not binary. 1s and 0s are combined into 8-bit words to send information, so it’s not binary.

          8. But tone and no tone are not signal elements, short tone, long tone, short no tone, long no tone, and longer no tone are the signal elements.

            Not binary by any definition, but you can go on living in your own little world, your inability to learn is amazing.

          9. Thank you for the win, Jim Longley. When one resorts to personal attack, they are out of cogent arguments. You just did that and gave me the win. You write of “signal elements.” Well, in binary computer communications, 1s and 0s are arranged in 8 bit ‘signal elements’ to send information. So in your world, they are not binary. Again, thank you.

          10. uh, you never responded to my argument – the one about the implementation being different from the thing itself (the thing itself being Morse code). You don’t get to claim victory by default. In fact, I guess I do, since you didn’t respond.

          11. binary: consisting of, indicating, or involving two.

            Your own post said, “At this level, the implementation can be considered binary.” There is no need to reply.

          12. Yes, you get the win, but you only won a race between your own obstinacy and obvious lack of learning, it was a race with only one contestant.

            And groups of 8 elements do not make one element, making your lacks extremely obvious, because you missed by a bit when you stated that “in binary computer communications, 1s and 0s are arranged in 8 bit ‘signal elements’ to send information.” Which makes the invalid assumption that all binary computer communications are in 8 bit groups, which could not be farther from the truth.

            So, you, indeed, win the contest for the stupidest commentary on record.

  2. Wikipedia makes it quite clear that Morse is not truly binary, by using binary to show what Morse would look like in binary.

    Morse code is a variable length telegraphy code, which traditionally uses a series of long and short pulses to encode characters. It relies on gaps between the pulses to provide separation between letters and words, as the letter codes do not have the “prefix property”. Morse code can be represented as a binary stream by allowing each bit to represent one unit of time. Thus a “dit” or “dot” is represented as a single 1 bit, while a “dah” or “dash” is represented as three consecutive 1 bits. Spaces between symbols, letters and words are represented as one, three, or seven consecutive 0 bits. For example, “UP” in Morse Code is “..- .--.”, which could be represented in binary as “101011100010111011101”.
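The unit-of-time representation quoted above is easy to try out. The following Python sketch uses a small illustrative `MORSE` subset and a made-up function name, `to_bits`; it writes a dit as `1`, a dah as `111`, and the three kinds of gap as one, three, or seven `0`s.

```python
# Render Morse as a bit stream in which every bit is one unit of time.
MORSE = {"U": "..-", "P": ".--.", "E": ".", "T": "-"}  # small illustrative subset

def to_bits(text):
    words = []
    for word in text.upper().split():
        letters = [
            "0".join("1" if element == "." else "111" for element in MORSE[ch])
            for ch in word                  # single 0 between elements of a letter
        ]
        words.append("000".join(letters))   # three 0s between letters
    return "0000000".join(words)            # seven 0s between words
```

`to_bits("UP")` returns `101011100010111011101`, the same string given in the quoted passage. Each Morse element here spans a different number of bits, which is the point the commenter is making.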

  3. BrightBlueJim, he doesn’t even come up to the level of troll, being, as he named himself, merely nuclear, as in the nucleus of an atom, as in infinitesimally small.

    I am through dealing with his syllogistic arguments.
