Morse Decoder’s Lean And Sexy Search Algorithm

Often the Morse Code centered projects that we feature are to help you practice transmitting messages. This one takes a tack and builds an automatic decoder. We think [Nicola Cimmino’s] project is well worth featuring simply based on his explanation of the Digital Signal Processing used on the signal coming in from the microphone. Well done. But he’s really just getting warmed up.

What makes this really stand out is a brilliant algorithm that allows conversion from Morse to ASCII using a lookup table of only 64 bytes. This provides enough room for A-Z and 0-9 without chance of collision but could be expanded to allow for more characters. Below is a concise description of how the algorithm works but make sure you take the time to read [Nicola’s] project description in its entirety.

The algorithm can be decribed as follows. Have an index inside the lookup string inizialied to zero. Have an initial dash jump size of 64. At every received element (dot or dash) halve the initial dash jump and then increase by 1 the index inside the lookup string if a dot was received and by dash jump size if a dash was received. Repeat until a letter separator is reached, at that point the index inside the lookup string will point to the ASCII corresponding to the decoded morse.

Have you heard of this technique before? If so, tell us about it in the comments below. Before you jump all over this one, realize that Magic Morse uses a different technique.

27 thoughts on “Morse Decoder’s Lean And Sexy Search Algorithm

  1. If I’m reading it properly, this is just a 6-deep binary tree stored in linear format — as commonly used in heaps. Actually it’s a nonstandard version of linear format, but it seems to work just as well (the standard version would give a string of “.ETIANMSURWDKGOHVF.L.PJBXCYZQ..54.3…2…….16…….7…8.90”, also 64 characters)

  2. Sound a bit like a binary search in which you take left side or right side. That would eventually get you to the letter based on the length of the symbol.

    In this case, the two sides are folded together and offset by one. Not sure if there are any computational significant advantages as the search algorithm is of the same order as binary search. May be you would on the number of local temp variables.

    It is different way of storing the search data and that’s about it.

  3. incidentally, searching for “ETIANMSURWDKGOHVF” on Google turns up a substantial number of hits. “EISH54V” gets many less, but one of them is a solution to LiraNuna’s morse code golf problem on SO, so apparently someone thought of that representation before.

  4. Can you not treat each if dot/dash as a binary bit ‘0’ or ‘1’ in serial bit stream and treat “word” as a binary and use that to directly index into the array for look up?
    i.e. serial -> parallel conversion -> look up number -> letter

    1. tekkieneet: you can’t, because if you did, U = “..-” = 001 and T = “-” = 1 would get the same index. So would L = “.-..” = 0100 and D = “-..” = 100, and many other pairs. You need an extra bit to account for the length of the code. You could do it by prepending a 1 bit to all of them, which gets you yet another 64-entry table :)

  5. Saw this (or at least similar) algorithm in a Swedish computer magazine in the 80’s. I thought it was brilliant back then, but it is (as many before me points out) a binary tree in linear form.

  6. At my mind, the performance of any algo isn’t in the searching of the character in any matrix, but in the detection of the valid code despite dots and dashes that haven’t the good shape during the transmitting.
    It is generally the case during manual transmitting.
    But perhaps, I didn’t really understood your algo ?

    1. That’s what I was thinking too… and if you want to correct poorly transmitted/received data, a simpler lookup table may be more flexible. The speed and memory requirements for Morse code are very low anyway.

  7. The letters and digits in Morse are all five or fewer elements, but several common symbols are longer than that. The question mark, period, and comma are all six elements long. There are quite a few prosigns in common use that are composed of double characters with no space in between them — for example, BK, (meaning roughly “over”, or “It’s your turn to send” is formed with an B and K stuck together with no space. That’s seven symbols, -…-.- A sequence of eight dots is the “telegrapher’s backspace”, in other words, it means that there was an error in the previous word or letter. At nine elements, perhaps the longest prosign is also the most commonly known one, SOS. Though it’s written with three letters, it is sent on the air as one long character, with no space between the dots and dashes.

    More of the common prosigns are listed on Wikipedia http://en.wikipedia.org/wiki/Prosigns_for_Morse_code

    So while that 64 byte table may handle over 90% of the Morse characters transmitted on the air, it will miss some important ones.

    73 de AG6QR

  8. That was actually one way, long ago (about 45 years ago) that Morse code was taught. I can recall my dad bringing home sheets of paper with one “tree” (what we’d call a binary tree today) on each page. I no longer have the pages and can’t credit the original author.

  9. I’m doing something similar using a CPLD for a project. The crazy thing is just how compact the code is when a lookup table enters the picture. Without it, you’d end up with a switch statement with at least 100 potential outcomes. With it, I think it’s around 10 lines of code and still could be shorter if I didn’t unroll a loop unnecessarily.

Leave a Reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.