Reverse Engineering Cyclic Redundancy Codes

Cyclic redundancy codes (CRC) are a type of checksum commonly used to detect errors in data transmission. For instance, every Ethernet packet that brought you the web page you’re reading now carried with it a frame check sequence that was calculated using a CRC algorithm. Any corrupted packets that failed the check were discarded, and the missing data was detected and re-sent by higher-level protocols. While Ethernet uses a particularly common CRC, there are many, many different possibilities. When you’re reverse-engineering a protocol that contains a CRC, although it’s not intended as a security mechanism, it can throw a wrench in your plans. Luckily, if you know the right tool, you can figure it out from just a few sample messages.

A case in point was discussed recently on the hackaday.io Hack Chat, where [Thomas Flayols] came for help reverse engineering the protocol for some RFID tags used for race timing. Let’s have a look at the CRC, how it is commonly used, and how you can reverse-engineer a protocol that includes one, using [Thomas’] application as an example.

What’s a CRC, Anyway?

A CRC is a type of code designed to add redundancy to a message in such a way that many transmission errors can be detected. There are a number of different types of codes that can be used in this way, but the CRC has some properties that make it especially useful for communications protocols. First, it’s efficiently implemented in hardware or software. More importantly, it can be particularly good at detecting the kinds of errors often seen in common data channels, specifically, runs of bit errors. The simplest way to use a CRC is to apply the algorithm to the message to be sent, then append the resulting CRC value to the message. The receiver applies the same algorithm, then checks that the transmitted and locally calculated CRC values match. CRCs vary in length, with the most common ones being 8, 16, or 32-bits long.

Mathematically, a CRC is  based on division of polynomials over GF(2), the Galois Field of two elements. There’s a lot of interesting math involved, and a simple web search will turn up plenty of resources if you want to dive further into the subject. But, this is Hackaday, and I’m going to try to give you enough background to be able attack practical situations, so here we go.

In simple terms, the polynomials involved with the CRC algorithms have coefficients of only 0 or 1. There’s a simple mapping between binary strings and such polynomials, in which each set bit becomes a term with the bit position as the exponent. For example, the binary string 11011 can be represented as x4 + x3 + x + 1. (As a mnemonic device, think about x = 2.) To calculate an n-bit CRC, we append n zero bits to our message, then convert to a polynomial. We then divide this message polynomial by a generator polynomial specified as part of the CRC algorithm. The polynomial remainder of this division algorithm is the CRC.

Calculating CRCs

OK, it’s going to get a little more relatable now. The division operation maps to a sequence of XOR operations that will remind you of basic arithmetic. For example, to calculate a trivial 2-bit CRC of the message string 1101 with the generator polynomial 11, we first append 00 to the message to get 110100, then divide to get a quotient of 10011 and a (2-bit) remainder of 01. This remainder is the CRC. Note that at each step of the long division, an XOR operation is used instead of the more familiar subtraction with borrowing.

This is all interesting if you want to write your own CRC implementation, but that’s probably not necessary; you can easily find implementations in your language of choice. If you do want to dig into the algorithms in more detail, here’s a good place to start.

Complications

Besides the generator polynomial, there are four other parameters that describe a general CRC algorithm. First, the input bytes may be reflected — swapped bit order from left to right. There may also be an initial starting value for the CRC calculation; this is prepended to the message before the division. After the division, the n-bit CRC may also be reflected, and finally, it may be XOR’d with a constant before use. While each of these steps is relatively trivial in itself, the resulting number of possible CRC algorithms is huge; too large for practical brute-force search for reverse engineering.

Reverse-engineering a CRC

RFID tag output

Remember [Thomas] and his RFID tags? After examining the RF-output of one of the tags on an oscilloscope, he made a simple receiver using a diode detector, RC filter, and a homebrew antenna. Connecting the receiver to an ESP32, he wrote some code to send back the received pulses to his computer for analysis. What he found was that each tag repeated the same 32-bit message about every 4 ms. Sixteen bits of the message were recognizable as the ID number which was printed on each tag, but the other half of each message was a mystery.

This kind of transmission is exactly why CRCs are used. Imagine if tag 5’s message suffered a bit flip during transmission and was received as tag 13. Adding a CRC allows this error to be detected, and the message discarded — a new one will be along soon enough. But, [Thomas], assuming the remaining sixteen bits of the message were a checksum, was now faced with determining which one. The first approach by the Hack Chat denizens was to try online calculators for common CRC types, but this didn’t yield results. While you might get lucky this way, checking the most common handful of CRC parameters, this method doesn’t scale. Luckily, there’s an efficient alternative.

CRC RevEng

A search for CRC reverse engineering turns up the right tool for the job. CRC RevEng by [Gregory Cook] was written to do exactly that: given a few examples of message/CRC pairs, it can determine the parameters of the CRC algorithm used. [Thomas] had collected the IDs and checksums from a number of tags, four of which are shown here:

5A2C DAFC
5B25 C378
5BBC 8B71
5C0A 3EEC

To execute the search, the reveng program is invoked with the ‘-s’ switch, and in this case, the known size of the CRC,  ‘-w 16’. Given just one sample, the program cannot determine which CRC generates the code:

With two samples, the program determines three sets of parameters which could possibly be used:

Finally, given three examples, the code narrows down the search to a single set of parameters.

From here, it’s relatively easy to verify that this set of parameters also reproduces the remainder of the 40 or so tags that [Thomas] has access to. With the CRC algorithm determined, he can now generate his own tags, or reliably receive data from existing ones.

It’s important to remember that the addition of a CRC code to these messages wasn’t intended to keep an adversary from faking the transmissions. As [Thomas] points out, since the tags are not synchronized in any way, their on-off-keying signals can easily collide and create receive errors. Checking the CRC on received messages provides some insurance against this possibility.

Spoofing CRCs

The CRC RevEng code can also manipulate a message to generate a desired CRC value, which can be useful for spoofing.  Called “reversing”, by the program, the behavior is invoked with the ‘-v’ switch. This usage requires that you can identify an n-bit section of the the message that you can change without negative effect – for instance, a comment field that is ignored. If you can locate such a contiguous space in the message,  you can substitute a value there to make any message have a given CRC code. This can be useful in spoofing packets that you need to change in some way, while maintaining a specific checksum value.

Wrapping Up

If you find yourself faced with some unknown data in a message protocol, consider the possibility that it might be a checksum. Although many RF protocols hide these values from users, if you’re looking at messages at the lowest levels, you may well find them. If you do, remember there’s a tool to figure out what’s going on.

17 thoughts on “Reverse Engineering Cyclic Redundancy Codes

  1. I’m surprised that no one has mentioned that the CRC can be implemented in dedicated hardware, viz. a shift register with feedback loops coming from specific bits. Looking at the polynomial, it’s very easy to see which feedback loops one should choose.

    Nobody today would actually implement a CRC this way, but I found it tremendously useful to understand how the whole thing works.

    There’s a very close relationship between CRC’s and pseudo-random number generators (PRNs). Around 1985 I wrote an article in Embedded Systems Programming about this. I sought to understand how to define the polynomials that work (most don’t). I learned more about Abelian groups and Galois fields than I care to know, but never did figure it out. I had to just take someone else’s word for it.

    (Hint: If anyone can point to a link that explains it all, I’d be eternally grateful.)

    I only did the 8-bit cases, but that was confusing enough for me. It turns out that there are about 8 (or was it 16?) ways to implement the _SAME_ polynomial. Do you shift the register left or right? Do you start with 0 in the extended bits, or some magic number? Do you expect to get 0 as the result, or some other magic number?

    Even after all that, the implementation in the then-popular Modem-7 app threw another curve. Serial data are normally transmitted low-bit first, but Modem-7 did it the other way round (!). Took awhile to figure THAT one out!

    Even today, 35 years later, I get emails urging me to re-do that article. Maybe I will, some day Real Soon Now.

    1. > Nobody today would actually implement a CRC this way

      I’m not sure whether you mean “in hardware” or “with a shift register and XOR feedback”, but definitely wrong on both counts. Once you get above some Gb/s, “in hardware” is the ONLY way to do it, although at those speeds, one would usually unroll the loop and calculate several (i.e. hundreds of) bits per clock, so it’s not really a shift register any more.

  2. Nice article, Ted.

    I am amazed how sharp this article looks in terms of the formatting and the visuals. Can you help me get started with tools and techniques to publish technical articles that don’t look like a poorly put together *ppt?

  3. I am trying to implement XMODEM-CRC but I am getting stumped in implementing the CRC16 since I am not getting how the algorithm works. Can someone publish a “recipe” aka instruction list with only words?

  4. I like the article, but the long division is wrong. It should be:

    ___10001__
    11) 110100
    _11_
    00100
    _11_
    1 Remainder

    In base ten this shows that 3 into 52 equals 17 with a remainder of 1. The long division in the article says that three into 52 is 19 with a remainder of 1.

    1. You are doing the division over the integers in binary form. That’s not how GF(2) works, it implements modular arithmetic. That means that 1 + 1 = 0 in this case, _with no carry_.
      Also, in GF(2) both plus and minus map to the same XOR operation, which simplifies things quite a lot when implementing polynomials in comparison to GF(3), for example.

Leave a Reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.