Picking A CRC

June 15, 2026

You send a file, but how do you know it arrived intact? In other words, how do you know that it didn’t get cut off, garbled, or changed somehow? Simplistically, you could just add up all the bytes in the file — a checksum — and send that along with the file. You compute the checksum when you know the file is good, and the receiver can compare the checksum to see if they match.

However, simple addition misses certain classes of errors, which is why there are better checksum algorithms that, for example, wrap the carry bit around or otherwise mix the data so that files with common errors produce different checksums. There are two problems with checksums. First, no matter how much you modify the algorithm, the chances that two files produce the same checksum are pretty high. Especially with common error patterns.

For example, assume a very simple algorithm that simply adds the bytes and discards any carry. If a file contains 0x80, 0x80, those numbers essentially cancel each other out. If you replace them with 0, 0, you’ll get the same checksum. To some degree, using anything other than a second copy of the entire file will have this problem — some corruption goes undetected — but you want to minimize the number of times that happens.

The other problem is that a checksum by itself doesn’t let you correct anything. You know the data is bad, but you don’t know where. If you think about it, the simplest checksum is a parity bit on a byte: even parity is simply the sum of all the bits, modulo 2, and odd parity is its complement. If the parity bit doesn’t match, you know the byte is bad, but you don’t know which bit flipped. Any even number of bit errors goes undetected, but any odd number of bit errors will get caught.

People invent better error-checking codes by devising schemes that can promise they can detect a certain number of bit flips and, at least in some cases, correct them. One of these is the cyclic redundancy check (CRC). It is easy to think of the CRC as a “strong checksum,” but it actually works differently. What’s more, there isn’t just a single CRC algorithm. You have to select or design a particular algorithm based on your needs. Most people pick a “named” implementation like CCITT or Ethernet and assume it must be the best. It probably isn’t.

A CRC is a checksum in the broad sense: you feed it a message, and it gives you a small value that you append, store, or compare later. But unlike a simple additive checksum, a CRC is based on polynomial division over GF(2), which is a fancy way of saying “divide using XOR instead of carries.” That detail matters. It gives CRCs very strong guarantees against common classes of errors, provided you choose the right polynomial for the job. That’s the key. You must choose the right polynomial.

The Polynomial Machine

A CRC treats your message as a long binary polynomial. The byte stream is interpreted as a sequence of coefficients: each bit says whether the corresponding power of x is present or absent, so the bits 10011 stand for x^4 + x + 1. The CRC algorithm divides the message polynomial, after shifting it by the CRC width, by a generator polynomial. The remainder is the CRC.

In normal arithmetic, division involves subtraction and carries. In CRC arithmetic, subtraction is XOR. That is why CRC code often looks like this:


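    /* One step of the long division: shift the top bit out and, if it
       was set, "subtract" (XOR) the generator polynomial. */
    if (crc & topbit)
        crc = (crc << 1) ^ poly;
    else
        crc <<= 1;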

That code is the body of a loop that implements polynomial long division, one bit at a time. The generator polynomial is the magic number. For a 16-bit CRC, the polynomial has degree 16. For a 32-bit CRC, degree 32. You will usually see it written as a hex constant with the top term left implicit, such as 0x1021 (x^16 + x^12 + x^5 + 1) for CRC-16/CCITT or 0x04C11DB7 for the classic Ethernet/ZIP/PNG CRC-32. But the polynomial is not just an arbitrary constant. It determines what error patterns the CRC is guaranteed to detect.

What CRCs Catch

A well-chosen CRC can guarantee detection of all single-bit errors, many multi-bit errors, all burst errors up to a certain length, and a very high percentage of longer random errors. The key metric is Hamming distance, often abbreviated HD. If a CRC has HD=4 for messages up to a certain length, it detects all 1-, 2-, and 3-bit errors in messages of that length.

That last qualifier is important. CRC strength is not just “16-bit CRC good, 32-bit CRC better.” It depends on the maximum message length. A polynomial that is excellent for 32-byte embedded packets may be mediocre for kilobyte-size messages. A polynomial standardized decades ago may be familiar but not optimal.

[Philip Koopman’s] work at Carnegie Mellon is the go-to reference here. [Koopman] and [Chakravarty’s] paper on CRC polynomial selection for embedded networks specifically looked for good CRC polynomials for short embedded messages, and [Koopman’s] “Best CRC Polynomials” tables list polynomials by width and Hamming-distance performance. The important takeaway is that many standard polynomials were chosen for historical reasons, not because they are mathematically best for your packet size.

There are plenty of videos that explain CRC, but if you are going to watch a video, you might as well pick one of the many from [Phil Koopman] himself.

Famous Does Not Mean Optimal

Take CRC-16/CCITT, polynomial 0x1021. It is found everywhere: telecom, embedded examples, and bootloaders. It is not a terrible polynomial, but it is not automatically the best 16-bit choice. [Koopman’s] tables include other 16-bit polynomials with better Hamming-distance behavior over useful embedded-message lengths.

Likewise, classic CRC-32 using polynomial 0x04C11DB7 is deeply entrenched because of Ethernet, ZIP, gzip, and PNG. But CRC-32C, the Castagnoli polynomial (0x1EDC6F41), is often a better general-purpose choice. It has excellent error detection properties over common message lengths and is also supported by hardware instructions on many CPUs. Intel added a CRC32 instruction with SSE4.2, which computes CRC-32C specifically, and ARM AArch64 includes CRC acceleration for both CRC-32 and CRC-32C.

Of course, standards matter if you have to meet the standard. But if you are designing a new private protocol between your sensor board and your controller, blindly copying the first CRC-16 example from the Internet is not engineering. Pick a polynomial based on your packet length and your risk model.

The Practical Embedded View

For very small messages, even an 8-bit CRC may be adequate. For moderate packets, a good 16-bit CRC is often enough. For firmware images or large records, 32 bits is more reasonable. The point is not to use the biggest CRC you can tolerate. The point is to choose a CRC width and polynomial that give the desired detection strength for your longest protected message.

Also, remember what a CRC does not do. It is not cryptographic. It does not stop malicious tampering. The point of a CRC is to detect accidental corruption, not protect against sophisticated hacking attempts.

Real-world CRC definitions also include bit reflection, initial value, final XOR value, and sometimes byte order conventions. Two CRCs can use the same polynomial and still produce different answers because those parameters differ. That is a common embedded debugging trap. Someone says “CRC-16,” and both sides implement different CRC-16s. CRC-16/IBM, CRC-16/CCITT-FALSE, CRC-16/XMODEM, CRC-16/KERMIT, and CRC-16/MODBUS are not interchangeable.

If you specify a CRC in a protocol document, include at least the width, the polynomial (which can be represented in different formats, by the way), the initial value, whether the input and output are reflected, and any value to XOR the output with. It is also a great idea to include the computed checksum for ASCII “123456789” so anyone can compare their algorithm to yours.

If you are working with Linux systems, be sure to look at the cksum program, which can use several CRC algorithms as well as digest-style methods like sha1.

Efficiency

Computing CRCs a bit at a time is compact, but it costs eight loop iterations per byte. In some cases, that’s ok, but for performance, you want a table if you can afford the memory. For a 16-bit CRC, the table is only 512 bytes and can be generated at compile time, if desired.

Many CPUs have CRC peripherals. Use them, but read the fine print to make sure they can handle your desired CRC. It is often a good idea to check a hardware implementation against a known-good software implementation before you send it out into the wild. You can also run many CRC tests with one of the several online calculators out there.

Choosing a CRC Today

For a new embedded protocol, define the maximum length of data you need to check. Then decide how many bits of overhead you can afford. Then head to Koopman’s tables to pick a polynomial with good Hamming-distance performance for that length.

The CRC has been around for a long time. But it isn’t just something you grab off the shelf. You need to plan and understand the ramifications of picking different polynomials.

CRCs aren’t the only game in town. Credit card numbers, for example, use check digits. There are other ways you can identify and, in some cases, zap bit errors, too.

39 thoughts on “Picking A CRC”

Christian says:

June 15, 2026 at 7:40 am

I think today, when in doubt, you can actually use a cryptographic hash function, like sha1, drop bytes until you’re happy with the length.

When zfs came out I thought it was amateurish overkill, still feels amateurish “Why could they not just pick a good crc?”… but there is so much “compute” now and hardware support for sha and aes, why be frugal?

1. Andrzej says:
  
  June 15, 2026 at 10:16 am
  
  There is a lot of middle ground between crypto hashes (slow) and crc’s (weak). A good example is xxhash
  
2. Steven Clark says:
  
  June 15, 2026 at 10:30 am
  
  Yes. You also don’t want to discover your integrity check is security relevant after you’ve chosen an insecure hash. If you have the resources and no (relatively) large bandwidth requirements there’s little reason not to default to SHA256 if you don’t need FEC.
  
  Once you get past the microcontroller scale the openssl command line tool becomes the best 700KB you will ever allocate.
  
3. daid says:
  
  June 15, 2026 at 11:23 am
  
  A while back we went through a whole medical software FDA approval process which includes cyber security. We had a few CRCs in our system for data integrity reasons, which raised questions. In the end, it was easier to replace the CRCs with SHA-256, as then you don’t have to explain anything in relation to cybersecurity instead of needing to explain in detail that those specific instances of CRC are not security related.
  
4. M says:
  
  June 15, 2026 at 5:50 pm
  
  It’s crazy that ZFS is still the only filesystem that checks that files are intact. EXT4 doesn’t. NTFS doesn’t. Everything just assumes that everything the hard drive returns is correct. A basic run of badblocks on drives more than a few years old shows that’s not the case.
  
  1. nikp123 says:
    
    June 15, 2026 at 6:01 pm
    
    BTRFS does as well
    
  2. Ostracus says:
    
    June 17, 2026 at 3:19 am
    
    A fact that certainly has saved my bacon when things have gone fubar.
    
paulvdh says:

June 15, 2026 at 7:47 am

I was pleasantly surprised when I discovered that my STM32 uC had hardware support for CRC, but then I quickly discovered that it only supports a very specific CRC, and I was already using another one for my RS485 based network, so that ended quickly.

But if you are free to choose, then using a CRC that has hardware support for your uC is an obvious advantage.

When I did some research into CRC’s I found a bunch of algorithms. Some are very quick but use a big LUT (or partial LUT’s), other algorithms can be very small in code size, but execute slowly. In the end I found a very heavily optimized algorithm that was both small and quick (but not extreme), and it ran well on my Atmega processor, so I kept on using that 16 bit CRC. It’s now hidden deeply in some network stack software library I’ve been using for 20 years, and I can’t even remember what the CRC is. (I can look it up though, I still use the library).

1. Daid says:
  
  June 15, 2026 at 11:27 am
  
  Indeed the F4 STM32 hardware only supports the CRC that is used by ethernet, and is limited to 32bit writes. While F7 and bigger chips support configurable polynomial and 8/16/32bit writes.
  
Truth says:

June 15, 2026 at 8:18 am

BitTorrent comes to mind, where the files are chopped into chunks and a hash is created for each chunk, plus an overall hash for all the chunks, so that if one chunk is corrupt it can be requested again. After all chunks are transferred, the overall hash is calculated and compared.

rnjacobs says:

June 15, 2026 at 8:19 am

Relatedly, an early common CRC64 had way too few taps and was thus insufficiently good of a reducing function, see https://www0.cs.ucl.ac.uk/staff/D.Jones/crcnote.pdf

Turns out that even if you’re just detecting bit errors, you still want to flip half the bits on average.

Donovan says:

June 15, 2026 at 8:24 am

Even on embedded devices I take my message and do a sha-256 hash on it and then tag those bytes onto the end. Most processors can do SHA-256 using specialized parts of the chip, and are very very fast. Some protocols like the old CANBUS don’t have enough space for this sort of thing, but CAN-FD has room for 64bytes, and most of my messages fit in a tiny struct. This way, there is pretty much a 100% chance that a given instruction is the correct instruction.
Rarely is this overhead coming at any real cost in performance.

Wallace Owen says:

June 15, 2026 at 8:27 am

STMicro uses two different CRC peripherals: One is very flexible and allows you to set the polynomial. But many have an inflexible 32-bit unit that implements the CRC used by Ethernet, usually those chips that also have an Ethernet peripheral. Really, if you need a fast 16-bit CRC, allocate the 512 bytes for the table in your flash and it will work everywhere.

Something not mentioned in the article is when to do the calculation. You can feed the bytes as they arrive, inside the ISR, feeding them into the CRC generator, to amortize the cost in cycles as the message arrives. But sometimes you must wait until you have the entire received message to check its validity:
There’s a threshold of performance where the bit rate is high enough that you can’t afford to compute the CRC inside the interrupt service routine and so must wait. Likewise, if you’re using DMA to bring in the serial data, you must wait for the entire message to be received before computing the CRC.

Julian Skidmore says:

June 15, 2026 at 8:42 am

I was hoping for a discussion on byte parallel algorithmic (rather than table-driven) CRCs for their space and execution efficiency.

crc = ((unsigned char)(crc >> 8) | (crc << 8))^ser_data;
crc ^= (unsigned char)(crc & 0xff) >> 4;
crc ^= (crc << 8) << 4;
crc ^= ((crc & 0xff) << 4) << 1;

You can find assembler implementations for many MCUs and CPUs here:

https://oneweekwonder.blogspot.com/2015/10/parallel-crc16-collection.html
https://oneweekwonder.blogspot.com/2015/11/parallel-crc16-collection-2.html
https://oneweekwonder.blogspot.com/2015/11/parallel-crc16-collection-3.html

The one for the Motorola 6800 (i.e. the 8-bit CPU) was a British Standards Institute standard in the 1970s.

1. Christian says:
  
  June 15, 2026 at 9:21 am
  
  That code, you’ve posted, seems to have quite a few redundant shits. Code in the links looks fine, has <<12 and <<5
  
  1. Christian says:
    
    June 15, 2026 at 10:40 am
    
    Ok, I get the intent now. Hope is that that (n<<8) will become a byte swap instead of a shift.
    
    I ran the example through Compiler Explorer ( https://godbolt.org/ ), since they offer a LLVM-MOS (6502) CLANG. Optimization flag is -Os .
    
    Code optimizer is good, and it catches the shifts. Also sets up a proper calling convention.
    But they are using virtual registers (lot of zeropage __rc0 – __rc15) and that is not really acknowledging the 3 register architecture.
    
    1. Julian Skidmore says:
      
      June 22, 2026 at 8:38 am
      
      Yes, I think when I first did it with <<12 and <<5 the gcc AVR compiler turned it into a loop; whereas (n<<8) becomes a byte copy. So, <<12 then becomes:
      
      mov regCrcHi,regCrcLo
      ldi regCrcLo,0
      swap regCrcHi
      andi regCrcHi,0xf0
      
      ;4 cycles and 4 words instead of:
      ldi regTemp,12
      lp:
      lsl regCrcLo
      rol regCrcHi
      dec regTemp
      brnz lp
      
      ;61 cycles, 6 words.
      
  2. abjq says:
    
    June 18, 2026 at 9:18 am
    
    Those redundant shits are crap.
    
DanielF says:

June 15, 2026 at 8:56 am

There are two problems with checksums. First, no matter how much you modify the algorithm, the chances that two files produce the same checksum are pretty high.

Uhm, no? Not if the files are not identical.

1. CH says:
  
  June 15, 2026 at 9:55 am
  
  The author gives a simplified example right around where you quoted:
  
  There are two problems with checksums. First, no matter how much you modify the algorithm, the chances that two files produce the same checksum are pretty high. Especially with common error patterns.
  
  For example, assume a very simple algorithm that simply adds the bytes and discards any carry. If a file contains 0x80, 0x80, those numbers essentially cancel each other out. If you replace them with 0, 0, you’ll get the same checksum.
  
  2+2=4, but so does 1+3, 0+4 and -2+6.
  
2. Conor Stewart says:
  
  June 21, 2026 at 8:33 pm
  
  The CRC has a rather limited range of outputs compared to the potential inputs. Every input into the algorithm must return a number of a given length. You can use a CRC-8 on any length of input data and it will always return a number between 0 and 255, so you have 256 different outputs for basically an infinite amount of possible inputs, so multiple inputs having the same output is something that cannot be avoided.
  
  Hash functions are the same, you are reducing lots of data down to a fixed size signature, again multiple inputs can produce the same output, this cannot be avoided unless you want the hash or CRC to be the length of the input data which defeats the purpose. Instead the purpose of hash functions and CRC is for the output to be unique enough that it can catch changes or errors.
  
  In the case of CRC it would be designed so that a certain number of bit errors are guaranteed to produce a different output, so for example your CRC algorithm could guarantee that it can detect up to 2 bit flips, it may detect more but it wont be able to detect them 100% of the time, in that case specific patterns of flips may produce the same output as your original data. That is why CRC algorithms and polynomials are meant to work on specific input lengths and why the CRC polynomials need to be designed in specific ways to catch the likely kinds of errors, that is why most CRC polynomials catch small amounts of bit flips, like 1, 2 or 3 since they are the most likely. Then it gets even more complicated with things like burst errors.
  
  If your transmission or storage method has certain types of errors that are more likely than others or aren’t issues on other transmission methods then you design the CRC to target the kind of errors your method is likely to produce.
  
  CRCs and hashes are not perfect, they cannot be when you are taking an arbitrary length of data and outputting a fixed size signature.
  
smellsofbikes says:

June 15, 2026 at 9:14 am

LUTs are way faster but have a pretty solidly irritating failure mode if the LUT is just a little corrupted: most of the time it’s fine because it gets a good value, every once in a while it’s wonky, and if what you’re trying to diagnose is an intermittent problem in a wiring harness that you’re detecting by occasional bad CRC values, whew does this make it complicated. Which is to say: in production, use a verified LUT, but when you’re trying to find a problem you may want to calculate it for each use.

1. Paul says:
  
  June 15, 2026 at 9:36 am
  
  Hey, maybe you could verify the LUT is correct by using a checksum.
  
2. Wallace Owen says:
  
  June 15, 2026 at 12:01 pm
  
  Keep the table in Flash with the code. Then nothing can step on it.
  
  1. Mouse says:
    
    June 15, 2026 at 10:06 pm
    
    better not expect that in orbit, or if today’s your cosmic ray day. Or if your flash had a bad nap.
    
3. Julian Skidmore says:
  
  June 22, 2026 at 11:23 pm
  
  Actually I once did have exactly that issue. I had written a byte-parallel CRC for an embedded system (an AVR XMEGA), which meant that it was pretty quick, though not as quick as a LUT, but nearly as short as a bit-by-bit algorithm. And someone else in the company had been assigned to write some code to handle the PC side of what the XMEGA was talking to, for testing.
  
  And of course it didn’t quite work – most blocks of data would fail. What made it worse was the other guy had just pulled a LUT-based CRC algorithm from the net and when he tested the same data using his own CRC, of course that always worked, so there was pressure on me to just copy his LUT-based CRC.
  
  But it turned out his was at fault, which I proved by using a bit-by-bit algorithm, comparing all the possible inputs and outputs (there are 64K). His version contained errors in a few locations. This meant that there are quite a few programmers who had simply copied that LUT version and put it in real code, not realising that it’s wrong.
  
CH says:

June 15, 2026 at 9:44 am

Thanks, this was very helpful and informative! A succinct “lecture” on CRCs that also shows how they have commonly been misused and best practices by reading up on Koopman.

Can you do an article on error correcting codes next?

Wilk says:

June 15, 2026 at 10:29 am

The subject of forward error correction (i.e., correcting received bits, rather than retransmission) leads to important considerations for the code being chosen. In small systems a single error correcting, double bit detecting (SECDED) code may be preferred due to the simplicity of the correction algorithm. The “cyclic” part of CRC refers to the property that every cyclic shift of the bits in a code word is still a valid code word. The computed CRC (called the syndrome) corresponds to the bit position of the error in the received code word, although the correspondence between syndrome and bit position is pseudo-random. For short code words an LUT corrector may be practical. Otherwise, a Meggitt decoder, which logically computes the syndrome of the shifted code word in a very efficient way, may be used. The Meggitt decoder is clocked until the computed syndrome indicates that the error is (logically) in the LSB. By counting the clocks required, the position of the error can be determined and the relevant bit flipped. (There is not any actual shifting of the received code word required.) A byte parallel version of the decoder is possible and updates 8 bits at a time. Other correction algorithms are possible when considering multiple bit correction.
In all these cases, the choice of polynomial and the code word length, may be engineered to meet an arbitrary reliability requirement.

Nik says:

June 15, 2026 at 12:21 pm

Very well done article by Al Williams, really appreciate it. Just wanted to outline that in this statement:

“People invent better error-checking codes by devising schemes that can promise they can detect a certain number of bit flips and, at least in some cases, correct them. One of these is the cyclic redundancy check (CRC).”

…the part with “at least in some case, correct them” is basically not true. CRCs were never designed for error correction. They can be used for detecting a data corruption and to trigger a retransmit logic in an application, but this is not the same at all.

Greg A says:

June 15, 2026 at 1:47 pm

there’s a few problem domains where i want to survive an adversary or uniquely index something. and for that, there’s a whole wild world of cryptographic hashing. but for everything else, fundamentally i’m just dealing with line noise and any old crc will do. if my line has enough noise that a weak crc lets bad packets through, then i’ve got bigger problems…really i just want the crc to tell me to throw away the line if it’s that noisy.

but the thing is, there’s a zillion crcs anyways. and even within the world of common simple crcs, there’s incompatible takes…someone will do the same crc but they’ll use a different initial constant. or people will disagree about the byte order. a surprising array of minor confusions and petty differences. and i don’t care! i just need it to match well enough that the microcontroller’s boot loader will accept my upload.

And that runs into my disability…i don’t want to use some huge bloated tool / library / SDK that’s inevitably half trial-and-error slop and half bygone relic. So i have to lift out the one CRC that i need out of some convoluted mess. And even though in the end it’s at most 30 lines of straightforward C code…it’s a needle in a haystack. Always a pain in the butt. Every time. i hate CRCs.

1. Nik says:
  
  June 15, 2026 at 5:07 pm
  
  Well, I’m afraid you’re right. Selecting an effective CRC that matches the project requirements is far from trivial. Koopman’s paper has really great info on the topic. There are several other nice papers for reading:
  – The Effectiveness of Checksums for Embedded Networks – Theresa C.Maxino, 2006, CMU
  – Efficient High Hamming Distance CRCs for Embedded Networks – Justin Ray, Philip Koopman, 2006, CMU
  – Selection of Cyclic Redundancy Code and Checksum Algorithms to Ensure Critical Data Integrity – US FAA, 2015
  
ian 42 says:

June 15, 2026 at 4:16 pm

Hmm, all this and no comment on CRC64 (also supported by modern hardware), or on how much what you should use depends on what you are doing.

ie if you want to crypto verify a file hasn’t been changed, don’t use a crc.

However, if you want a very fast check to see if a file has randomly changed, a CRC will often be the way to go..

And you can do a CRC64 on modern hardware faster than you can do IO…

ie I recently was cleaning up quite a few files – about 100 million – and was looking for duplicate directories. I made crc64 for every file (requiring every file to be read once) then cascaded them up for a CRC64 for each directory recursively, then looked for the highest directories that were duplicate…

This was much faster than using a md5 hash or something, as CRC64 can be done way faster than the 10Gbytes/sec (or so) reading any of my ssds..

I’ve done this before, and I’ve never found two files that are the same size with the same crc64 that weren’t identical.. Yes, you might be able to make them, but it has never happened to me in the wild..

ps if you want to do crc64 very fast try this library – https://github.com/awesomized/crc-fast-rust – it has no trouble doing crc64 at over 60GBytes/sec on my machine, and the combine etc functions work..

Nik says:

June 15, 2026 at 5:22 pm

Two comments:
1. Koopman’s paper was focused basically on brute-forcing the CRCs, which at the time was a huge effort and it’s understandable why they hadn’t approached CRC64 for practical limitations. I guess nowadays we are free to contribute to this area, with so much available computing resources, right?

2. For filesystem data integrity checks it would be much more reliable to use a cryptographic-grade hash function instead of CRC64. This becomes especially important for large files (where file’s bit-size exceeds Hamming weights even for HD=1, in the case of using CRC). ZFS seems to use fletcher4 by default for this purpose, but it also offers sha256 and sha512 for crazier deployments.

1. ian 42 says:
  
  June 15, 2026 at 8:37 pm
  
  yes, but you missed my point. If you want something to long term store and compare the file to, yes probably use something better than crc64.
  However if you want to make a fast checksum to fingerprint a file for comparison etc etc crc64 is very fast, easy to store and use, and good enough. It adds no time to the read (fastest source I have is about 12 GBytes/s, which it handles without issue in a single thread) so is light on cpu usage compared to many others…
  
  1. Nik says:
    
    June 16, 2026 at 2:35 am
    
    Also, I can’t agree that some CRC “seems to be good enough” without an analysis which considers the length of input data and Hamming weights for error detection. We don’t calculate CRC just for the fun of it (usually) but for a specific purpose, and this behavior needs to be precisely verified.
    
    This was the point of one other poster – choosing the right CRC is much more involved than it looks initially. In one of my old projects I had to implement a CRC for protecting FLASH records against data corruption. The easy part was to choose the CRC standard and implement it according to the specific HW constraints, but the more involved part was to actually prove the ability to detect data corruption. I don’t remember precisely now, but it took several hours of one of our big machines to crunch through the 80+ billion test cases, just to prove these 100 lines of C code are working properly. And that, my friend, was just for a studio audio product, not even for automotive, medical or industrial.
    
  2. Conor Stewart says:
    
    June 21, 2026 at 8:42 pm
    
    Except there is still the possibility of two different files having the same CRC output. So how did you handle that case? That to me is something you can’t ignore when deleting duplicate data.
    
rasz_pl says:

June 16, 2026 at 11:05 am

Auto moderation eating my links.

I can’t recommend CRC RevEng https://reveng.sourceforge.io/ arbitrary-precision CRC calculator and algorithm finder by Greg Cook highly enough. Great at reverse engineering unknown polynomials.

Chris says:

June 29, 2026 at 3:18 am

I recently wrote a CRC table generator in a C++ template. It generates a constexpr crc table at compile time. No more hardcoded magic numbers in code or filling the table at runtime. I learned a lot about CRCs.
If you are interested in the code let me know.


Hackaday
