Tech In Plain Sight: Check Digits And Human Error

Bar code shown in a 3D plane in Vaporwave Aesthetic

Computers in working order and with correct software don’t make mistakes. People, however, make plenty of mistakes (including writing bad software or breaking computers). In quality circles, there’s a Japanese term, poka yoke, which roughly means ‘error avoidance’. The idea is to design things so that a mistake is either impossible to make or too obvious to miss. For example, consider a SIM card in your phone. The little diagonal corner means it only goes in one way. If you put it in the wrong way, it is obviously wrong.

To be successful at poka yoke, you have to be able to imagine what a user might do wrong and then come up with some way to make it obvious that it is wrong. There are examples of this all around us and we sometimes don’t even know it. For example, what do your credit card number, your car’s VIN code, and a UPC code on a can of beans have in common?

The answer is that they all are long strings of digits which are notoriously difficult for humans to enter correctly. People miss numbers or transpose them. So people who write applications that take numbers like this often want to check to make sure the person didn’t make a mistake.

Of course, a number is a number, right? If I tell you to enter a five digit zip code, I can figure out if you put in four digits or six, but it is hard to know if 77508 or 77580 is the one you meant. That’s why long and important numbers have one or more check digits.

A check digit is like a checksum or a CRC — you compute it from the other digits in the number and if your computation doesn’t match the check digit you got, then something was put in wrong.

A Simple Example

Just to keep things simple, suppose you have a four-digit PIN from 0000 to 9999 and want to make a five-digit code with a check digit. A simple way to do it would be to add all the digits together and throw away all but the last digit (that is, the remainder after dividing by 10 or modulo 10).

For example 0052 becomes 00527 and 9522 becomes 95228. Simple, right? Now you know that 10118 is not a valid number. Of course, 00527 is valid but so is 00257 or 52007. So maybe we can do better.
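As a rough illustration, the digit-sum scheme could be sketched in Python like this. The function names are made up for this example and are not from any library.

```python
def simple_check_digit(digits: str) -> int:
    """Add the decimal digits together and keep only the last digit (sum mod 10)."""
    return sum(int(ch) for ch in digits) % 10


def append_check(digits: str) -> str:
    """Return the digits with the check digit tacked on the end."""
    return digits + str(simple_check_digit(digits))


def is_valid(code: str) -> bool:
    """Recompute the check digit from all but the last digit and compare."""
    return simple_check_digit(code[:-1]) == int(code[-1])


print(append_check("0052"))  # 00527
print(append_check("9522"))  # 95228
print(is_valid("10118"))     # False
print(is_valid("00257"))     # True: the swapped digits go unnoticed
```

Because addition doesn’t care about order, any rearrangement of the same digits passes, which is the weakness the weighted schemes try to address.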

Real Life

In real life, algorithms try to take the position of the digits into account. There are a few ways to do this and, as you might expect, there’s a lot of math to decide what’s best. Many systems use a weighted algorithm where each digit has a different weight, usually a 1, 3, 7, or 9 with no two adjacent digits having the same weight. Since those numbers are coprime with 10, any single digit change causes a different check digit. Using that weighting also catches about 90% of adjacent transpositions. The ones it misses are swaps of two digits that differ by 5 (0 and 5, 1 and 6, and so on): the weights are all odd, so any two of them differ by an even number, and an even number times 5 is a multiple of 10.
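If you would rather not take the math on faith, a brute-force check is short. This is only a sketch: it looks at a single pair of neighboring digits with two different weights and tests every possible swap, and the helper names are invented for the example.

```python
from itertools import product

WEIGHTS = (1, 3, 7, 9)  # each of these is coprime with 10


def single_errors_always_caught(weight: int) -> bool:
    """True if changing one digit always changes (weight * digit) mod 10."""
    return all((weight * a - weight * b) % 10 != 0
               for a, b in product(range(10), repeat=2) if a != b)


def missed_swaps(w_left: int, w_right: int) -> list[tuple[int, int]]:
    """Adjacent digit pairs (a, b) whose swap leaves the weighted sum unchanged."""
    return [(a, b) for a, b in product(range(10), repeat=2)
            if a != b
            and (w_left * a + w_right * b) % 10 == (w_left * b + w_right * a) % 10]


print(all(single_errors_always_caught(w) for w in WEIGHTS))  # True
missed = missed_swaps(1, 3)
print(missed)  # every missed pair differs by exactly 5
print(f"{90 - len(missed)} of 90 possible swaps caught")  # 80 of 90
```

Eighty out of ninety is about 89%, and trying the other weight pairs from the set gives the same ten misses.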

For example, the ubiquitous UPC code uses weights of 1 and 3 for alternate numbers with digit 1 being the rightmost digit (other than the check digit) followed by digits 2, 3, and so on moving to the left. The algorithm is:

  1. Ignoring the check digit, start at the right and add all digits in odd-numbered positions together
  2. Multiply the sum by 3 (the weight for odd digits)
  3. Ignoring the check digit, add the remaining digits to the running total
  4. Take the last digit of the sum (that is, the remainder after dividing by 10); if that digit is not zero subtract it from 10

For example, I have a can of spray air on my desk with a UPC of 681131309516. The first six digits are unique to the company. The next five digits are a unique ID and the final digit is the check digit. That means the odd-position digits are 1, 9, 3, 3, 1, and 6. The even-position digits are 5, 0, 1, 1, and 8. The first sum, then, is 23 and multiplying by 3 gives 69. The even-position digits add up to 15 for a grand total of 84 and a preliminary check digit of 4. Since this isn’t a zero, the real check digit is 10-4 or 6. Try changing any number or swapping any two numbers between the groups and see what result you get.
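The four steps translate fairly directly into code. Here is a minimal Python sketch, assuming a 12-digit UPC passed as a string of digits; `upc_check_digit` and `is_valid_upc` are names picked for this example.

```python
def upc_check_digit(data: str) -> int:
    """Check digit for the 11 data digits of a UPC (check digit not included)."""
    digits = [int(ch) for ch in reversed(data)]  # index 0 is position 1, the rightmost
    odd_sum = sum(digits[0::2])    # positions 1, 3, 5, ...
    even_sum = sum(digits[1::2])   # positions 2, 4, 6, ...
    last = (odd_sum * 3 + even_sum) % 10
    return 0 if last == 0 else 10 - last


def is_valid_upc(code: str) -> bool:
    """Compare the computed check digit with the final digit of the code."""
    return upc_check_digit(code[:-1]) == int(code[-1])


print(upc_check_digit("68113130951"))  # 6
print(is_valid_upc("681131309516"))    # True
print(is_valid_upc("681131309156"))    # False: the 5 and the 1 were swapped
```

Keeping the number as a string rather than an integer preserves any leading zeros, which matter here because the weights depend on position.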

Even Better

ISBN-10 is even more robust. It uses a ten-digit number where each digit has a different weight from 1 to 10 and takes the remainder after dividing by 11. Because 11 is prime, this catches every single-digit error and every swap of two digits, but it can result in a check digit of 10, which is represented by an X.
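A sketch of the ISBN-10 calculation in Python might look like the following. It assumes the usual convention that the leftmost digit gets weight 10 and the check digit gets weight 1, so a valid number gives a weighted sum divisible by 11. The function names are hypothetical.

```python
def isbn10_check_char(first_nine: str) -> str:
    """Check character for the first nine digits of an ISBN-10."""
    # Weights run 10, 9, ..., 2 from left to right; the check digit has weight 1.
    total = sum(w * int(ch) for w, ch in zip(range(10, 1, -1), first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """True if the weighted sum of all ten characters is divisible by 11."""
    chars = isbn.replace("-", "")
    if len(chars) != 10:
        return False
    # Only the final character may be an X, which stands for the value 10.
    values = [int(ch) for ch in chars[:9]]
    values.append(10 if chars[9] in "Xx" else int(chars[9]))
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


print(isbn10_check_char("030640615"))    # 2
print(isbn10_check_char("080442957"))    # X
print(is_valid_isbn10("0-306-40615-2"))  # True
print(is_valid_isbn10("0-306-40651-2"))  # False: adjacent digits swapped
```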

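There are other algorithms such as Damm, Verhoeff, and Luhn. Damm and Verhoeff catch every single-digit error and every adjacent transposition while still using a single decimal check digit. Luhn, the one used on credit cards, is simpler but weaker: it misses the swap of an adjacent 0 and 9. You can also add more check digits to get better performance, just as more bits of a CRC are usually more robust.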
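Of the three, Luhn is the easiest to show in a few lines. The sketch below is one common way to write the validity test; `luhn_valid` is a name chosen for this example, and the input is assumed to be a string containing only digits with the check digit last.

```python
def luhn_valid(number: str) -> bool:
    """True if the digit string passes the Luhn check."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9      # same as adding the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("79927398713"))  # True
print(luhn_valid("79927398710"))  # False: wrong check digit
print(luhn_valid("091"), luhn_valid("901"))  # True True
```

The last line shows Luhn’s known blind spot: doubling turns 9 into 18, which collapses back to 9, so swapping an adjacent 0 and 9 doesn’t change the total.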

Meaning

These check digits aren’t made to act as a security device. Usually, the algorithm is well-known and easy to figure out. So it isn’t that bad guys can’t figure out how to make fake credit card numbers because of the check digits. They simply provide a little poka yoke so a program can immediately spot common errors in numbers like these. Something to remember next time you design an interface or anything else prone to human errors.

Of course, if you want to protect against computer errors, you are better off with a CRC. There are also other ways to catch errors if you aren’t worried about humans calculating the check digits.

31 thoughts on “Tech In Plain Sight: Check Digits And Human Error”

  1. Nice concise explanation. One addition:

    Redundancy can be used to detect errors but also to correct them, or any combination thereof. It is the choice of the decoder.

    An interesting case is the IBAN, which encodes country, bank and account number (mostly used in Europe). The country is first, for example “NL” (The Netherlands) or “GB” (United Kingdom), followed by a well-designed 2-digit checksum, followed by a country-specific composite of fields identifying the bank and branch (sort code) and the account and possibly bank-specific redundancies. The 2-digit checksum provides sufficient redundancy to correct all transpositions of adjacent digits or a single smudged digit, with still enough redundancy left over to be reasonably confident in the result. The checksum is just the remainder mod 97 of the IBAN, including the country, after converting letters to numbers; have a look at the Wikipedia page.

    I use IBAN country + checksum, e.g. “NL42”, as nickname for accounts, mine and others.

    1. I guess few people know that the bank account numbers previously used in Germany also contain checksums. So as the old bank account numbers are embedded into the IBAN, there are actually two checksums that can be checked.

      The Bundesbank has a regularly updated list of all German banks on its website that also lists their checksum method. The checksum methods (currently 144!) are described in a separate PDF. There is an open source project KtoBlzCheck that knows how to verify these checksums.

    2. Years ago we had someone working in the pharmacy. Not an assistant, but a logistical function: unpacking and shelving orders. However, he was not very accurate (which he denied), so we let him go in his first month, still in his probation period. Next he complained he had not received his wages and stated he had filled in his bank account number correctly. While we were looking into this, we were contacted by an honest person asking why we had sent him money. Our employee had managed to make multiple errors while filling in his bank account number in a way that made it pass the bank’s algorithm. Now he could no longer claim unjust termination for being inaccurate!

      N.B. These days the banks check if the name of the account holder matches and will alert you to it if it doesn’t.

  2. Before so many labels were inkjet printed, these labels were printed with physical numbering wheels like a non-digital car odometer. My father invented a system of calculating these check digits on the fly, and spinning an extra numbering wheel in real time so that the press could just churn out sequential labels with the proper check digits. This was beyond labels – hospital forms used check digits also to help prevent billing errors. This add-on to a commercial printing press or collator also worked through those carbon forms, so that the canary and pink copies would have the same check digits as the white copy.

    It seems like a lifetime ago when this was a real problem. It was also my first paid gig as a software designer and my intro to hardware design.

  3. Microsoft was practicing some poka yoke lately when they put public default permissions on cloud data, who needs to configure that stuff? And look at the results! Sometimes it’s good to sweat the details.

    1. I learned “casting out nines” as a check method for addition and multiplication in 1960. It’s based on the fact that 9 = 10 – 1, and that if the sum of the digits of a base-10 integer is divisible by 9 then the integer is divisible by 9.

      A related technique can be used to check for divisibility by 11. If an integer is divisible by 11 then the sum of the digits in even positions and the sum of the digits in odd positions are congruent modulo 11.

  4. For UPCs, I’ve never understood the final step “if that digit is not zero subtract it from 10”. This only adds complexity to the calculation and doesn’t offer any benefit to the algorithm.

    1. It wouldn’t be surprising if this has more to do with how data was stored. For instance, I’ve worked at a few retailers where the check digit is left off in the database and is computed on the fly instead. If you search the database for the full UPC with check you’d often come up empty-handed. It took a while for me to break the habit of ignoring that last digit despite what my brain was telling me. They’ve moved on to longer numbers called GTINs that make the check harder for older systems since they can’t handle the longer numbers. There are a lot of big and small box retailers still running on AS400 mainframes and the like.

    2. “Subtract from 10” is also found in other checksums, such as the Luhn algorithm. I suspect it’s because you’re really taking the negative of the number modulo 10, and the subtraction is a shortcut for that. Using a value of 6 as an example, -6 mod 10 == 10 – 6 == 4.

    3. This is a standard trick for linear parity symbols (checksum digits): the correct sequences are exactly those for which the checksum over the entire sequence, including the received parity symbols, is zero. For encoding, one can put zero in the position of the parity symbol, calculate the checksum and put the negated value into the position of the checksum.

      Computationally it is pretty much the same, but you can check without knowing which symbols are parity.

  5. Somehow programmers acknowledge that humans are bad at typing in lots of stuff that’s hard to memorize, but they insist on programming in languages like C where you have to code every line perfectly to avoid security issues.

    1. It is funny, the tag for these has it right (as do the previous ones of the series). I originally didn’t put that in the title and thought about it late. Finger memory… sigh. Maybe it was a pun… yeah… that’s it lol. If only words had check letters ;)

  6. Interesting that the term “poka yoke” is Japanese because in WWII the Japanese JN-25 code used a checksum to ensure that a message had been sent correctly. I believe that the final coded number had to be evenly divisible by 3; if it wasn’t, the message was garbled.

      1. Hah! Welcome to the weird twisted reality created by WordPress Jetpack! The only solace is knowing that you did nothing wrong; the bad outcome was caused by the Evil WP-Jetpack itself :-(
