Beep. You hear it every time you buy a product in a retail store. The checkout person slides your purchase over a scanner embedded in their checkout stand, or shoots it with a handheld scanner. The familiar series of bars and spaces on the label is digitized, decoded to digits, and then used as a query to a database of every product that particular store sells. It happens so often that we take it for granted. Modern barcodes have been around for 41 years now. The first product purchased with a barcode was a 10-pack of Juicy Fruit gum, scanned on June 26, 1974 at a Marsh supermarket in Troy, Ohio. The code scanned that day was UPC-A, the same barcode used today on just about every retail product you can buy.
The history of the barcode is not as cut and dried as one would think. More than one group has been credited with inventing the technology. How does one encode data on a machine, store it on a physical medium, then read it at some later date? Punch cards and paper tape have been doing that for centuries. The problem was storing that data without cutting holes in the carrier. The overall issue was common enough that efforts were launched in several different industries.
In the 1930s, John Kermode, Douglas Young, and Harry Sparkes created a four-bar barcode. They were Westinghouse engineers, and not surprisingly the application was to automate the payment processing of electric power bills. The patents, however, were generalized as “Card sorters”.
In 1948, Bernard Silver and Joseph Woodland began work on a system for reading linear and circular printed codes for supermarkets. They took their inspiration from optical audio tracks used in 16mm and 35mm film. In fact, their reader employed an RC935 photomultiplier tube normally used in movie projectors. Silver and Woodland are often credited as inventors of the barcode, but they reference the Westinghouse patent in their own work. Several companies including IBM took interest in the patent, but determined that key technologies still needed to be developed before it would be a practical system. Philco bought the patent, eventually selling it to RCA.
Perhaps the most infamous claim to the barcode throne came from Jerome H. Lemelson. Lemelson was granted over 600 patents in his lifetime, including some for machine vision. Many of these were considered submarine patents. He made most of his fortune by enforcing and licensing those patents to the tune of 1.3 billion dollars. This branded him as an early patent troll. Lemelson’s barcode patents were declared unenforceable in a landmark 2004 court case against Cognex Corporation and Symbol Technologies. This case is often referenced in patent troll litigation today.
What we know as the modern barcode got its start in the late 1960s. Local markets were evolving into supermarkets. Checkout systems with mechanical cash registers were the obvious bottleneck. But how to speed things up? Grocery trade associations created the Uniform Grocery Product Code Council (now GS1) to tackle the problem. GS1 solicited solutions and received proposals from RCA, IBM, Singer, Dymo, Litton, and Pitney Bowes, among others. RCA drew on the Silver and Woodland patent to create a bullseye code. IBM may not have had the patent, but they had something better. Joseph Woodland had been an IBM employee for several years at that point. He was recruited to a team which included George Laurer. Laurer is still active in the industry, maintaining a webpage with information about barcodes. The team worked hard to design a robust code. In the end it was the IBM code that became the Universal Product Code (UPC) we all have come to know.
The UPC symbology has remained relatively unchanged since 1974. There have been some extensions to encode extra data, but the core has endured as a long-lasting standard. Once the code was in use, a revision would require massive changes from the printing industry all the way through the point of sale industry.
Building a Barcode
UPC-A is a numeric-only symbology. It’s also fixed width. Each UPC-A symbol encodes twelve digits; however, one digit is used as a check character, leaving only eleven usable digits. The framework of the code starts with a quiet zone, which is literally a quiet area with the same color as the spaces. Just inside the quiet zone is a guard bar, a unique pattern that defines the start (or end) of the code. UPC-A has quiet zones and guard bars at the start and end of the code. A unique center guard bar defines the middle of the symbol. The rest of the code is made up of twelve characters.
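The article doesn’t spell out how the check character is derived, so here is a minimal Python sketch of the commonly published UPC-A rule: triple the sum of the digits in odd positions, add the digits in even positions, and pick the digit that brings the total up to a multiple of ten. The function names are made up for illustration.

```python
def upc_a_check_digit(payload: str) -> int:
    """Return the check digit for an 11-digit UPC-A payload."""
    if len(payload) != 11 or not (payload.isascii() and payload.isdigit()):
        raise ValueError("payload must be exactly 11 digits")
    digits = [int(ch) for ch in payload]
    odd_sum = sum(digits[0::2])   # 1st, 3rd, ..., 11th digit
    even_sum = sum(digits[1::2])  # 2nd, 4th, ..., 10th digit
    total = 3 * odd_sum + even_sum
    # The check digit tops the total up to the next multiple of 10.
    return (10 - total % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """True if a 12-digit string carries a consistent check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])
```

For the payload 03600029145 the weighted total is 58, so the check digit is 2 and the full symbol reads 036000291452.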

To envision how UPC-A encodes data, think of Morse code. If one drew all the dots and dashes of a Morse code message, they would have a rudimentary barcode. In practice, the Morse character set doesn’t work very well because it uses variable length characters. A ‘T’ is one dash, while a ‘Y’ is three dashes and a dot. Determining where one character ends and another begins would require spaces to be added between every character. That works in radio communications, but becomes inefficient on the printed page.
Characters – It’s all in the widths
Individual UPC characters are also fixed width. The basic unit of length is called a module, which represents the smallest bar or space used in the symbology. The nominal module size used by UPC-A is 0.33 mm. Each UPC character is made of two bars and two spaces, with a total length of 7 modules. For the digit 0 on the left side of a UPC-A, the character is 3,2,1,1 – meaning a space 3 modules wide, followed by a bar 2 modules wide, then another space and bar one module wide each.
Characters on the right of the center guard bar are color inverted from those on the left. That means every character on the left side starts with a space, while every character on the right starts with a bar.
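To make the layout concrete, here is a rough Python sketch that expands a 12-digit UPC-A into its string of modules. Only the 3,2,1,1 pattern for 0 comes from the text above; the other nine width patterns are the commonly published UPC-A values and should be checked against the GS1 specification before being relied on. The guard patterns (bar, space, bar at the ends; space, bar, space, bar, space in the center) are the standard ones, and all the names here are invented for the example.

```python
# Element widths (space, bar, space, bar) for left-side digits, in modules.
# Only "0" is taken from the article; the rest are the commonly published
# UPC-A patterns (assumption: verify against the GS1 specification).
LEFT_WIDTHS = {
    "0": (3, 2, 1, 1), "1": (2, 2, 2, 1), "2": (2, 1, 2, 2),
    "3": (1, 4, 1, 1), "4": (1, 1, 3, 2), "5": (1, 2, 3, 1),
    "6": (1, 1, 1, 4), "7": (1, 3, 1, 2), "8": (1, 2, 1, 3),
    "9": (3, 1, 1, 2),
}

END_GUARD = "101"       # bar, space, bar
CENTER_GUARD = "01010"  # space, bar, space, bar, space


def widths_to_modules(widths, starts_with_bar):
    """Expand run widths into modules: '1' is a bar, '0' is a space."""
    modules = []
    is_bar = starts_with_bar
    for width in widths:
        modules.append(("1" if is_bar else "0") * width)
        is_bar = not is_bar
    return "".join(modules)


def encode_upc_a(code):
    """Return the 95-module pattern for a 12-digit string (quiet zones omitted)."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        raise ValueError("code must be exactly 12 digits")
    # Left characters start with a space; right characters use the same
    # widths but start with a bar, which is the color inversion.
    left = "".join(widths_to_modules(LEFT_WIDTHS[d], False) for d in code[:6])
    right = "".join(widths_to_modules(LEFT_WIDTHS[d], True) for d in code[6:])
    return END_GUARD + left + CENTER_GUARD + right + END_GUARD
```

The result is always 3 + 42 + 5 + 42 + 3 = 95 modules long. This sketch does not verify the check digit; it simply draws whatever twelve digits it is given.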
Why all the complication with two inverted character sets? Direction! The grocery checkout barcode scanner hasn’t changed much over the years. It’s mounted in a slot and items are passed over it. The barcode can be in any orientation, so the scanner has to be able to decode the symbol left to right, right to left, or at nearly any angle.
The important thing to remember about reading a barcode is that it’s all about relative widths. With a handheld scanner, the barcode can be at any reasonable distance from the scanner. A more distant barcode will appear smaller than a close one. A reader simply has to compare the smallest element it sees (the module) to the width of the other elements. Once these relative widths conform to the rules of the quiet zone and guard bars, the reader decides it has found a possible code and begins to look for characters.
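The relative-width idea can be sketched in a few lines of Python. This is only an illustration, not how any particular scanner works: the run lengths stand in for whatever the reader measures (pixels, timer ticks), the helper names are invented, and the 25% tolerance is an arbitrary assumption.

```python
def to_modules(runs, modules=7):
    """Scale measured run lengths so one character spans `modules` modules.

    Returns a tuple of integer widths, or None if the runs do not round
    cleanly to a valid character.
    """
    if not runs or min(runs) <= 0:
        return None
    total = sum(runs)
    widths = [round(modules * r / total) for r in runs]
    if min(widths) < 1 or sum(widths) != modules:
        return None
    return tuple(widths)


def looks_like_end_guard(runs, tolerance=0.25):
    """True if three runs (bar, space, bar) are roughly the same width."""
    if len(runs) != 3:
        return False
    mean = sum(runs) / 3
    return all(abs(r - mean) <= tolerance * mean for r in runs)
```

A left-side 0 measured up close as (30, 20, 10, 10) and the same character measured from farther away as (9, 6, 3, 3) both come out as (3, 2, 1, 1), which is why distance doesn’t matter to the reader.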
UPC-A may have been the first commonly used barcode, but it didn’t stand alone for long. Europe modified the spec, adding a digit. The resulting code was called EAN-13. EAN codes all include a three digit country code. An odd side effect of this was the creation of the fictional country “Bookland”, which is used for books and other publications.
Today, there are dozens of different barcode symbologies out there. Commonly used linear symbologies include Code 39, Code 128, GS1 DataBar, Interleaved 2 of 5, and MSI Plessey. When one line isn’t enough, 2D symbologies are used, which include PDF417, Aztec Code, MaxiCode, Data Matrix, and QR Code. We take for granted how easy it is to scan a code and jump to a webpage – but it all started with the simple UPC.
Barcode images from Wikipedia.
I remember getting a barcode scanner. Everyone I knew wanted to scan everything. They all asked, “What info does the barcode have?” I tell them that it is the number underneath. They walk away disappointed.
It seems that most people think that barcodes contain actual product info as opposed to just the UPC.
That should be obvious to people when their item doesn’t scan at the checkout and the 14 year old behind the counter simply types in the number at the bottom and moves on.
Just sitting and remembering the :CueCat heyday (or lack thereof)
I actually have 3 or 4 of them still around after using them to catalog my book and DVD collections. The aftermarket software was SUBSTANTIALLY better than the standard :CueCat system. Probably have a library snapshot on a 3.5″ around here somewhere!
Thanks for the post, I agree that one on the Reader tech through the years would be cool.
I have a USB :CueCat that I ‘declawed’ with a simple snip of a pin on the encryption chip. Now it acts just like a keyboard, which is what UPC readers do when they ‘wedge’ between a PS/2 or AT keyboard and its port.
It works the same as the code on magstripe cards. The data isn’t on the card, just a number that points to a database record.
Actually, I believe the data is stored on the cards. There were some articles on here awhile back about hacking the Square reader to dump the data from cards. This didn’t involve any kind of internet connection whatsoever, implying that the data must be on the card.
That depends what you mean by “the data”. The only data on a magstripe payment card is the same as what’s on the front: card number, expiry date, name, plus checksum information.
Would be nice to have information on how the scanners actually read the code, especially with the technology available at the time.
These days? Often a CCD with a light source and a simple image recognition routine. But in the old days? Think of what many of the laser based scanners do. They “scan” with a laser. Lasers are dots, but what you see is a line right?
Much like when you go to a laser show, the single dot gets converted into funny lines and shapes by moving mirrors very quickly. A laser barcode scanner just has a photodiode, a laser and a rotating prism. All the scanner needs to know to decode the barcode is when the photodiode’s output first drops (laser has hit a black line), and then it needs to keep time and record the photodiode output until the last black line followed by the quiet zone. That gives you a way of reading the data using only 3 analogue components.
If there is enough interest, I can do a second piece that goes through exactly how laser and CCD based scanners work. While CCDs are more common today, Lasers still can’t be beat in terms of resolution and depth of field.
+1
A followup discussing the still-in-use MICR system would be welcome too.
DOO EEEET NAU!
If you could find some barcode related hacks…
Make a video with bit-banging CCD scanners, like what you did with shift registers and i2c EEPROM.
Yes, please.
I once spent several weeks worth of work time trying to single-handedly integrate a bar code scanner into our company’s production system. I had finally started to make some progress when the GM told me to drop the project. What fun.
The original scanners simply deflected the laser along a single line, which is why the bullseye code was tried first. The pack of Wrigley’s gum was the first product scanned and sold with a UPC code, not the first product ever sold with a barcode. That honor goes to (IIRC) boxes of butter with bullseye codes at Kroger’s.
Cheap/fast printing tended to smear the bullseye codes, and they also proved problematic to read on non-flat surfaces.
UPC codes had to be held within a certain angle tolerance for a single laser line scanner to read them, slowing down operation.
The bullseye code was an attempt to solve the angle tolerance problem. The eventual solution, which avoided making the scanners cost more by adding more lasers, was to mount many mirrors at various angles onto a spinning drum. Combined with some fixed mirrors, the system sequentially diverted the laser to scan lines at several different angles, making it much easier for clerks to grab and pass items and get a scan.
Even with that system it’s still possible to pass an item across the scanner at an angle it cannot read. Many times I’ve seen clerks whip an item across several times without it scanning. Then they’ll change their grip so it’s at a different angle and it will usually scan. Training should include “If an item doesn’t scan, rotate it 45 degrees then try again and it should scan.”
Laser scanners cannot scan barcodes off of phone screens.
Image scanners, which don’t rely on reflectivity differences, solve all of these issues with laser scanners.
That would be really cool, please do! I really enjoyed this article which went into enough depth to make it super interesting. Thanks!
How they work and what excellent bits we can harvest to hack with, all those high-grade optics flying round must be good for something, homebrew LIDAR or suchlike?
I’d really like that! And I thoroughly enjoyed this one, thanks.
Predating this was the system for tracking railroad freight cars. http://clarencegooden.com/index.php/2015/10/14/how-does-the-railroad-industry-track-all-those-freight-cars/
RFID tags. 900MHz. SDR. Please tell me there are dozens of train geeks across the countryside maintaining private databases of rolling stock (much like people are building ADS-B trackers)…
That link didn’t work but this did:
http://clarencegooden.com/how-does-the-railroad-industry-track-all-those-freight-cars/
All barcodes are divided into two parts by three markers. Why were the three markers, start, middle and end chosen to be 6, 6 and 6 ? (Just thinking about a crazy biblical rant in an old British 1993 Mike Leigh film called “Naked”).
Not all barcodes – but all UPC-A. It’s a total coincidence (though that would be a great easter egg). George Laurer himself explains it on his webpage: http://204.13.85.155/laurergj/upc/666quest.html
The UPC code of the beast?
Brilliant film! And David Thewlis is a great actor. He’s like Gary Oldman, in that he can play different roles and you don’t always know it’s him. The opposite of, say, Arnie, where they’re basically playing one part in all the films.
In “Naked” he spends the whole film going off on bizarre rants and alienating everybody. Bit bleak. But great!
I do remember a Vice documentary about an old lady living in the Russian tundra. She said that the mark of the beast was the barcode. Here is a quote, followed by the link:
“One of the more peculiar notions she’s picked up” — from the Old Believer newspapers that visitors occasionally leave behind — “is that bar codes are marks of the devil. ‘It’s the stamp of the Antichrist,’ she said. ‘People bring me bags of seeds with bar codes on them. I take the seeds out and burn the bags right away and then plant the seeds. The Antichrist stamp will bring the end to the world,’ she said. ‘God won’t save everyone.’ ”
http://www.vice.com/video/agafias-taiga-life-part-1
Tin Foil hat secured.
There’s no 6,6,6 in there. The left and right guards have the pattern 1,1,1 (bar, space, bar, each one module wide) and both mark where the barcode begins and ends and provide a reference for the size of a single module. The center guard’s pattern is 1, 1, 1, 1, 1 (space, bar, space, bar, space) and separates the first half of the barcode from the second half.
The guard bars in the center also have the effect (in addition to the reversing of the colours) of reversing the parity of the symbols in the left half from the parity of the symbols in the right half, while maintaining the overall pattern of alternating bars and spaces. In the left half all symbols have odd parity (an odd number of modules are black) while on the right side the parity is even, giving a linear-scanning reader a quick way of determining whether it is scanning left to right or right to left.
EAN (now officially GTIN-13, while UPC-A is now GTIN-12) codes as used in most places outside of North America manipulate the parity of the symbols in the left half to encode an extra digit in the same number of bars and spaces. Where a UPC has all odd parity on the left and all even on the right (OOOOOOEEEEEE), EAN uses both odd and even on the left (using two distinct patterns for each of the digits 0-9), or XXXXXXEEEEEE where each X may be either O or E. If you treat O as 0 and E as 1, you have a six-bit binary number that can be used to encode the extra digit. A parity of 000000 on the left tells the scanner it is scanning a UPC instead of an EAN, and a leading 0 on an EAN is reserved for products with UPCs.
UPC/GTIN-12/13 is unlikely to be the most commonly used barcode, as it is used only in the retail industry, and only for certain products (a customer of mine once pointed out that you can’t barcode a fish), but for most people it is the most visible. Industry tends more towards 3 of 9 (AKA Code 39) and Code 128 because both can encode letters as well as digits. Interleaved 2 of 5 (AKA ITF) is commonly used to print UPC barcodes (UPC Case Codes originally, but now tending more towards GTIN-14) on cardboard cases of product because it is more tolerant of variable width bars due to the ink bleeding on the lower quality packaging used for shipping product.
Barcodes are simply awesome. I use them to catalog & track my collections and the stuff i sell.
Unfortunately, with the proliferation of free bar code generators it is mindbogglingly simple to rip stores off… print out bar code labels of smaller sized packages of products and stick them on larger/pricier ones and you’re done – heck, copy the bar code from a $5-10 “jewelcase software” title and affix it to the $60 GTA V and take it through the self-check at Walmart. (Just be sure it is a title that scans in with a description of “software”, not a specific title.)
That’s stealing isn’t it.
yes, it would be – I was just pointing out a weakness of current bar codes, not actually advocating someone SHOULD actually do it
If you print a sticker that is identical to one used by the store, you can claim it’s a store employee who made the mistake. Be sure no camera can record you while you perform this illegal act of theft.
I’m not sure if I remember this story correctly, because I read it some years ago. Walmart (I think) tested programmable RFID tags for automatic totaling at self-check. The machine scanned all products in the shopping cart at once, greatly simplifying the whole process. They didn’t implement it because some hackers made a transmitter that could reprogram every tag in range to read as any cheap product.
So what if you can blame a store employee? It’s still theft, it’s still wrong, and it’s still illegal.
Yeah, RFID was promised a long time ago as a replacement, but hackable or not, the truth is that it is more expensive than the practically free bar code. Even if it costs $0.1, that adds up to a lot of money if you put it on every product.
The other problem is that putting RFID tags on everything means everyone becomes trackable by the tags they carry around.
You start to broadcast your purchasing habits to anyone who bothers to point an antenna at your grocery bag, and the tags on your clothes tell would-be robbers about who’s wearing Prada and who is trucking around in plain old dollar-store clothes. Go around the neighborhood, point a yagi at a house to see if there’s anything to steal.
I’d expect stores to not have any ‘free’ codes in their active product database that can be co-opted for such use. ‘Course some stores may simply subscribe to the UPC code database and not prune it down at all.
The same problem exists with regular price tags. The only way to combat this would be to have unique barcodes for each copy of the software. It would be a massive database maintenance issue, but not impossible. Stores already do this for gift cards. Many electronics stores scan the serial number of the product for warranty purposes as well as the UPC.
I had to get my head wrapped around barcodes to do an electronic component tracking system.. I open-sourced it – https://github.com/cogwheelcircuitworks/barcodatron
While most of the newer 2D barcode systems work pretty well, I really like the fact that bokodes can be read with pretty much any camera from up to 4m away, and even can tell the position and angle of {whatever} to within a single degree.
Bookland
For more detail on CCD based scanners and their performance, Motorola has a good scanner configuration utility called 123Scan2. When using the utility along with a 2D CCD scanner, you can capture the images from the CCD camera for analysis.
+1 for this utility, have used it for a number of projects.
“3,2,1,1 – meaning a space 3 modules wide, followed by a bar 2 modules wide, then another space and bar two modules wide each” I guess that should be “one module wide each” or am I understanding it wrong? Otherwise a nice glimpse to the omnipresent technology.
“digitized, decoded to digits,” – isn’t that what ‘digitized’ means?
No, it’s digitised and then those digits are decoded to other digits.