It’s one of those things that certainly sounds simple enough: take a picture of a receipt, run it through optical character recognition (OCR), and send the resulting information to whatever expense-tracking website or software you wish. There are companies that offer such a service, so it can’t be too difficult to replicate on your own…right?
That’s what [Marcel Robitaille] thought when he set out to create his homebrew “Receipt Ingestion” system, anyway. But in reality it took so much time to troubleshoot and implement that he says it would have been faster to just enter in all his receipts by hand. We’re happy he stuck with it though, otherwise you wouldn’t be reading about it on Hackaday, and we wouldn’t be able to learn anything from the detailed account he’s provided.
It only took an evening to hack together a rough demo, and the initial results were very promising. The code could detect the edges of the receipt, rotate the captured image appropriately, and then pull out the critical information such as date, total amount, business name, etc. He was then able to decipher the API for Splitwise, an online service for splitting bills, by capturing the data sent by his browser while adding a new bill. With this information, writing up some Python code to push his captured data into the service was trivial. So far, so good.
But like so many horror films that begin with a happy family starting a new life in a beautiful home, there was a monster lurking in the shadows. It’s one thing to capture data from perfectly clean and flat receipts, but quite another to get any useful info out of one that spent half the day crumpled up in your back pocket. The promising proof of concept that worked a treat under controlled conditions failed completely in the real-world, with [Marcel] reporting that only 1 in 5 receipts he tried to scan actually went through.
In the end, [Marcel] realized that the best way to handle the unreliable condition of the receipts was to focus on a different object in the image. He came up with a QR code marker that he could put on the table with the receipt to be scanned, which his software can use as a known point of reference. This greatly improves the reliability of the image rotation and transformation, which in turn makes the OCR more reliable. It also makes it much easier to tell which images need to be scanned — if there’s no QR code found, the software just skips that shot and keeps looking.
The unique challenges of digitizing large amounts of printed content using OCR makes for some fascinating problem solving, and we’re glad [Marcel] shared this particular story with us. While there’s still some edge cases that need chasing down, he’s using the software on a nearly daily basis, and has posted it up on GitHub for anyone who might wish to build on his efforts.
25 thoughts on “Scanning Receipts Proves Trickier Than Anticipated”
The qr code reads, “ingest me”, if anyone was interested.
I wasn’t, but I am happy to have learned it!
I have been happy to not have been Rickrolled this time.
Crumpled receipts etc are still better scanned on a real scanner. Fuji snapscan is amazing.
But if you’re doing it with phone photos, the trick is to scan the receipt immediately, before crumpling it into a pocket.
@Dan said: “Crumpled receipts etc are still better scanned on a real scanner. Fuji snapscan is amazing.”
* It’s Fuji ScanSnap, not snapscan:
* ScanSnap scanners can be pretty amazing alrite, price wise:
* Epson specializes in receipt scanners:
* Or if you don’t mind being spied on, there are plenty of smartphone receipt scanner apps. Many will flatten and square-up the scans, even if they are slightly crumpled:
Thanks for your feedback. I tried that. Unfortunately, my scanner can’t scan long receipts, which is why I use a cellphone pictures. There’s a section on it in the full blog post.
Avoid CVS my friend.
There was or still is a company that provides a commercial service doing this for people. Last I heard, a few years ago, they gave up on OCR and hired people to read the receipts from photographs and enter the data manually. Kudos to the Marcel for pulling this off.
My receipts are the pocket crumbled variety.
I run mine all through a ff-680W, create a pdf, use my pdf readers ocr ability, then save as text. Very close to 100% and can handle receipts a meter long (and I do get ones that long at times)..
Why tax the software accuracy with a distant crooked out of plane badly lit sample? It’s shot like a spy in a non revealing posture. You should know the distance your camera will focus to up close with good light and it will always give sharp focus. Most phones get screen filling with the piece shown in focus.
Those photos were just to show the perspective transformation. Of course, when using the software for real, I try to take the image as straight as possible. If I used that image for the blog post, it would be hard to tell what the transform is doing because the image would hardly change.
I believe the point was to have something that could function even in the worst case scenario so that it would work consistently in the best case.
I have a home brew receipt scanning setup that’s evolved over 20 years from just simple PDF to PDF, multiple OCR scans after different image processing options, pattern recognition for categorization and picking the best OCR, data extraction, etc.
The next frontier is image recognition for certain company logos and to better support Asian languages. (I travel, too.)
That sounds very interesting. Is it open source at all?
I agree. I don’t try to recognize the logo at all, but usually the company is also embedded in the receipt text as well. I also added some tricks or heuristics in case that part is not recognized properly. For example, if a receipt text contains “don’t drink and drive”, then it is probably from the liquor store.
I also had to deal with languages, even though I haven’t traveled for a few years. Turns out if you pay with a German credit card, even in Canada, you will *sometimes* get a German receipt.
Expense tracking… website?
Good grief :-)
It’s to track shared expenses between friends or partners, so it makes sense to be a hosted solution. Actually, it is primarily an app, which is even worse. I did not chose the service, just trying to make using it less painful. :-)
How about having a glass and pressing the receipt to the table with it?
i need this in linux and offline without serwer outside
Thanks for your suggestion. I tried that actually. The problem was that the glass causes reflections and washes out parts of the image. I tried many things that did not make it into the blog post, actually.
The code is on GitHub. You can take out the last part about uploading to splitwise.com and do whatever you want with the data.
Have you tried adding a circular polarizer on the camera? From my testing, It’ll significantly reduce refractions
Imagine if it became common practice for receipt printers to include all the relevant information on a QR code. That would make this task easy!
I was hoping for the same thing the whole time I was doing this. I think we’ll paperless receipts take over before all the information is embedded in a QR code.
Given the state of IT of most retailers, a printed copy is only possible because their register software produced it. Emailing receipts is possible for some retailers who have software to do it AND who track your purchases via loyalty number or something (if they’re tracking it via your account number, that falls into a privacy “gray area”.) Making your receipts available via API is possible, but few retailers offer it.
And making the raw receipt data available via QR code is almost impossible, due to the amount of data. QR codes can hold a finite amount of data, about 3K max on a really high density array. With overhead, that’s room for about a 20 item receipt. But no receipt printer has the quality to print the high resolution QR barcode needed.
Thermal receipt printers work by using a linear array of heating elements – as the paper moves past the print head, an element is turned on-then-off to print a dot. Individual heating elements burn out all the time; when they do, they leave a visible white column. A barcode can usually survive one missing element. But two burned out elements next to each other start to affect the quality of the code. And if you’ve ever looked at your receipts from just about anywhere, you’ll see that most stores don’t ever replace printers no matter how many pixels are dead.
Your best bet is for a QR code to contain a URL to an API that provides the receipt data. But these are tricky, because retailers need to restrict them so only the original purchaser can access their receipts (yes, I know the cryptographic solutions for solving that.) And again, how many retailers are investing that kind of money in producing receipts?
Finally, keep in mind that absolutely every retailer will have their own schema for receipt data. There are published standards (NRF-ARTS) for retail schemas, but “they’re more like guidelines than actual rules.”
So while it’s nice to imagine, it’s like imagining that electronic component data sheets are suddenly standardized across every manufacturer, only even less likely.
I know paper receipts aren’t going away anytime soon, but it’s still sooner than some kind of code gets inserted into the receipt with all the information. You explain why yourself.
Good stuff, Marcel. The problem solving is the thing here… There are any number of commercial solutions available to save you the “trouble” but where the fun in that?
BTW Parking garages I’ve usee recently are using QR for receipt retrieval. The ticket you get on entry has a pre printed QR on it, which gets around the limitation of thermal printer resolution of normal receipts.
This is uniquely suited to parking business because of the use of tickets at entry.
Hi Belzoni. Thanks for the feedback. I’ve never seen that. Very interesting.
Please be kind and respectful to help make the comments section excellent. (Comment Policy)