Ask Hackaday: How Do You Digitize Your Documents?

Like many of you, I have a hard time getting rid of stuff. I’ve got boxes and boxes of weirdo bits and bobs, and piles of devices that I’ll eventually get around to stripping down into even more bits and bobs. Despite regular purges — I try to bring a car-load of crap treasure to local hackerspaces and meetups at least a couple times a year — the pile only continues to grow.

But the problem isn’t limited to hardware components. There’s all sorts of things that the logical part of me understands I’ll almost certainly never need, and yet I can’t bring myself to dispose of. One of those things just so happens to be documents. Anything printed is fair game. Could be the notes from my last appointment with the doctor, or fliers for events I attended years ago. Doesn’t matter, the stacks keep building up until I end up cramming it all into a box and start the whole process starts over again.

I’ve largely convinced myself that the perennial accumulation of electronic bric-à-brac is an occupational hazard, and have come to terms with it. But I think there’s a good chance of moving the needle on the document situation, and if that involves a bit of high-tech overengineering, even better. As such, I’ve spent the last couple of weeks investigating digitizing the documents that have information worth retaining so that the originals can be sent along to Valhalla in my fire pit.

The following represents some of my observations thus far, in the hopes that others going down a similar path may find them useful. But what I’m really interested in is hearing from the Hackaday community. Surely I’m not the only one trying to save some storage space by turn piles of papers into ones and zeros.

Continue reading “Ask Hackaday: How Do You Digitize Your Documents?”

Retrotechtacular: IBM’s The World Of OCR

Optical Character Recognition (OCR) forms the bridge between the analog world of paper and the world of machines. The modern-day expectation is that when we point a smartphone camera at some characters it will flawlessly recognize and read them, but OCR technology predates such consumer technology by a considerable amount, with IBM producing OCR systems as early as the 1950s. In a 1960s promotional video on the always delightful Periscope Film channel on YouTube we can get an idea of how this worked back then, in particular the challenge of variable quality input.

What drove OCR was the need to process more paper-based data faster, as the amount of such data increased and computers got more capable. This led to the design of paper forms that made the recognition much easier, as can still be seen today on for example tax forms and on archaic paper payment methods like checks in countries that still use it. This means a paper form optimized for reflectivity, with clearly designated sections and lines, thus limiting the variability of the input forms to be OCR-ed. After that it’s just a matter of writing with clear block letters into the marked boxes, or using a typewriter with a nice fresh ink ribbon.

These days optical scanners are a lot more capable, of course, making many of such considerations no longer as relevant, even if human handwriting remains a challenge for OCR and human brains alike.

Continue reading “Retrotechtacular: IBM’s The World Of OCR”

“Glasses” That Transcribe Text To Audio

Glasses for the blind might sound like an odd idea, given the traditional purpose of glasses and the issue of vision impairment. However, eighth-grade student [Akhil Nagori] built these glasses with an alternate purpose in mind. They’re not really for seeing. Instead, they’re outfitted with hardware to capture text and read it aloud.

Yes, we’re talking about real-time text-to-audio transcription, built into a head-worn format. The hardware is pretty straightforward: a Raspberry Pi Zero 2W runs off a battery and is outfitted with the usual first-party camera. The camera is mounted on a set of eyeglass frames so that it points at whatever the wearer might be “looking” at. At the push of a button, the camera captures an image, and then passes it to an API which does the optical character recognition. The text can then be passed to a speech synthesizer so it can be read aloud to the wearer.

It’s funny to think about how advanced this project really is. Jump back to the dawn of the microcomputer era, and such a device would have been a total flight of fancy—something a researcher might make a PhD and career out of. Indeed, OCR and speech synthesis alone were challenge enough. Today, you can stand on the shoulders of giants and include such mighty capability in a homebrewed device that cost less than $50 to assemble. It’s a neat project, too, and one that we’re sure taught [Akhil] many valuable skills along the way.

Continue reading ““Glasses” That Transcribe Text To Audio”

Scanning Receipts Proves Trickier Than Anticipated

It’s one of those things that certainly sounds simple enough: take a picture of a receipt, run it through optical character recognition (OCR), and send the resulting information to whatever expense-tracking website or software you wish. There are companies that offer such a service, so it can’t be too difficult to replicate on your own…right?

That’s what [Marcel Robitaille] thought when he set out to create his homebrew “Receipt Ingestion” system, anyway. But in reality it took so much time to troubleshoot and implement that he says it would have been faster to just enter in all his receipts by hand. We’re happy he stuck with it though, otherwise you wouldn’t be reading about it on Hackaday, and we wouldn’t be able to learn anything from the detailed account he’s provided.

It only took an evening to hack together a rough demo, and the initial results were very promising. The code could detect the edges of the receipt, rotate the captured image appropriately, and then pull out the critical information such as date, total amount, business name, etc. He was then able to decipher the API for Splitwise, an online service for splitting bills, by capturing the data sent by his browser while adding a new bill. With this information, writing up some Python code to push his captured data into the service was trivial. So far, so good.

Using a QR code as reference point.

But like so many horror films that begin with a happy family starting a new life in a beautiful home, there was a monster lurking in the shadows. It’s one thing to capture data from perfectly clean and flat receipts, but quite another to get any useful info out of one that spent half the day crumpled up in your back pocket. The promising proof of concept that worked a treat under controlled conditions failed completely in the real-world, with [Marcel] reporting that only 1 in 5 receipts he tried to scan actually went through.

In the end, [Marcel] realized that the best way to handle the unreliable condition of the receipts was to focus on a different object in the image. He came up with a QR code marker that he could put on the table with the receipt to be scanned, which his software can use as a known point of reference. This greatly improves the reliability of the image rotation and transformation, which in turn makes the OCR more reliable. It also makes it much easier to tell which images need to be scanned — if there’s no QR code found, the software just skips that shot and keeps looking.

The unique challenges of digitizing large amounts of printed content using OCR makes for some fascinating problem solving, and we’re glad [Marcel] shared this particular story with us. While there’s still some edge cases that need chasing down, he’s using the software on a nearly daily basis, and has posted it up on GitHub for anyone who might wish to build on his efforts.

OCR Reads Old Newspapers So We Don’t Have To

Plenty of people don’t bother to read the current newspaper, let alone editions that were published over 100 years ago. But there’s a wealth of important historical information buried in these dusty old publications, assuming you can find a way to reliably digitize and index it all. You might think the solution is as simple as running images of the paper through optical character recognition (OCR) software, but as [John Scancella] explains, the problem is a bit more complicated than that.

Stretching the text vertically highlights the columns.

Ultimately, the issue largely comes down to formatting. The OCR software reasonably assumes all the text is in orderly horizontal lines, because in the vast majority of cases, it would be. That’s how you’re reading these words now. But as anyone who’s seen an old time newspaper knows, that’s not how things were necessarily written back then. Pages consisted of multiple narrow columns of stories separated by vertical lines; if the OCR tries to read the page from left to right, the resulting text is a mishmash of several unrelated topics.

The answer is to break all those articles into their own images, but doing that manually at any sort of scale simply isn’t an option. So [John] has been working on a system that uses OpenCV to identify the columns of text and isolate them. He details the multi-step process down in his write-up, and even provides the Python code should you want to give it a spin. But the short version is that the image is converted to grayscale and the OpenCV dilate function is used to stretch the text in the Y dimension. This produces big blobs of white that can easily be picked out with findContours() and snipped into individual images.

It’s not a perfect solution, and there are still a few pitfalls. For one, the name of the paper needs to be removed from the front page before the stretching operation happens. But it’s clearly a step in the right direction, and the results certainly look very promising. Anything that makes OCR more accurate or easier to implement is a win in our book, so we’re excited to see where [John] takes this concept.

EasyOCR Makes OCR, Well, Easy

Working on embedded systems used to be easier. You had a microcontroller and maybe a few pieces of analog or digital I/O, and perhaps communications might be a serial port. Today, you have systems with networks and cameras and a host of I/O. Cameras are strange because sometimes you just want an image and sometimes you want to understand the image in some way. If understanding the image involves reading text in the picture, you will want to check out EasyOCR.

The Python library leverages other open source libraries and supports 42 different languages. As the name implies, using it is pretty easy. Here’s the setup:


import easyocr
reader = easyocr.Reader(['th','en'])
reader.readtext('test.jpg')

The results include four points that define the bounding box of each piece of text, the text, and a confidence level. The code takes advantage of the GPU, but you can run it in a CPU-only mode if you prefer. There are a few other options, including setting the algorithm’s scanning behavior, how it handles multiple processors, and how it converts the image to grayscale. The results look impressive.

According to the project’s repository, they incorporated several existing neural network algorithms and conventional algorithms, so if you want to dig into details, there are links provided to both code and white papers. If you need some inspiration for what to do with OCR, maybe this past project will give you some ideas. Or you could cheat at games.

Hackaday Podcast 035: LED Cubes Taking Over, Ada Vanquishes C Bugs, Rad Monitoring Is Hot, And 3D Printing Goes Full 3D

Hackaday Editors Mike Szczys and Elliot Williams get caught up on the most interesting hacks of the past week. On this episode we take a deep dive into radiation-monitor projects, both Geiger tube and scintillator based, as well as LED cube projects that pack pixels onto six PCBs with parts counts reaching into the tens of thousands. In the 3D printing world we want non-planar printing to be the next big thing. Padauk microcontrollers are small, cheap, and do things in really interesting ways if you don’t mind embracing the ecosystem. And what’s the best way to read a water meter with a microcontroller?

Take a look at the links below if you want to follow along, and as always tell us what you think about this episode in the comments!

Take a look at the links below if you want to follow along, and as always, tell us what you think about this episode in the comments!

Direct download (60 MB or so.)

Continue reading “Hackaday Podcast 035: LED Cubes Taking Over, Ada Vanquishes C Bugs, Rad Monitoring Is Hot, And 3D Printing Goes Full 3D”