Plenty of people don’t bother to read the current newspaper, let alone editions that were published over 100 years ago. But there’s a wealth of important historical information buried in these dusty old publications, assuming you can find a way to reliably digitize and index it all. You might think the solution is as simple as running images of the paper through optical character recognition (OCR) software, but as [John Scancella] explains, the problem is a bit more complicated than that.
Ultimately, the issue largely comes down to formatting. The OCR software reasonably assumes all the text is in orderly horizontal lines, because in the vast majority of cases it is. That’s how you’re reading these words now. But as anyone who’s seen an old-time newspaper knows, that’s not necessarily how things were laid out back then. Pages consisted of multiple narrow columns of stories separated by vertical lines; if the OCR tries to read the page from left to right, the resulting text is a mishmash of several unrelated topics.
The answer is to break all those articles into their own images, but doing that manually at any sort of scale simply isn’t an option. So [John] has been working on a system that uses OpenCV to identify the columns of text and isolate them. He details the multi-step process in his write-up, and even provides the Python code should you want to give it a spin. The short version is that the image is converted to grayscale and OpenCV’s dilate function is used to stretch the text in the Y dimension. This produces big blobs of white that can easily be picked out with findContours() and snipped into individual images.
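To get a feel for the technique before digging into [John]’s write-up, here’s a minimal sketch of that pipeline. The input file name, the kernel size, and the Otsu thresholding step are our own illustrative choices, not his exact parameters:

```python
import cv2

img = cv2.imread("page.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Invert and threshold so the text becomes white on a black background
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# Dilate with a tall, narrow kernel to stretch the text vertically,
# merging the lines of each column into one big white blob
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 50))
dilated = cv2.dilate(thresh, kernel, iterations=1)

# Each blob's bounding box is then snipped out as its own image
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for i, contour in enumerate(contours):
    x, y, w, h = cv2.boundingRect(contour)
    cv2.imwrite(f"column_{i}.png", img[y:y + h, x:x + w])
```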
It’s not a perfect solution, and there are still a few pitfalls. For one, the name of the paper needs to be removed from the front page before the stretching operation happens. But it’s clearly a step in the right direction, and the results certainly look very promising. Anything that makes OCR more accurate or easier to implement is a win in our book, so we’re excited to see where [John] takes this concept.
Working on embedded systems used to be easier. You had a microcontroller and maybe a few pieces of analog or digital I/O, and communications, if you had any, meant a serial port. Today, you have systems with networks and cameras and a host of I/O. Cameras are strange because sometimes you just want an image, and sometimes you want to understand the image in some way. If understanding the image involves reading the text in a picture, you will want to check out EasyOCR.
The Python library leverages other open source libraries and supports 42 different languages. As the name implies, using it is pretty easy. Here’s the basic setup, following the library’s documented usage (the image file name below is a placeholder):
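```python
import easyocr

# Build a reader for English; the language list can hold several codes
reader = easyocr.Reader(['en'])

# Run detection and recognition on an image file
result = reader.readtext('sign.jpg')
```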
The results include four points that define the bounding box of each piece of text, the text, and a confidence level. The code takes advantage of the GPU, but you can run it in a CPU-only mode if you prefer. There are a few other options, including setting the algorithm’s scanning behavior, how it handles multiple processors, and how it converts the image to grayscale. The results look impressive.
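To give a rough idea of that output format, here’s a sketch of iterating over the results with the GPU disabled; the image name is again a placeholder:

```python
import easyocr

# gpu=False selects the CPU-only mode mentioned above
reader = easyocr.Reader(['en'], gpu=False)

# Each result is a (bounding_box, text, confidence) tuple; the bounding
# box is a list of four [x, y] corner points
for bbox, text, confidence in reader.readtext('sign.jpg'):
    print(f"{text!r} at {bbox} (confidence {confidence:.2f})")
```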
According to the project’s repository, the authors incorporated several existing neural network and conventional algorithms, so if you want to dig into the details, there are links to both the code and the white papers. If you need some inspiration for what to do with OCR, maybe this past project will give you some ideas. Or you could cheat at games.
Hackaday Editors Mike Szczys and Elliot Williams get caught up on the most interesting hacks of the past week. On this episode we take a deep dive into radiation-monitor projects, both Geiger tube and scintillator based, as well as LED cube projects that pack pixels onto six PCBs with parts counts reaching into the tens of thousands. In the 3D printing world we want non-planar printing to be the next big thing. Padauk microcontrollers are small, cheap, and do things in really interesting ways if you don’t mind embracing the ecosystem. And what’s the best way to read a water meter with a microcontroller?
Take a look at the links below if you want to follow along, and as always tell us what you think about this episode in the comments!
In our info-obsessed culture, hackers are increasingly interested in ways to quantify the world around them. One popular project is to collect data about home energy or water consumption in the hopes of identifying trends or potential inefficiencies. For safety and potentially legal reasons, this usually has to be done in a minimally invasive way that doesn’t compromise the metering done by the utility provider. As you might expect, that often leads to some creative methods of data collection.
Of course, the ESP8266 is not a platform we generally see performing optical character recognition. Some clever programming was required to get the Wemos D1 Mini Lite to reliably read the numbers from the meter without having to push the task to a more computationally powerful device such as a Raspberry Pi. The process starts with a 160×120 JPEG image provided by a VC0706 camera module, which is then processed with the JPEGDecoder library. The top and bottom of the image are discarded, and the center band is isolated into blocks that correspond with the position of each digit on the display.
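The firmware does this in C++ on the ESP8266 itself, but the slicing step is easy to picture in Python. In this sketch the band position and digit count are illustrative guesses, not the project’s actual values:

```python
import numpy as np

FRAME_W, FRAME_H = 160, 120      # VC0706 frame size used in the project
NUM_DIGITS = 8                   # illustrative; depends on the meter display
BAND_TOP, BAND_BOTTOM = 50, 70   # illustrative position of the digit band

def digit_blocks(frame: np.ndarray) -> list[np.ndarray]:
    """Discard the top and bottom of a grayscale frame and slice the
    remaining center band into equal-width blocks, one per digit."""
    band = frame[BAND_TOP:BAND_BOTTOM, :]
    block_w = FRAME_W // NUM_DIGITS
    return [band[:, i * block_w:(i + 1) * block_w] for i in range(NUM_DIGITS)]
```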
Within each block, the code checks an array of predetermined points to see if the corresponding pixel is black or not. In theory this allows detecting all ten digits, 0 through 9, though [Keilin] says there were still occasional false readings due to inherent instabilities in the camera and its mounting. But with a few iterations of the code and the aid of a Python testing program that let him validate the impact of changes to the algorithm, he was able to greatly improve the detection accuracy. He says it also helps that the nature of the data allows for some basic sanity checks: for example, the number only ever goes up, and only by a relatively small amount each time.
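In the same spirit, the point-sampling idea and the sanity check might look like this in Python; the sample coordinates, the seven-segment-style patterns, and the darkness threshold below are stand-ins rather than the values tuned for the actual meter font:

```python
import numpy as np

# Hypothetical sample points (y, x) inside a 20x20 digit block, ordered
# top, upper-left, upper-right, middle, lower-left, lower-right, bottom
SAMPLE_POINTS = [(2, 10), (6, 4), (6, 15), (10, 10), (14, 4), (14, 15), (18, 10)]

# Expected dark/light pattern at those points for each digit
PATTERNS = {
    0: (1, 1, 1, 0, 1, 1, 1), 1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1), 3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0), 5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1), 7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1), 9: (1, 1, 1, 1, 0, 1, 1),
}
DARK_THRESHOLD = 80  # grayscale level below which a pixel counts as black

def read_digit(block: np.ndarray):
    """Sample fixed points in the block and match against known patterns."""
    observed = tuple(int(block[y, x] < DARK_THRESHOLD) for y, x in SAMPLE_POINTS)
    for digit, pattern in PATTERNS.items():
        if observed == pattern:
            return digit
    return None  # no clean match; caller can retry with the next frame

def plausible(new_reading: int, last_reading: int, max_step: int = 50) -> bool:
    """Basic sanity check: the meter only counts up, and only slowly."""
    return last_reading <= new_reading <= last_reading + max_step
```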
Every once in a while, a project comes along with that magical power to consume your time and attention for many months. When you finally complete it, you almost feel sorry that there’s nothing left to do.
What is so special about this Bingo ball reader? It may seem like an ordinary OCR project at first glance; a camera captures the image and OCR software recognizes the number. Simple as that. And it works without problems, like every simple gadget should.
But then again, maybe it’s not that simple. The numbers are scattered all over the ball, so they have to be located first and the best candidate for reading selected. The numbers are also painted on a sphere rather than a flat surface, sometimes deforming them to the point where their shape has to be recovered before they can be read. The reading angle isn’t fixed either, but can fall anywhere in a full 360° rotation. And then there’s the glare problem to boot, as Bingo balls are so shiny that every light source shows up as a saturated bright spot.
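To make the glare issue concrete, one common mitigation (not necessarily what’s used in this build) is simply to mask out near-saturated pixels so they can’t vote in any later matching step; a minimal sketch:

```python
import cv2
import numpy as np

def glare_mask(gray: np.ndarray, limit: int = 245) -> np.ndarray:
    """Return a mask that is 0 where the image is blown out by a
    specular reflection and 255 everywhere else."""
    _, saturated = cv2.threshold(gray, limit, 255, cv2.THRESH_BINARY)
    return cv2.bitwise_not(saturated)
```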
So, is that all of it? Well, almost. The task is supposed to be performed by an embedded microcontroller with limited speed and memory, yet the recognition process for one ball has to be fast: 500 ms at worst. And that’s just one part of the process. The project includes a pipelined mechanism that accepts the ball, transports it to be scanned by the OCR, and then shot by the public broadcast camera before it gets dumped. Finally, if the reading was not reliable enough, the ball has to be subtly rotated so that the numbers are repositioned for another reading attempt.
Despite these challenges I did manage to build this system. It’s fast and reliable, and I discovered some very interesting tricks along the way. Take a look at the quick demo video below to get a feel for the speed, and what the system “sees”. Then join me after the break to dive into the details of this interesting embedded build.