Optical Character Recognition (OCR) forms the bridge between the analog world of paper and the world of machines. The modern-day expectation is that when we point a smartphone camera at some characters it will flawlessly recognize and read them, but OCR technology predates such consumer technology by a considerable amount, with IBM producing OCR systems as early as the 1950s. In a 1960s promotional video on the always delightful Periscope Film channel on YouTube, we can get an idea of how this worked back then, in particular the challenge of variable-quality input.
What drove OCR was the need to process more paper-based data faster, as the amount of such data increased and computers got more capable. This led to the design of paper forms that made the recognition much easier, as can still be seen today on, for example, tax forms and on archaic paper payment methods like checks in countries that still use them. This means a paper form optimized for reflectivity, with clearly designated sections and lines, thus limiting the variability of the input forms to be OCR-ed. After that it’s just a matter of writing in clear block letters inside the marked boxes, or using a typewriter with a nice fresh ink ribbon.
These days optical scanners are a lot more capable, of course, making many of those considerations far less relevant, even if human handwriting remains a challenge for OCR and human brains alike.
Glasses for the blind might sound like an odd idea, given the traditional purpose of glasses and the issue of vision impairment. However, eighth-grade student [Akhil Nagori] built these glasses with an alternate purpose in mind. They’re not really for seeing. Instead, they’re outfitted with hardware to capture text and read it aloud.
Yes, we’re talking about real-time text-to-audio transcription, built into a head-worn format. The hardware is pretty straightforward: a Raspberry Pi Zero 2 W runs off a battery and is outfitted with the usual first-party camera. The camera is mounted on a set of eyeglass frames so that it points at whatever the wearer might be “looking” at. At the push of a button, the camera captures an image, then passes it to an API that does the optical character recognition. The text is then passed to a speech synthesizer so it can be read aloud to the wearer.
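To get a sense of how little glue code a build like this needs nowadays, here’s a rough sketch of the capture-recognize-speak loop. Consider it illustrative only: picamera2, pytesseract, and espeak are stand-ins for whatever camera stack, OCR API, and speech synthesizer [Akhil] actually wired together, and the GPIO pin is an assumption.

```python
# Hypothetical sketch of the capture -> OCR -> speech loop. picamera2,
# pytesseract, and espeak are stand-ins for the actual camera stack,
# OCR API, and speech synthesizer used in the build.
import subprocess
from signal import pause

import pytesseract
from PIL import Image
from gpiozero import Button
from picamera2 import Picamera2

BUTTON_PIN = 17            # assumed GPIO pin for the trigger button

camera = Picamera2()
camera.start()

def read_aloud():
    camera.capture_file("frame.jpg")                     # grab a still
    text = pytesseract.image_to_string(Image.open("frame.jpg"))
    if text.strip():
        subprocess.run(["espeak", text])                 # speak the result

button = Button(BUTTON_PIN)
button.when_pressed = read_aloud
pause()                    # wait for button presses until interrupted
```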
It’s funny to think about how advanced this project really is. Jump back to the dawn of the microcomputer era, and such a device would have been a total flight of fancy—something a researcher might make a PhD and career out of. Indeed, OCR and speech synthesis alone were challenge enough. Today, you can stand on the shoulders of giants and include such mighty capability in a homebrewed device that cost less than $50 to assemble. It’s a neat project, too, and one that we’re sure taught [Akhil] many valuable skills along the way.
It’s one of those things that certainly sounds simple enough: take a picture of a receipt, run it through optical character recognition (OCR), and send the resulting information to whatever expense-tracking website or software you wish. There are companies that offer such a service, so it can’t be too difficult to replicate on your own…right?
That’s what [Marcel Robitaille] thought when he set out to create his homebrew “Receipt Ingestion” system, anyway. But in reality it took so much time to troubleshoot and implement that he says it would have been faster to just enter all his receipts by hand. We’re happy he stuck with it though, otherwise you wouldn’t be reading about it on Hackaday, and we wouldn’t be able to learn anything from the detailed account he’s provided.
It only took an evening to hack together a rough demo, and the initial results were very promising. The code could detect the edges of the receipt, rotate the captured image appropriately, and then pull out the critical information such as date, total amount, business name, etc. He was then able to decipher the API for Splitwise, an online service for splitting bills, by capturing the data sent by his browser while adding a new bill. With this information, writing up some Python code to push his captured data into the service was trivial. So far, so good.
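In outline, that first pass looks something like the following. The function and file names are ours, not [Marcel]’s, and Tesseract stands in for whichever OCR engine the project actually uses; OpenCV handles the edge detection and rotation.

```python
# A rough sketch of the first pass: find the receipt's outline, rotate it
# upright, and hand the result to OCR. File names and parameters are
# illustrative, and Tesseract stands in for the actual OCR engine.
import cv2
import pytesseract

def extract_receipt_text(path):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)

    # Assume the biggest contour is the receipt and fit a rotated rectangle
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    receipt = max(contours, key=cv2.contourArea)
    (cx, cy), _, angle = cv2.minAreaRect(receipt)

    # Rotate the whole image so the receipt sits upright before OCR
    rot = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    upright = cv2.warpAffine(img, rot, (img.shape[1], img.shape[0]))
    return pytesseract.image_to_string(upright)

print(extract_receipt_text("receipt.jpg"))
```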
Using a QR code as a reference point.
But like so many horror films that begin with a happy family starting a new life in a beautiful home, there was a monster lurking in the shadows. It’s one thing to capture data from perfectly clean and flat receipts, but quite another to get any useful info out of one that spent half the day crumpled up in your back pocket. The promising proof of concept that worked a treat under controlled conditions failed completely in the real world, with [Marcel] reporting that only 1 in 5 receipts he tried to scan actually went through.
In the end, [Marcel] realized that the best way to handle the unreliable condition of the receipts was to focus on a different object in the image. He came up with a QR code marker that he could put on the table with the receipt to be scanned, which his software can use as a known point of reference. This greatly improves the reliability of the image rotation and transformation, which in turn makes the OCR more reliable. It also makes it much easier to tell which images need to be scanned — if there’s no QR code found, the software just skips that shot and keeps looking.
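Here’s an illustrative take on the marker trick using OpenCV’s built-in QR detector; again, this is a sketch of the idea rather than code from the project.

```python
# Illustrative take on the QR-marker trick: if OpenCV can't find the code,
# the frame is skipped; otherwise the marker's corners give a known
# reference for straightening the photo before OCR.
import cv2
import numpy as np

def straighten_with_marker(path):
    img = cv2.imread(path)
    data, points, _ = cv2.QRCodeDetector().detectAndDecode(img)
    if points is None:
        return None          # no marker in frame -- skip this image

    # The angle of the marker's top edge tells us how the shot is rotated
    (x0, y0), (x1, y1) = points[0][0], points[0][1]
    angle = np.degrees(np.arctan2(y1 - y0, x1 - x0))

    h, w = img.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, rot, (w, h))
```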
The unique challenges of digitizing large amounts of printed content using OCR make for some fascinating problem solving, and we’re glad [Marcel] shared this particular story with us. While there are still some edge cases that need chasing down, he’s using the software on a nearly daily basis, and has posted it up on GitHub for anyone who might wish to build on his efforts.
Plenty of people don’t bother to read the current newspaper, let alone editions that were published over 100 years ago. But there’s a wealth of important historical information buried in these dusty old publications, assuming you can find a way to reliably digitize and index it all. You might think the solution is as simple as running images of the paper through optical character recognition (OCR) software, but as [John Scancella] explains, the problem is a bit more complicated than that.
Stretching the text vertically highlights the columns.
Ultimately, the issue largely comes down to formatting. The OCR software reasonably assumes all the text is in orderly horizontal lines, because in the vast majority of cases, it would be. That’s how you’re reading these words now. But as anyone who’s seen an old-time newspaper knows, that’s not how things were necessarily written back then. Pages consisted of multiple narrow columns of stories separated by vertical lines; if the OCR tries to read the page from left to right, the resulting text is a mishmash of several unrelated topics.
The answer is to break all those articles into their own images, but doing that manually at any sort of scale simply isn’t an option. So [John] has been working on a system that uses OpenCV to identify the columns of text and isolate them. He details the multi-step process down in his write-up, and even provides the Python code should you want to give it a spin. But the short version is that the image is converted to grayscale and the OpenCV dilate function is used to stretch the text in the Y dimension. This produces big blobs of white that can easily be picked out with findContours() and snipped into individual images.
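Boiled down to its essentials, the column-splitting step looks roughly like the sketch below. The kernel size and minimum blob area are guesses for illustration, not the values from [John]’s write-up.

```python
# Condensed version of the column-splitting idea: grayscale, threshold,
# dilate with a tall kernel so the text smears vertically, then treat each
# resulting blob as one column. Parameter values are guesses.
import cv2

img = cv2.imread("front_page.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# A tall, narrow kernel stretches the text in Y so the lines within a
# column merge into one big white blob per column
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 75))
dilated = cv2.dilate(thresh, kernel, iterations=1)

contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for i, contour in enumerate(contours):
    x, y, w, h = cv2.boundingRect(contour)
    if w * h > 10000:                      # skip specks and rule lines
        cv2.imwrite(f"column_{i}.png", img[y:y + h, x:x + w])
```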
It’s not a perfect solution, and there are still a few pitfalls. For one, the name of the paper needs to be removed from the front page before the stretching operation happens. But it’s clearly a step in the right direction, and the results certainly look very promising. Anything that makes OCR more accurate or easier to implement is a win in our book, so we’re excited to see where [John] takes this concept.
Working on embedded systems used to be easier. You had a microcontroller, maybe a few pieces of analog or digital I/O, and perhaps a serial port for communications. Today, you have systems with networks and cameras and a host of I/O. Cameras are strange because sometimes you just want an image and sometimes you want to understand the image in some way. If understanding the image involves reading text in the picture, you will want to check out EasyOCR.
The Python library leverages other open source libraries and supports 42 different languages. As the name implies, using it is pretty easy. Here’s the setup, boiled down to a minimal sketch (assuming English text and a local image file):
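```python
import easyocr

# First run downloads the recognition models; 'en' limits this to English.
# Pass gpu=False to the Reader if you want to force CPU-only mode.
reader = easyocr.Reader(['en'])

# Each result is (bounding_box, text, confidence); the file name here is
# just a placeholder
for box, text, confidence in reader.readtext('example.jpg'):
    print(f"{confidence:.2f}  {text}")
```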
The results include four points that define the bounding box of each piece of text, the text, and a confidence level. The code takes advantage of the GPU, but you can run it in a CPU-only mode if you prefer. There are a few other options, including setting the algorithm’s scanning behavior, how it handles multiple processors, and how it converts the image to grayscale. The results look impressive.
According to the project’s repository, they incorporated several existing neural network algorithms and conventional algorithms, so if you want to dig into details, there are links provided to both code and white papers. If you need some inspiration for what to do with OCR, maybe this past project will give you some ideas. Or you could cheat at games.
Hackaday Editors Mike Szczys and Elliot Williams get caught up on the most interesting hacks of the past week. On this episode we take a deep dive into radiation-monitor projects, both Geiger tube and scintillator based, as well as LED cube projects that pack pixels onto six PCBs with parts counts reaching into the tens of thousands. In the 3D printing world we want non-planar printing to be the next big thing. Padauk microcontrollers are small, cheap, and do things in really interesting ways if you don’t mind embracing the ecosystem. And what’s the best way to read a water meter with a microcontroller?
Take a look at the links below if you want to follow along, and as always, tell us what you think about this episode in the comments!
In our info-obsessed culture, hackers are increasingly interested in ways to quantify the world around them. One popular project is to collect data about their home energy or water consumption to try and identify any trends or potential inefficiencies. For safety and potentially legal reasons, this usually has to be done in a minimally invasive way that doesn’t compromise the metering done by the utility provider. As you might expect, that often leads to some creative methods of data collection.
Of course, the ESP8266 is not a platform we generally see performing optical character recognition. Some clever programming was required to get the Wemos D1 Mini Lite to reliably read the numbers from the meter without having to push the task to a more computationally powerful device such as a Raspberry Pi. The process starts with a 160×120 JPEG image provided by a VC0706 camera module, which is then processed with the JPEGDecoder library. The top and bottom of the image are discarded, and the center band is isolated into blocks that correspond with the position of each digit on the display.
Within each block, the code checks an array of predetermined points to see if the corresponding pixel is black or not. In theory this allows detecting all the digits between 0 and 9, though [Keilin] says there were still occasional false readings due to inherent instabilities in the camera and mounting. But with a few iterations of the code and the aid of a Python testing program that allowed him to validate the impact of changes to the algorithm, he was able to greatly improve the detection accuracy. He says it also helps that the nature of the data allows for some basic sanity checks; for example, the number only ever goes up, and only by a relatively small amount each time.
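A simplified Python model of that per-digit check might look something like the sketch below. The sample coordinates and bit patterns are invented for illustration; the real lookup tables live in the ESP8266 firmware, and the sanity checks on the final reading sit on top of this.

```python
# Simplified model of the pixel-sampling idea: within each digit's block,
# sample a handful of fixed points and see which are dark. The offsets and
# patterns below are invented for illustration.
from PIL import Image

# Hypothetical (x, y) sample offsets inside one digit's block of the image
SAMPLE_POINTS = [(4, 2), (2, 6), (8, 6), (5, 10), (2, 14), (8, 14), (5, 18)]

# Which sample points are dark for each digit, seven-segment style
PATTERNS = {
    (1, 1, 1, 0, 1, 1, 1): 0,
    (0, 0, 1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 1, 0, 1): 2,
    # ...remaining digits omitted for brevity
}

def classify_digit(block: Image.Image, threshold: int = 80):
    """Match a cropped digit block against the known dark/bright patterns."""
    gray = block.convert("L")
    key = tuple(1 if gray.getpixel(p) < threshold else 0 for p in SAMPLE_POINTS)
    return PATTERNS.get(key)     # None signals an unreliable reading
```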