Convert Any Book To A DIY Audiobook?

July 6, 2025 by Dave Rowntree 12 Comments

If the idea of reading a physical book sounds like hard work, [Nick Bild’s] latest project, the PageParrot, might be for you. While AI gets a lot of flak these days, one thing modern multimodal models do exceptionally well is image interpretation, and PageParrot demonstrates just how accessible that’s become.

[Nick] demonstrates quite clearly how little code is needed to get from those cryptic black and white glyphs to sounds the average human can understand, specifically a paltry 80 lines of Python. Admittedly, many of those lines are pulling in libraries, and some are just blank, so functionally speaking, it’s even shorter than that. Of course, the whole application is mostly glue code, stitching together other people’s hard work, but it’s still instructive and fun to play with.

The hardware required is a Raspberry Pi Zero 2 W, a camera (in this case, a USB webcam), and something to hold it above the book. Any Pi with the ability to connect to a camera should also work, however, with just a little configuration.

On the software side, [Nick] pulls in the CV2 library (which is the interface to OpenCV) to handle the camera interfacing, programming it to full HD resolution. Google’s GenAI is used to interface the Gemini 2.5 Flash LLM via an API endpoint. This takes a captured image and a trivial prompt, and returns the whole page of text, quick as a flash.

Finally, the script hands that text over to Piper, which turns that into a speech file in WAV format. This can then be played to an audio device with a call out to the console aplay tool. It’s all very simple at this level of abstraction.

Continue reading “Convert Any Book To A DIY Audiobook?” →

Smart Glasses Read Text

October 18, 2024 by Al Williams 6 Comments

You normally think of smart glasses as something you wear as either an accessory or, if you need a little assistance, with corrective lenses. But [akhilnagori] has a different kind of smart eyewear. These glasses scan and read text in the user’s ear.

This project was inspired by a blind child who enjoyed listening to stories but could not read beyond a few braille books. The glasses perform the reading using a Raspberry Pi Zero 2 W and a machine learning algorithm.

Continue reading “Smart Glasses Read Text” →

Make Your Bookshelf Clickable

February 15, 2024 by Al Williams 21 Comments

We’ll confess that we have a fondness for real books and plenty of them. So does [James], and he decided he needed a way to take a picture of his bookshelves and make each book clickable to find more information. This is one of those things that sounds fairly simple until you decide to do it. You can try an example of the results and then go back and read about the journey it took to get there.

There are several subtasks involved. First, you want to identify each book’s envelope. It wouldn’t do to click on the Joy of Cooking and get information about Remembrance of Things Past.

The next challenge is reading the title of the book. This can be tricky. Fonts differ. The book could be upside down. Some titles go cross the spine, but most go vertically. The remainder of the task is fairly easy. If you know the region and the title, you can easily find a link (for Google Books, in this case) and build an SVG overlay that maps the areas for each book to the right link.

Continue reading “Make Your Bookshelf Clickable” →

You’ve Got Mail: Reading Addresses With OCR

September 20, 2023 by Kristina Panos 14 Comments

Last time I delivered on this column, I told you about the USPS’ attempts to fully automate a post office. Of course, that’s a bit of a misnomer, since it took 1,500 employees to actually operate the place on a daily basis. Although Project Turnkey in Rhode Island and Project Gateway in California were proving grounds for all kinds of mail sorting and processing equipment, the act of actually reading addresses and routing mail to its final destination still required human intervention and hand coding.

Today, the post office processes hundreds of millions of mail pieces each day using various pieces of equipment. One of those important pieces of equipment is the OCR address reader, which manages to make sense of all kinds of chicken scratch.

Continue reading “You’ve Got Mail: Reading Addresses With OCR” →

Immersive Cursive: Growing Up Loopy

October 6, 2022 by Kristina Panos 28 Comments

Growing up, ours was a family of handwritten notes for every occasion. The majority were left on the kitchen counter next to the sink, or in a particular spot on the all-purpose table in the breakfast nook. Whether one was professing their familial love and devotion on the back of a Valpak coupon, or simply communicating an intent to be home before dinnertime, the words were generally immortalized in BiC on whatever paper was available, and timestamped for the reader’s information. You may have learned cursive in school, but I was born in it — molded by it. The ascenders and descenders betray you because they belong to me.

Both of my parents always seemed to be incapable of printing in anything other than all caps, so I actually preferred to see their cursive most of the time. As a result, I could ~~copy~~ read it quite easily from an early age. Well, I don’t think I ever had any hope of imitating Dad’s signature. But Mom’s on the other hand — like I said in the first installment, it was important for my signature to be distinct from hers, given that we have the same name — first, middle, and last. But I could probably still bust out her signature if it came down to something going on my permanent record.

While my handwriting was sort of naturally headed towards Mom’s, I was more interested in Dad’s style and that of my older brother. He had small caps handwriting down to an art, and my attempts to copy it have always looked angry and stilted by comparison. In addition, my brother’s cursive is lovely and quick, while still being legible.

Continue reading “Immersive Cursive: Growing Up Loopy” →

Cursing The Curse Of Cursive

September 27, 2022 by Kristina Panos 114 Comments

Unlike probably most people, I enjoy the act of writing by hand — but I’ve always disliked signing my name. Why is that? I think it’s because signatures are supposed to be in cursive, or else they don’t count. At least, that’s what I was taught growing up. (And I’m really not that old, I swear!)

Having the exact same name as my mother meant that it was important to adolescent me to be different, and that included making sure our signatures looked nothing alike. Whereas her gentle, looping hand spoke to her sensitive and friendly nature, my heavy-handed block print was just another way of letting out my teen angst. Sometime in the last couple of decades, my signature became K-squiggle P-squiggle, which is really just a sped-up, screw-you version of my modern handwriting, which is a combination of print and cursive.

D’Nealian print. Notice the ‘monkey tails’ on every possible lowercase letter.

D’Nealian cursive. Notice the stroke order and the ridiculous capital Q.

But let’s back up a bit. I started learning to write in kindergarten, but that of course was in script, with separate letters. Me and my fellow Xennial zeigeistians learned a specific printing method called D’Nealian, which was designed to ease the transition from printing to cursive with its curly tails on every letter.

We practiced our D’Nealian (So fancy! So grown-up!) on something called Zaner-Bloser paper, which is still used today, and by probably second grade were making that transition from easy Zorro-like lowercase Zs to the quite mature-looking double-squiggle of the cursive version. It was as though our handwriting was moving from day to night, changing and moving as fast as we were. You’d think we would have appreciated learning a way of writing that was more like us — a blur of activity, everything connected, an oddly-modular alphabet that was supposed to serve us well in adulthood. But we didn’t. We hated it. And you probably did, too.

Continue reading “Cursing The Curse Of Cursive” →

Scanning Receipts Proves Trickier Than Anticipated

April 21, 2022 by Tom Nardi 25 Comments

It’s one of those things that certainly sounds simple enough: take a picture of a receipt, run it through optical character recognition (OCR), and send the resulting information to whatever expense-tracking website or software you wish. There are companies that offer such a service, so it can’t be too difficult to replicate on your own…right?

That’s what [Marcel Robitaille] thought when he set out to create his homebrew “Receipt Ingestion” system, anyway. But in reality it took so much time to troubleshoot and implement that he says it would have been faster to just enter in all his receipts by hand. We’re happy he stuck with it though, otherwise you wouldn’t be reading about it on Hackaday, and we wouldn’t be able to learn anything from the detailed account he’s provided.

It only took an evening to hack together a rough demo, and the initial results were very promising. The code could detect the edges of the receipt, rotate the captured image appropriately, and then pull out the critical information such as date, total amount, business name, etc. He was then able to decipher the API for Splitwise, an online service for splitting bills, by capturing the data sent by his browser while adding a new bill. With this information, writing up some Python code to push his captured data into the service was trivial. So far, so good.

But like so many horror films that begin with a happy family starting a new life in a beautiful home, there was a monster lurking in the shadows. It’s one thing to capture data from perfectly clean and flat receipts, but quite another to get any useful info out of one that spent half the day crumpled up in your back pocket. The promising proof of concept that worked a treat under controlled conditions failed completely in the real-world, with [Marcel] reporting that only 1 in 5 receipts he tried to scan actually went through.

In the end, [Marcel] realized that the best way to handle the unreliable condition of the receipts was to focus on a different object in the image. He came up with a QR code marker that he could put on the table with the receipt to be scanned, which his software can use as a known point of reference. This greatly improves the reliability of the image rotation and transformation, which in turn makes the OCR more reliable. It also makes it much easier to tell which images need to be scanned — if there’s no QR code found, the software just skips that shot and keeps looking.

The unique challenges of digitizing large amounts of printed content using OCR makes for some fascinating problem solving, and we’re glad [Marcel] shared this particular story with us. While there’s still some edge cases that need chasing down, he’s using the software on a nearly daily basis, and has posted it up on GitHub for anyone who might wish to build on his efforts.