Convert Any Book To A DIY Audiobook?

July 6, 2025

If the idea of reading a physical book sounds like hard work, [Nick Bild’s] latest project, the PageParrot, might be for you. While AI gets a lot of flak these days, one thing modern multimodal models do exceptionally well is image interpretation, and PageParrot demonstrates just how accessible that’s become.

[Nick] demonstrates quite clearly how little code is needed to get from those cryptic black and white glyphs to sounds the average human can understand, specifically a paltry 80 lines of Python. Admittedly, many of those lines are pulling in libraries, and some are just blank, so functionally speaking, it’s even shorter than that. Of course, the whole application is mostly glue code, stitching together other people’s hard work, but it’s still instructive and fun to play with.

The hardware required is a Raspberry Pi Zero 2 W, a camera (in this case, a USB webcam), and something to hold it above the book. Any Pi with the ability to connect to a camera should also work, however, with just a little configuration.

On the software side, [Nick] pulls in the CV2 library (which is the interface to OpenCV) to handle the camera interfacing, programming it to full HD resolution. Google’s GenAI is used to interface the Gemini 2.5 Flash LLM via an API endpoint. This takes a captured image and a trivial prompt, and returns the whole page of text, quick as a flash.

Finally, the script hands that text over to Piper, which turns that into a speech file in WAV format. This can then be played to an audio device with a call out to the console aplay tool. It’s all very simple at this level of abstraction.

Yes, we know it’s essentially just doing the same thing OCR software has been doing for decades. Still, the AI version is remarkably low-effort and surprisingly accurate, especially when handling unusual layouts that confound traditional OCR algorithms. Extensions to this tool would be trivial; for example, adjusting the prompt to ask it to translate the text to a different language could open up a whole new world to some people.

If you want to play along at home, then head on over to the PageParrot GitHub page and download the script.

If this setup feels familiar, you’d be quite correct. We covered something similar a couple of years back, which used Tesseract OCR, feeding text to Festvox’s CMU Flite tool. Whilst we’re talking about text-to-speech, here’s a fun ESP32-based software phoneme synthesiser to recreate that distinctive 1980s Speak & Spell voice.

12 thoughts on “Convert Any Book To A DIY Audiobook?”

. says:

July 6, 2025 at 6:21 am

there is a project ebook2audiobook that converts ebooks to audiobook with AI voice generated.
You can choose pretrained voice models or train your own for it. Emotions are also handled well with the available models.

Report comment

Reply
1. scott_tx says:
  
  July 6, 2025 at 1:41 pm
  
  Audiblez and several others also
  
  Report comment
  
  Reply
Gravis says:

July 6, 2025 at 6:37 am

An impressive product for the effort put into it. It would seem doing all those captchas has paid off. ;)

While, the generated voice is on par with a modern voice synthesizer, being AI based, I expected it to have a sense of cadence. Sadly, it’s limited to merely pronouncing words properly and in order, so I’m hoping the next generation of speech engines will focus on speaking in a more natural and flowing pattern while the one after that will be able to use intonation properly. My expectations for future voice synthesis models are high as I now expect them to have delivery on par with a parent reading a book to a child, doing varied voices for each character and all.

While it seems like a shame to rely on a huge (online only) LLM when he’s already using OpenCV for image capture, I think someone needs to develop a small neural network to identify page/text layout before you can have it read magazine articles without weirdly injecting descriptive text written below figures.

Report comment

Reply
Anonymous says:

July 6, 2025 at 8:50 am

How good is OCR these days? I’d like to make it easier to select text from some books that are sent as images/scans. Last I tried was awhile back, there were quite a few errors and you lost all formatting of text

Report comment

Reply
Ronnie says:

July 6, 2025 at 10:16 am

please note that you are handing Google et. al. illegal copies of legally protected works. I would not be surprised if this turned out to be pushed by them as a way to circumvent the problems with scanning the books themselves…

Report comment

Reply
1. Tom says:
  
  July 7, 2025 at 7:13 am
  
  I would just do this locally. Gemma3 is pretty good (4B for images).
  
  Report comment
  
  Reply
ian 42 says:

July 6, 2025 at 2:04 pm

“If the idea of reading a physical book sounds like hard work” then you need to do it much more to at least get up to the level of a 8yo.

I know some people have eyesight problems, and that the USA has a massive illiteracy problem, but apart from that why would you want a audio book that is quite a lot slower to listen to than reading the text in the first place?

Report comment

Reply
1. Zebazga says:
  
  July 6, 2025 at 6:35 pm
  
  Health issues or various physical issues making holding book for long difficult. Or that require limited screen time like various neurological issues do. My thoughts.
  
  Report comment
  
  Reply
2. duh says:
  
  July 6, 2025 at 6:37 pm
  
  multitasking
  
  Report comment
  
  Reply
3. sbrk says:
  
  July 6, 2025 at 7:13 pm
  
  Try reading a book while driving a car, or mowing the lawn, or …
  
  Report comment
  
  Reply
4. Gus Mueller says:
  
  July 6, 2025 at 7:22 pm
  
  because sometimes we are piloting a space craft in a turbulent alien atmosphere at the time
  
  Report comment
  
  Reply
thermionik says:

July 6, 2025 at 6:34 pm

This reminds me of the ‘page turner’ from the movie Real Genius. If you combine the page turner with the page parrot, then you would have an amazing and cool book reading robot.

Report comment

Reply

Hackaday

Convert Any Book To A DIY Audiobook?

12 thoughts on “Convert Any Book To A DIY Audiobook?”

Leave a ReplyCancel reply

Search

Never miss a hack

If you missed it

Australia’s Space Program Finally Gets Off The Pad, But Only Barely

What Happens When Lightning Strikes A Plane?

Happy Birthday 6502

Two For The Price Of One: BornHack 2024 And 2025 Badges

Hands On: The Hacker Pager

Our Columns

Hackaday Podcast Episode 332: 5 Axes Are Better Than 3, Hacking Your Behavior, And The Man Who Made Models

This Week In Security: Perplexity V Cloudflare, GreedyBear, And HashiCorp

The 64-Degree Egg, And Other Delicious Variants

Jenny’s Daily Drivers: FreeDOS 1.4

Get Your Tickets For Supercon 2025 Now!

12 thoughts on “Convert Any Book To A DIY Audiobook?”

Leave a ReplyCancel reply

Search

Never miss a hack

Subscribe

If you missed it

Our Columns