Convert Any Book To A DIY Audiobook?

July 6, 2025 by Dave Rowntree 12 Comments

If the idea of reading a physical book sounds like hard work, [Nick Bild’s] latest project, the PageParrot, might be for you. While AI gets a lot of flak these days, one thing modern multimodal models do exceptionally well is image interpretation, and PageParrot demonstrates just how accessible that’s become.

[Nick] demonstrates quite clearly how little code is needed to get from those cryptic black and white glyphs to sounds the average human can understand, specifically a paltry 80 lines of Python. Admittedly, many of those lines are pulling in libraries, and some are just blank, so functionally speaking, it’s even shorter than that. Of course, the whole application is mostly glue code, stitching together other people’s hard work, but it’s still instructive and fun to play with.

The hardware required is a Raspberry Pi Zero 2 W, a camera (in this case, a USB webcam), and something to hold it above the book. Any Pi with the ability to connect to a camera should also work, however, with just a little configuration.

On the software side, [Nick] pulls in the cv2 module (the Python interface to OpenCV) to handle the camera, setting it to full HD resolution. Google’s GenAI library is used to talk to the Gemini 2.5 Flash LLM via an API endpoint. This takes a captured image and a trivial prompt, and returns the whole page of text, quick as a flash.

Finally, the script hands that text over to Piper, which turns that into a speech file in WAV format. This can then be played to an audio device with a call out to the console aplay tool. It’s all very simple at this level of abstraction.
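To give a feel for how little glue is involved, here is a minimal sketch of that capture, transcribe, and speak chain. It is not [Nick]’s script: the prompt text, the function names, and the file paths are our own placeholders, and it assumes the opencv-python and google-genai packages plus the piper and aplay command line tools are installed.

```python
import subprocess

# Hypothetical prompt; the real project's wording may differ.
PROMPT = "Transcribe all of the text on this book page."


def capture_jpeg(device=0):
    """Grab one full HD frame from the webcam and return it as JPEG bytes."""
    import cv2  # imported lazily so the command helpers work without OpenCV

    cap = cv2.VideoCapture(device)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read a frame from the camera")
    ok, buf = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return buf.tobytes()


def page_to_text(jpeg_bytes, api_key):
    """Send the page image and the prompt to Gemini 2.5 Flash, return the text."""
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            types.Part.from_bytes(data=jpeg_bytes, mime_type="image/jpeg"),
            PROMPT,
        ],
    )
    return response.text


def piper_command(model_path, wav_path):
    """Piper reads text on stdin and writes a WAV file."""
    return ["piper", "--model", model_path, "--output_file", wav_path]


def aplay_command(wav_path):
    """aplay plays the WAV file on the default ALSA device."""
    return ["aplay", wav_path]


def speak(text, model_path, wav_path="page.wav"):
    """Synthesize the text with Piper, then play the result."""
    subprocess.run(piper_command(model_path, wav_path),
                   input=text.encode("utf-8"), check=True)
    subprocess.run(aplay_command(wav_path), check=True)
```

Calling `speak(page_to_text(capture_jpeg(), api_key), "voice.onnx")` would run the whole chain once, where `voice.onnx` stands in for whichever Piper voice model you have downloaded.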


2025 Pet Hacks Contest: Fort Bawks Is Guarded By Object Detection

June 9, 2025 by Tyler August 5 Comments

One of the difficult things about raising chickens is that you aren’t the only thing that finds them tasty. Foxes, raccoons, hawks: if it can eat meat, it probably wants a bite of your flock. [donutsorelse] wanted to protect his flock and to be able to know when predators were about without staying up all night next to the hen-house. What to do but outsource the role of Chicken Guardian to a Raspberry Pi?

Object detection is done using a YOLOv8 model trained on images of the various predators local to [donutsorelse]. The model is running on a Raspberry Pi and getting images from a standard webcam. Since the webcam has no low-light capability, the system also has a motion-activated light that arguably goes a long way towards spooking predators away by itself. To help with the spooking, a speaker module plays specific sound files for each detected predator; presumably different sounds might work better at scaring off different predators.
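The detection-to-sound step might look something like the sketch below. The class names, sound files, and confidence threshold are invented for illustration, and the `detections_from` helper assumes the Ultralytics Python package is what runs the YOLOv8 model; the write-up does not show the actual code.

```python
# Hypothetical mapping from detected class name to a deterrent sound file.
SOUNDS = {
    "fox": "dog_bark.wav",
    "raccoon": "shout.wav",
    "hawk": "crow_mob.wav",
}


def detections_from(model, frame):
    """Run an Ultralytics YOLO model on one frame, yield (label, confidence)."""
    result = model(frame)[0]
    for box in result.boxes:
        yield result.names[int(box.cls[0])], float(box.conf[0])


def pick_sound(detections, threshold=0.6, sounds=SOUNDS):
    """Return the sound for the most confident known predator, or None."""
    best = None
    for label, conf in detections:
        if label in sounds and conf >= threshold:
            if best is None or conf > best[1]:
                best = (label, conf)
    return sounds[best[0]] if best else None
```

Keeping the selection logic separate from the model call makes it easy to tune the threshold against recorded detections without a camera attached.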

If that doesn’t work, the system phones home to activate a siren inside [donutsorelse]’s house, using a Blues Wireless Notecarrier F as a cellular USB modem. The siren is just a dumb unit; activation is handled via a TP-Link smart plug that’s hooked into [donutsorelse]’s custom smart home setup. Presumably the siren cues [donutsorelse] to take action against the predator assault on the chickens.

Weirdly enough, this isn’t the first time we’ve seen an AI-enabled chicken coop, but it is the first one to make it into our ongoing challenge, which incidentally wraps up today.

DIY 3D Hand Controller Using A Webcam And Scripting

October 25, 2024 by Heidi Ulrich 3 Comments

Are you ready to elevate your interactive possibilities without breaking the bank? If so, explore [Caio Bassetti]’s tutorial on creating a full 3D hand controller using only a webcam, MediaPipe Hands, and Three.js. This hack lets you transform a 2D screen into a fully interactive 3D scene—all with your hand movements. If you’re passionate about low-cost, accessible tech, try this yourself – not much else is needed but a webcam and a browser!

The magic of the project lies in using MediaPipe Hands to track key points on your hand, such as the middle finger and wrist, to calculate depth and positioning. Using clever Three.js tricks, the elements can be controlled on a 3D axis. This setup creates a responsive virtual controller, interpreting hand gestures for intuitive movement in the 3D space. The hack also implements a closed-fist gesture to grab and drag objects and detects collisions to add interactivity. It’s a simple, practical build and it performs reliably in most browsers.
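The geometry behind those two tricks is simple enough to sketch. The tutorial itself runs in the browser with JavaScript; the Python below is only an illustration of the idea, using the standard MediaPipe Hands landmark indices (0 for the wrist, 9 for the base of the middle finger, and 8, 12, 16, 20 for the fingertips). The function names and the fist threshold are our assumptions and would need tuning.

```python
import math

WRIST, MIDDLE_MCP = 0, 9        # MediaPipe Hands landmark indices
FINGERTIPS = (8, 12, 16, 20)    # index, middle, ring, and pinky tips


def dist(a, b):
    """2D distance between two (x, y) landmarks in normalized image units."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def depth_proxy(landmarks):
    """Apparent palm length: it grows as the hand moves toward the camera."""
    return dist(landmarks[WRIST], landmarks[MIDDLE_MCP])


def is_fist(landmarks, ratio=1.3):
    """Guess a closed fist: every fingertip sits close to the wrist.

    The ratio is a hypothetical threshold relative to the palm length.
    """
    palm = depth_proxy(landmarks)
    if palm == 0:
        return False
    return all(dist(landmarks[t], landmarks[WRIST]) < ratio * palm
               for t in FINGERTIPS)
```

Because the fist test is expressed relative to the palm length, it should behave the same whether the hand is near or far from the camera.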

For more on this innovation or other exciting DIY hand-tracking projects, browse our archive on gesture control projects, or check out the full article on Codrops. With tools such as MediaPipe and Three.js, turning ideas into reality gets more accessible than ever.

Pulling Apart A Premium WebCam

August 21, 2024 by Al Williams 29 Comments

Over at EDN, [Brian Dipert] has been tearing down web cameras. A few months ago, he broke into a bargain basement camera. This time, he’s looking into a premium unit. We have to admit, though, that from what he reports we are a little surprised at some of the corners cut. For example, it’s a 4K camera that doesn’t quite provide a 4K image. Despite a Sony CMOS sensor, [Brian] found the low-light performance to be poor. However, it does carry a much larger price tag than the previous camera examined.

The interesting part is about halfway down the page when he tries to open the unit up. It seems to be getting harder and harder to get into things, and this camera was no exception. The device finally gives up. Inside is a relatively unremarkable board with a host of unknown ICs. One interesting item is a gyro chip that determines if the camera is upside down.

[Brian] managed to get the camera back together with no harm. It is interesting to compare it to the $15 camera he took apart earlier.

If you want maximum cred, do your video calls with a Game Boy camera. Or, at least, add your own lens to a webcam.

3D Printer Streaming Solution Unlocks Webcam Features

April 20, 2024 by Bryan Cockfield 7 Comments

While 3D printer hardware has come a long way in the past decade and a half, the real development has been in the software. Open source slicers are constantly improving, and OctoPrint can turn even the most basic of printers into a network-connected powerhouse. But despite all these improvements, there are still certain combinations of hardware that require a bit of manual work.

[Reticulated] wanted an easy way to monitor his prints over streaming video, but didn’t have any of the cameras that are supported by OctoPrint. Of course he could just point a cheap network-connected camera at the printer and be done with it, but he was looking for a bit better integration than that. In the process, he demonstrates how to unlock some features hidden in inexpensive webcams.

He set about building something that wouldn’t require buying more equipment or overloading the limited hardware responsible for the actual printing. A few of his existing cameras have RTMP support, which allows a fairly straightforward setup with YouTube Live once Monaserver is set up to handle the RTMP feeds from the cameras and OBS Studio is configured to stream it out to YouTube. Using the OctoPrint API, he was able to pull data such as the current extruder temperature and overlay it on the video.
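Pulling a temperature out of OctoPrint takes very little code. The sketch below queries the REST API’s `/api/printer` endpoint with an `X-Api-Key` header and formats the extruder reading as a line of overlay text. How [Reticulated] actually gets the text into the video is not something we can vouch for; writing it to a file that an OBS text source reads is simply one option, and the function names and label format here are our own.

```python
import json
import urllib.request


def fetch_printer_state(base_url, api_key, timeout=5):
    """GET /api/printer from an OctoPrint instance and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/printer",
        headers={"X-Api-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def overlay_text(state, tool="tool0"):
    """Format the extruder temperature from the printer state for an overlay."""
    temps = state.get("temperature", {}).get(tool, {})
    actual, target = temps.get("actual"), temps.get("target")
    if actual is None:
        return "Extruder: n/a"
    if target:
        return f"Extruder: {actual:.1f} / {target:.1f} C"
    return f"Extruder: {actual:.1f} C"
```

Polling this every few seconds and writing the result to a text file keeps the load on the print host negligible, which fits the goal of not overloading the hardware doing the actual printing.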

One of the other interesting parts of this build is that not all of [Reticulated]’s cameras have built-in RTMP support, but by following this guide he was able to get more of them working with this setup than would have been possible by default. Even beyond 3D printing, this is an excellent guide (and tip) for getting a quick live stream going for whatever reason. For anything more mobile than a working 3D printer, though, you might want to look at taking your streaming setup mobile instead.

D-POINT: A Digital Pen With Optical-Inertial Tracking

November 14, 2023 by Dave Rowntree 4 Comments

[Jcparkyn] clearly had an interesting topic for their thesis project, and was conscientious enough to write up a chunk of it and release it to the wild. The project in question is a digital pen that uses some neat sensor fusion to combine the inputs from a pen-mounted gyro/accelerometer with data from an optical tracking system provided by an off-the-shelf webcam.

A six degrees of freedom (6DOF) tracking system is achieved as a result, with the pen-mounted hardware tracking orientation and the webcam tracking the 3D position. The pen itself is quite neat, with an ALPS/Alpine HSFPAR003A load sensor measuring the contact pressure transmitted to it from the stylus tip. A Seeed XIAO nRF52840 Sense is on duty for Bluetooth and hosting the needed IMU. This handy little module deals with all the details needed for such a high-integration project and even manages the charging of a single 10440 lithium cell via a USB-C connector.

Positional tracking uses Visual Pose Estimation (VPE) assisted with ArUco markers mounted on the end of the stylus. A consumer-grade (i.e. uncalibrated) webcam is all that is required on the hardware side. The software utilizes the familiar OpenCV stack to unroll the effects of the webcam rolling shutter, followed by Perspective-n-Point (PnP) to estimate the pose from the corrected image stream. Finally, a coordinate space conversion is performed to determine the stylus tip position relative to the drawing surface.
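The last step, going from the marker pose to the tip position, is a rigid transform. As a rough sketch in plain Python: OpenCV’s `solvePnP` returns a rotation vector and a translation vector, the rotation vector is expanded into a matrix with the Rodrigues formula, and the tip’s known offset from the markers is rotated and translated into the camera frame. The function names and the offset are hypothetical, and the further mapping from the camera frame to the drawing surface is left out.

```python
import math


def rodrigues(rvec):
    """Convert an axis-angle rotation vector into a 3x3 rotation matrix."""
    theta = math.sqrt(sum(comp * comp for comp in rvec))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (comp / theta for comp in rvec)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v,      kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v,      ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def tip_in_camera(rvec, tvec, tip_offset):
    """Rotate the tip offset (marker frame) and add the marker translation."""
    rot = rodrigues(rvec)
    return [
        sum(rot[i][j] * tip_offset[j] for j in range(3)) + tvec[i]
        for i in range(3)
    ]
```

For example, a pen rotated 90 degrees about the camera’s Z axis moves a tip offset along X onto the Y axis before the translation is added.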

The sensor fusion is taken care of with a Kalman filter, smoothed with the typical Rauch-Tung-Striebel (RTS) algorithm before being passed onto the final application. This process is running in Python using the NumPy module, as you would expect, but accelerated using the Numba JIT compiler.
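For readers who have not met the filter and smoother pairing before, here is a deliberately tiny version: a scalar random-walk Kalman filter followed by a Rauch-Tung-Striebel backward pass, in plain Python. The project’s filter tracks a full pose with many more states, so treat this as a toy that shows the structure only; the noise values `q` and `r` are arbitrary.

```python
def kalman_rts(measurements, q=0.01, r=1.0):
    """Filter a 1D signal forward, then smooth it backward (RTS).

    q is the process noise variance, r the measurement noise variance.
    """
    if not measurements:
        return []

    # Forward pass: standard Kalman filter with a random-walk model.
    x, p = measurements[0], r
    xf, pf, pp = [x], [p], [p]   # filtered state, filtered and predicted variance
    for z in measurements[1:]:
        p_pred = p + q                   # predict: state unchanged, variance grows
        k = p_pred / (p_pred + r)        # Kalman gain
        x = x + k * (z - x)              # update with the measurement
        p = (1.0 - k) * p_pred
        xf.append(x)
        pf.append(p)
        pp.append(p_pred)

    # Backward pass: RTS smoother. For a random walk the predicted state
    # at step i + 1 is just the filtered state at step i.
    xs = xf[:]
    for i in range(len(xf) - 2, -1, -1):
        c = pf[i] / pp[i + 1]
        xs[i] = xf[i] + c * (xs[i + 1] - xf[i])
    return xs
```

The backward pass lets later measurements refine earlier estimates, which is why the smoothed output lags less than the filtered output alone.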

Motion tracking is not news to us; we’ve seen many an implementation over the years, such as this one. But digital input pens? Why aren’t they more of a thing?

Thanks to [Oliver] for the tip!

Full Self-Driving, On A Budget

October 17, 2023 by Bryan Cockfield 35 Comments

Self-driving is currently the Holy Grail in the automotive world, with a number of companies racing to build general-purpose autonomous vehicles that can get from point A to point B with no user input. While no one has brought one to market yet, at least one has promised this feature and had customers pay for it, but continually moved the goalposts for delivery due to how challenging this problem turns out to be. But it doesn’t need to be that hard or expensive to solve, at least in some situations.

The situation in question is driving on a single stretch of highway, and only focuses on steering, so it doesn’t handle the accelerator or brake pedal input. The highway is driven normally, using a webcam to take images of the route and an Arduino to capture data about the steering angle. The idea here is that with enough training the Arduino could eventually steer the car. But first some math needs to happen on the training data: since the steering wheel is almost always not turning the car, the data has to be treated so that the Arduino knows actual steering events aren’t just statistical anomalies. After the training, the system does a surprisingly good job at “driving” based on this data, and does it on a budget not much larger than a laptop, microcontroller, and webcam.
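The write-up does not spell out exactly what that math is, so the following is just one common way to deal with lopsided steering data rather than the creator’s method: keep every sample where the wheel is actually turned, and randomly discard most of the straight-ahead ones. The dead band, keep fraction, and sample layout are all assumptions.

```python
import random


def balance_steering(samples, dead_band=1.0, keep_fraction=0.1, seed=0):
    """Downsample straight-ahead frames so turns are not drowned out.

    samples is a list of (image_id, steering_angle_degrees) pairs. Samples
    outside the dead band are always kept; the rest are kept at random
    with probability keep_fraction.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    kept = []
    for sample in samples:
        _, angle = sample
        if abs(angle) > dead_band or rng.random() < keep_fraction:
            kept.append(sample)
    return kept
```

With 900 straight frames and 100 turning frames, this keeps all 100 turns and roughly a tenth of the straight frames, so the two classes end up in the same ballpark.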

Admittedly, this project was a proof-of-concept to investigate machine learning, neural networks, and other statistical algorithms used in these sorts of systems, and doesn’t actually drive any cars on any roadways. Even the creator says he wouldn’t trust it himself, but that he was pleasantly surprised by the results of such a simple system. It could also be expanded out to handle brake and accelerator pedals with separate neural networks as well. It’s not our first budget-friendly self-driving system, either. This one makes it happen with the enormous computing resources of a single Android smartphone.
