Raspberry Pi Reads What It Sees, Delights Children

[Geyes30]’s Raspberry Pi project does one thing: it finds arbitrary text in the camera’s view and reads it out loud. Does it do so flawlessly? Not really. Was it at least effortless to put together? Also no, but it does wonderfully illustrate the process of gluing together different bits of functionality to make something new. Also, [geyes30]’s kids find it fascinating, and that’s a win all on its own.

The device is made from a Raspberry Pi and camera and works by sending a still image from the camera to an optical character recognition (OCR) program, which converts any visible text in the image to its ASCII representation. The recognized text is then piped to the espeak engine and spoken aloud. Getting all the tools to play nicely took a bit of work, but [geyes30] documented everything so well that even a novice should be able to get the project up and running in an afternoon.
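
The write-up has the real details, but the overall flow is simple enough to sketch. Here's a minimal, hedged version of the capture-OCR-speak loop in Python; the article doesn't name the OCR program, so Tesseract is our assumption, as are the file names:

import subprocess

# A sketch of the capture -> OCR -> speech loop, not [geyes30]'s actual code.
# Assumes raspistill, tesseract (our guess at the OCR engine), and espeak
# are all installed and on the PATH.

# Grab a still from the Pi camera
subprocess.run(['raspistill', '-o', 'frame.jpg'], check=True)

# Tesseract writes the recognized text to frame.txt
subprocess.run(['tesseract', 'frame.jpg', 'frame'], check=True)

with open('frame.txt') as f:
    text = f.read().strip()

# Hand anything we found over to espeak
if text:
    subprocess.run(['espeak', text], check=True)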

Sometimes a function like text-to-speech is an end in itself. That was also true of a similar project: Magic Mirror, whose purpose was to tirelessly indulge children's curiosity about language.

Seeing other projects come to life and learning about new tools is a great way to get new ideas, and documenting them helps cross-pollinate among creative types. Did something inspire you recently, or have you documented your own project? We want to hear about it and so do others, so let us know via the tips line!

British Licence Plate Camera Fooled By Clothing

It’s a story that has caused consternation and mirth in equal measure amongst Brits: the owners of a car in Surrey received a fine for driving in a bus lane miles away in Bath, when in fact the camera had been confused by the text on a sweater worn by a pedestrian. It seems the word “knitter” had been interpreted by the reader as “KN19 TER”, which, as Brits will tell you, follows the standard format for a modern UK licence plate.
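
For reference, the post-2001 format is two letters, two digits, then three letters, which makes it easy to see how a sweater could collide with it. A toy illustration (ours, certainly not the actual ANPR code):

import re

# Toy illustration of the mix-up, not the real ANPR software. The modern
# UK plate format is two letters, two digits, then three letters.
PLATE = re.compile(r'^[A-Z]{2}[0-9]{2}\s?[A-Z]{3}$')

# An OCR pass that mistakes I for 1 and the first T for 9 makes the
# sweater's text plate-shaped
misread = 'KNITTER'.replace('I', '1').replace('T', '9', 1)
print(misread, bool(PLATE.match(misread)))  # KN19TER True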

It gives us all a chance to have a good old laugh at the expense of the UK traffic authorities, but it raises some worthwhile points about the folly of relying on automatic cameras to dish out fines without human intervention. Except for the very oldest of cars, the British number plate follows an extremely distinctive high-contrast format of large black letters on a reflective white or yellow background, and since 2001 they have all had to use the same typeface, which bears the slightly authoritarian name of MANDATORY. Number plates are hardly the most challenging prospect for a recognition system, but when the system does make mistakes, the fact that ambiguous results aren’t subjected to a human checking stage before a fine is sent out seems rather chilling.

It also raises the prospect of yet more number-plate-related mischief. Aside from SQL injection jokes and adversarial fashion, we can only imagine the havoc that could be caused were a protest group to launch a denial-of-service attack with activists sporting fake MANDATORY licence plates.

Header image, based on the work of ZElsb, CC BY-SA 4.0.

TMD-2: A Bigger, Better, More Collaborative Turing Machine

One of the things we love best about the articles we publish on Hackaday is the dynamic that can develop between the hacker and the readers. At its best, the comment section of an article can be a model of collaborative effort, with readers’ ideas and suggestions making their way into version 2.0 of a build.

This collegial dynamic is very much on display with TMD-2, [Michael Gardi]’s latest iteration of his Turing machine demonstrator. We covered the original TMD-1 back in late summer, the idea of which was to serve as a physical embodiment of the Turing machine concept. Briefly, the TMD-1 represented the key “tape and head” concepts of the Turing machine with a console of servo-controlled flip tiles, the state of which was controlled by a three-state, three-symbol finite state machine.

TMD-1
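
If you’ve never played with the concept, the whole tape-head-rule-table loop fits in a few lines of Python. A bare-bones sketch (ours, not [Michael]’s firmware), with a trivial rule set that fills the tape with 1s:

from collections import defaultdict

# A minimal Turing machine loop to illustrate the concept TMD-1 embodies;
# the rule set is a trivial example of our own devising.
# (state, symbol) -> (symbol to write, head movement, next state)
rules = {
    ('A', '0'): ('1', +1, 'A'),    # turn 0s into 1s, moving right
    ('A', 'b'): ('b', 0, 'HALT'),  # stop at the first blank
}

tape = defaultdict(lambda: 'b', enumerate('000'))  # blank-padded tape
head, state = 0, 'A'
while state != 'HALT':
    write, move, state = rules[(state, tape[head])]
    tape[head] = write
    head += move

print(''.join(tape[i] for i in sorted(tape)))  # prints '111b'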

TMD-1 was capable of running simple programs that neatly demonstrated the principles of Turing machines, and it really seemed to catch on with readers. Based on the comments of one reader, [Newspaperman5], [Mike] started thinking bigger and better for TMD-2. He expanded the finite state machine to six states and six symbols, which meant coming up with something more scalable than the Hall-effect sensors and magnetic tiles of TMD-1.

TMD-2 has a camera for computer vision of the state machine tiles

[Mike] opted for optical character recognition using a Raspberry Pi camera along with OpenCV and the Tesseract OCR engine. The original servo-driven tape didn’t scale well either, so it was replaced by a virtual tape displayed on a 7″ LCD. The best part of the original, the tile-based FSM, was expanded but retained its tactile programming experience.
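
As a rough idea of what reading a grid of tiles with Tesseract looks like (our illustration; [Mike]’s actual pipeline and tile geometry will differ):

import cv2
import pytesseract

# Our illustration of OCR on a tile grid; the image name and tile size
# are invented, and [Mike]'s real code differs.
img = cv2.imread('console.jpg', cv2.IMREAD_GRAYSCALE)

TILE = 80  # hypothetical tile size in pixels
for row in range(6):
    for col in range(6):
        tile = img[row * TILE:(row + 1) * TILE, col * TILE:(col + 1) * TILE]
        # --psm 10 tells Tesseract to treat the crop as a single character
        ch = pytesseract.image_to_string(tile, config='--psm 10').strip()
        print(f'tile ({row}, {col}): {ch!r}')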

Hats off to [Mike] for tackling a project with so many technologies that were previously new to him, and for pulling off another great build. And kudos to [Newspaperman5] for the great suggestions that spurred him on.

OCR Reads Old Newspapers So We Don’t Have To

Plenty of people don’t bother to read the current newspaper, let alone editions that were published over 100 years ago. But there’s a wealth of important historical information buried in these dusty old publications, assuming you can find a way to reliably digitize and index it all. You might think the solution is as simple as running images of the paper through optical character recognition (OCR) software, but as [John Scancella] explains, the problem is a bit more complicated than that.

Stretching the text vertically highlights the columns.

Ultimately, the issue largely comes down to formatting. The OCR software reasonably assumes all the text is in orderly horizontal lines, because in the vast majority of cases, it would be. That’s how you’re reading these words now. But as anyone who’s seen an old-time newspaper knows, that’s not how things were necessarily written back then. Pages consisted of multiple narrow columns of stories separated by vertical lines; if the OCR software reads the page straight across from left to right, the resulting text is a mishmash of several unrelated topics.

The answer is to break all those articles into their own images, but doing that manually at any sort of scale simply isn’t an option. So [John] has been working on a system that uses OpenCV to identify the columns of text and isolate them. He details the multi-step process in his write-up, and even provides the Python code should you want to give it a spin. But the short version is that the image is converted to grayscale and OpenCV’s dilate function is used to stretch the text in the Y dimension. This produces big blobs of white that can easily be picked out with findContours() and snipped into individual images.
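
Condensed into a few lines, the approach looks something like this; it’s a sketch of the steps [John] describes, with kernel size and filtering thresholds that are our guesses rather than his actual values:

import cv2
import numpy as np

# Sketch of the column-finding steps described above; the kernel size and
# size thresholds are guesses, not values from [John]'s code.
img = cv2.imread('front_page.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Invert so the text is white on black, which is what findContours expects
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# A tall, narrow kernel smears the text vertically, merging each column
# of print into one white blob
kernel = np.ones((60, 5), np.uint8)
dilated = cv2.dilate(thresh, kernel, iterations=1)

contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for i, c in enumerate(contours):
    x, y, w, h = cv2.boundingRect(c)
    if w > 100 and h > 200:  # skip specks and the rules between columns
        cv2.imwrite(f'column_{i}.png', img[y:y + h, x:x + w])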

It’s not a perfect solution, and there are still a few pitfalls. For one, the name of the paper needs to be removed from the front page before the stretching operation happens. But it’s clearly a step in the right direction, and the results certainly look very promising. Anything that makes OCR more accurate or easier to implement is a win in our book, so we’re excited to see where [John] takes this concept.

EasyOCR Makes OCR, Well, Easy

Working on embedded systems used to be easier. You had a microcontroller and maybe a few pieces of analog or digital I/O, and perhaps communications might be a serial port. Today, you have systems with networks and cameras and a host of I/O. Cameras are strange because sometimes you just want an image and sometimes you want to understand the image in some way. If understanding the image involves reading text in the picture, you will want to check out EasyOCR.

The Python library leverages other open source libraries and supports 42 different languages. As the name implies, using it is pretty easy. Here’s the setup:


import easyocr

# Load the detection and recognition models for Thai and English
reader = easyocr.Reader(['th','en'])

# Returns a list of (bounding box, text, confidence) results
reader.readtext('test.jpg')

The results include four points that define the bounding box of each piece of text, the text, and a confidence level. The code takes advantage of the GPU, but you can run it in a CPU-only mode if you prefer. There are a few other options, including setting the algorithm’s scanning behavior, how it handles multiple processors, and how it converts the image to grayscale. The results look impressive.
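
Unpacking those results looks something like this (a quick sketch; the confidence cutoff is an arbitrary value of ours):

import easyocr

# gpu=False forces CPU-only mode; drop it to use the GPU when available
reader = easyocr.Reader(['en'], gpu=False)

for bbox, text, confidence in reader.readtext('test.jpg'):
    if confidence > 0.5:  # arbitrary cutoff to skip dubious detections
        print(f'{text} ({confidence:.2f}) at {bbox}')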

According to the project’s repository, it incorporates several existing neural network and conventional algorithms, so if you want to dig into the details, there are links provided to both code and white papers. If you need some inspiration for what to do with OCR, maybe this past project will give you some ideas. Or you could cheat at games.

Hackaday Podcast 035: LED Cubes Taking Over, Ada Vanquishes C Bugs, Rad Monitoring Is Hot, And 3D Printing Goes Full 3D

Hackaday Editors Mike Szczys and Elliot Williams get caught up on the most interesting hacks of the past week. On this episode we take a deep dive into radiation-monitor projects, both Geiger tube and scintillator based, as well as LED cube projects that pack pixels onto six PCBs with parts counts reaching into the tens of thousands. In the 3D printing world we want non-planar printing to be the next big thing. Padauk microcontrollers are small, cheap, and do things in really interesting ways if you don’t mind embracing the ecosystem. And what’s the best way to read a water meter with a microcontroller?

Take a look at the links below if you want to follow along, and as always, tell us what you think about this episode in the comments!

Direct download (60 MB or so).

Rock Out To The Written Word With BookSound

With his latest project, [Roni Bandini] has simultaneously given the world a new type of audiobook and music. Traditional audiobooks are basically the adult equivalent of having somebody read you a bedtime story, but BookSound actually turns the written word into electronic music. You won’t be able to boast to your friends that you have, as a matter of fact, read that popular new novel, but at least you might be able to dance to it.

[Roni] says he’s still working on perfecting the word to music mapping, so the results shown in the video after the break are still a bit rough. But even in these early stages there’s no denying this is an exceptionally unique project, and we’re excited to see where it goes from here.

Inside the classy-looking 3D printed enclosure is a Raspberry Pi, an OLED display, and the button and switch that make up the entirety of the device’s controls. At the end of the arm is a standard Raspberry Pi Camera module, which gives the BookSound a bird’s-eye view of the book to be songified.

To turn your favorite book into electronic beats, simply open it up, put it under the gaze of BookSound, and press the button on the front. Because the Raspberry Pi isn’t exactly a powerhouse, it takes about two minutes for it to scan the page, perform optical character recognition (OCR), and compose the track before you start to hear anything.

If you’re wondering what the secret sauce is that turns words into music, [Roni] isn’t ready to share his source code just yet. But he was able to give us a few high-level explanations of what’s going on inside BookSound. For example, to generate the song’s BPM, the software counts the words in each paragraph on the page, so a book with shorter paragraphs gets a faster tempo to match the speed at which the author is moving through ideas. Similarly, drum kicks are generated based on the number of syllables in each paragraph. In the future, he’s looking at adding “lyrics” by running commonly used words on the page through a text-to-speech engine and inserting them into the beat.
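
Since the source is under wraps, any code here is pure speculation, but the kind of mapping he describes is easy to sketch:

import re

# Pure speculation: [Roni] hasn't released his code, so this only sketches
# the sort of mapping he describes (paragraph length -> tempo,
# syllables -> drum kicks). All the constants are invented.
def rough_syllables(word):
    # Crude heuristic: count runs of consecutive vowels
    return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

def page_to_beat(paragraphs):
    avg_words = sum(len(p.split()) for p in paragraphs) / len(paragraphs)
    # Shorter paragraphs -> faster tempo, clamped to a sane range
    bpm = int(max(60, min(180, 240 - avg_words)))
    kicks = [sum(rough_syllables(w) for w in p.split()) for p in paragraphs]
    return bpm, kicks

print(page_to_beat(['Call me Ishmael.', 'It was a bright cold day in April.']))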

We’ve seen practical applications of OCR on the Raspberry Pi in the past, and even similar-looking book-scanning arrangements. But we’ve never seen anything quite like BookSound before, which, at this point, is really saying something.
