[Geyes30]’s Raspberry Pi project does one thing: it finds arbitrary text in the camera’s view and reads it out loud. Does it do so flawlessly? Not really. Was it at least effortless to put together? Also no, but it does wonderfully illustrate the process of gluing together different bits of functionality to make something new. Also, [geyes30]’s kids find it fascinating, and that’s a win all on its own.
The device is made from a Raspberry Pi and camera and works by sending a still image from the camera to an optical character recognition (OCR) program, which converts any visible text in the image to its ASCII representation. The recognized text is then piped to the espeak engine and spoken aloud. Getting all the tools to play nicely took a bit of work, but [geyes30] documented everything so well that even a novice should be able to get the project up and running in an afternoon.
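The whole capture → OCR → speak loop can be sketched in a few lines of Python glue code. Only espeak is named in the article; the capture and OCR tools here (`raspistill`, `tesseract`) are common choices on a Raspberry Pi but are assumptions, so treat this as a sketch rather than [geyes30]'s actual script:

```python
# Sketch of the still-image -> OCR -> speech pipeline. Tool names
# raspistill and tesseract are assumptions; only espeak is confirmed
# by the write-up.
import re
import subprocess

def clean_ocr_text(raw: str) -> str:
    """Collapse raw OCR output into a single speakable line."""
    return re.sub(r"\s+", " ", raw).strip()

def capture_and_speak(image_path: str = "/tmp/frame.jpg") -> str:
    # 1. Grab a still image from the Pi camera module.
    subprocess.run(["raspistill", "-o", image_path], check=True)
    # 2. OCR the image; the "stdout" argument makes tesseract
    #    print the recognized text instead of writing a file.
    ocr = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    text = clean_ocr_text(ocr.stdout)
    # 3. Pipe whatever was recognized into the espeak engine.
    if text:
        subprocess.run(["espeak", text], check=True)
    return text
```

The same thing could be a one-line shell pipeline; a script just makes it easier to skip speaking when the OCR pass finds nothing.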
Sometimes a function like text-to-speech is an end result in and of itself. This was also true of another similar project: Magic Mirror, whose purpose was to tirelessly indulge children’s curiosity about language.
Seeing other projects come to life and learning about new tools is a great way to get new ideas, and documenting them helps cross-pollinate among creative types. Did something inspire you recently, or have you documented your own project? We want to hear about it and so do others, so let us know via the tips line!
Could be done with an Amiga 30 years ago :p
Well, kind of. speech.device was pretty neat, but you’d struggle with that kind of image capture and text recognition, though it would be technically feasible with a scanner or video digitiser.
No it couldn’t.
OK, you needed to throw in a 555
I remember having my image captured by a digital camera about 30 years ago. I think the resolution was 256 by 256 pixels with 64-level greyscale (6 bits). I can’t remember what interface was used to transfer the image to the computer; it must have been either RS-232 or a bidirectional parallel port.
SCSI card?
It could have been, but I don’t think so, because I remember it took at least a minute to transfer the data from the SRAM in the camera to the computer. I remember joking that they should add a floppy drive to it, since that would have been faster.
Nice work. For one of my past projects (https://hackaday.io/project/28445-usb-cdc-robosapien-v1/log/181633-text-to-speech) I used pico2wave instead of espeak. If I remember correctly, pico2wave produced a much more natural voice.
Could be useful for the vision-impaired.
Emacspeak, a mode in Emacs, has an interface for running OCR and reading the result out loud.
Would also be fun tied into Google translate.
Wasn’t this done with a Speak & Spell a long time ago?
If you want the best voice (by far), then use Larynx, which you can find on GitHub.