Plenty of people don’t bother to read the current newspaper, let alone editions that were published over 100 years ago. But there’s a wealth of important historical information buried in these dusty old publications, assuming you can find a way to reliably digitize and index it all. You might think the solution is as simple as running images of the paper through optical character recognition (OCR) software, but as [John Scancella] explains, the problem is a bit more complicated than that.
Ultimately, the issue largely comes down to formatting. The OCR software reasonably assumes all the text is in orderly horizontal lines, because in the vast majority of cases it is. That’s how you’re reading these words now. But as anyone who’s seen an old-time newspaper knows, that’s not necessarily how things were laid out back then. Pages consisted of multiple narrow columns of stories separated by vertical lines; if the OCR tries to read the page from left to right, the resulting text is a mishmash of several unrelated topics.
The answer is to break all those articles into their own images, but doing that manually at any sort of scale simply isn’t an option. So [John] has been working on a system that uses OpenCV to identify the columns of text and isolate them. He details the multi-step process in his write-up, and even provides the Python code should you want to give it a spin. The short version is that the image is converted to grayscale and OpenCV’s dilate function is used to stretch the text in the Y dimension. This produces big blobs of white that can easily be picked out with findContours() and snipped into individual images.
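To get a feel for the technique before digging into [John]’s write-up, here’s a minimal sketch of that pipeline. The input file name, the kernel size, and the Otsu thresholding step are our own illustrative choices, not his exact parameters:

```python
import cv2

img = cv2.imread("page.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Invert and threshold so the text becomes white on a black background
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# Dilate with a tall, narrow kernel to stretch the text vertically,
# merging the lines of each column into one big white blob
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 50))
dilated = cv2.dilate(thresh, kernel, iterations=1)

# Each blob's bounding box is then snipped out as its own image
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for i, contour in enumerate(contours):
    x, y, w, h = cv2.boundingRect(contour)
    cv2.imwrite(f"column_{i}.png", img[y:y + h, x:x + w])
```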
It’s not a perfect solution, and there are still a few pitfalls. For one, the name of the paper needs to be removed from the front page before the stretching operation happens. But it’s clearly a step in the right direction, and the results certainly look very promising. Anything that makes OCR more accurate or easier to implement is a win in our book, so we’re excited to see where [John] takes this concept.
Working on embedded systems used to be easier. You had a microcontroller and maybe a few pieces of analog or digital I/O, and communications, if you had any, meant a serial port. Today, you have systems with networks and cameras and a host of I/O. Cameras are strange because sometimes you just want an image, and sometimes you want to understand the image in some way. If understanding the image involves reading the text in a picture, you will want to check out EasyOCR.
The Python library leverages other open source libraries and supports 42 different languages. As the name implies, using it is pretty easy. Here’s the basic setup, following the library’s documented usage (the image file name below is a placeholder):
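```python
import easyocr

# Build a reader for English; the language list can hold several codes
reader = easyocr.Reader(['en'])

# Run detection and recognition on an image file
result = reader.readtext('sign.jpg')
```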
The results include four points that define the bounding box of each piece of text, the text, and a confidence level. The code takes advantage of the GPU, but you can run it in a CPU-only mode if you prefer. There are a few other options, including setting the algorithm’s scanning behavior, how it handles multiple processors, and how it converts the image to grayscale. The results look impressive.
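To give a rough idea of that output format, here’s a sketch of iterating over the results with the GPU disabled; the image name is again a placeholder:

```python
import easyocr

# gpu=False selects the CPU-only mode mentioned above
reader = easyocr.Reader(['en'], gpu=False)

# Each result is a (bounding_box, text, confidence) tuple; the bounding
# box is a list of four [x, y] corner points
for bbox, text, confidence in reader.readtext('sign.jpg'):
    print(f"{text!r} at {bbox} (confidence {confidence:.2f})")
```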
According to the project’s repository, the authors incorporated several existing neural network and conventional algorithms, so if you want to dig into the details, there are links to both the code and the white papers. If you need some inspiration for what to do with OCR, maybe this past project will give you some ideas. Or you could cheat at games.
Hackaday Editors Mike Szczys and Elliot Williams get caught up on the most interesting hacks of the past week. On this episode we take a deep dive into radiation-monitor projects, both Geiger tube and scintillator based, as well as LED cube projects that pack pixels onto six PCBs with parts counts reaching into the tens of thousands. In the 3D printing world we want non-planar printing to be the next big thing. Padauk microcontrollers are small, cheap, and do things in really interesting ways if you don’t mind embracing the ecosystem. And what’s the best way to read a water meter with a microcontroller?
Take a look at the links below if you want to follow along, and as always tell us what you think about this episode in the comments!
In our info-obsessed culture, hackers are increasingly interested in ways to quantify the world around them. One popular project is to collect data about home energy or water consumption in the hopes of identifying trends or potential inefficiencies. For safety and potentially legal reasons, this usually has to be done in a minimally invasive way that doesn’t compromise the metering done by the utility provider. As you might expect, that often leads to some creative methods of data collection.
Of course, the ESP8266 is not a platform we generally see performing optical character recognition. Some clever programming was required to get the Wemos D1 Mini Lite to reliably read the numbers from the meter without having to push the task to a more computationally powerful device such as a Raspberry Pi. The process starts with a 160×120 JPEG image provided by a VC0706 camera module, which is then processed with the JPEGDecoder library. The top and bottom of the image are discarded, and the center band is isolated into blocks that correspond with the position of each digit on the display.
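The firmware does this in C++ on the ESP8266 itself, but the slicing step is easy to picture in Python. In this sketch the band position and digit count are illustrative guesses, not the project’s actual values:

```python
import numpy as np

FRAME_W, FRAME_H = 160, 120      # VC0706 frame size used in the project
NUM_DIGITS = 8                   # illustrative; depends on the meter display
BAND_TOP, BAND_BOTTOM = 50, 70   # illustrative position of the digit band

def digit_blocks(frame: np.ndarray) -> list[np.ndarray]:
    """Discard the top and bottom of a grayscale frame and slice the
    remaining center band into equal-width blocks, one per digit."""
    band = frame[BAND_TOP:BAND_BOTTOM, :]
    block_w = FRAME_W // NUM_DIGITS
    return [band[:, i * block_w:(i + 1) * block_w] for i in range(NUM_DIGITS)]
```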
Within each block, the code checks an array of predetermined points to see if the corresponding pixel is black or not. In theory this allows detecting all ten digits, 0 through 9, though [Keilin] says there were still occasional false readings due to inherent instabilities in the camera and its mounting. But with a few iterations of the code and the aid of a Python testing program that let him validate the impact of changes to the algorithm, he was able to greatly improve the detection accuracy. He says it also helps that the nature of the data allows for some basic sanity checks: for example, the number only ever goes up, and only by a relatively small amount each time.
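In the same spirit, the point-sampling idea and the sanity check might look like this in Python; the sample coordinates, the seven-segment-style patterns, and the darkness threshold below are stand-ins rather than the values tuned for the actual meter font:

```python
import numpy as np

# Hypothetical sample points (y, x) inside a 20x20 digit block, ordered
# top, upper-left, upper-right, middle, lower-left, lower-right, bottom
SAMPLE_POINTS = [(2, 10), (6, 4), (6, 15), (10, 10), (14, 4), (14, 15), (18, 10)]

# Expected dark/light pattern at those points for each digit
PATTERNS = {
    0: (1, 1, 1, 0, 1, 1, 1), 1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1), 3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0), 5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1), 7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1), 9: (1, 1, 1, 1, 0, 1, 1),
}
DARK_THRESHOLD = 80  # grayscale level below which a pixel counts as black

def read_digit(block: np.ndarray):
    """Sample fixed points in the block and match against known patterns."""
    observed = tuple(int(block[y, x] < DARK_THRESHOLD) for y, x in SAMPLE_POINTS)
    for digit, pattern in PATTERNS.items():
        if observed == pattern:
            return digit
    return None  # no clean match; caller can retry with the next frame

def plausible(new_reading: int, last_reading: int, max_step: int = 50) -> bool:
    """Basic sanity check: the meter only counts up, and only slowly."""
    return last_reading <= new_reading <= last_reading + max_step
```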
Every once in a while, a project comes along with that magical power to consume your time and attention for many months. When you finally complete it, you almost feel sorry that there’s nothing left to do.
What is so special about this Bingo ball reader? It may seem like an ordinary OCR project at first glance; a camera captures the image and OCR software recognizes the number. Simple as that. And it works without problems, like every simple gadget should.
But then again, maybe it’s not that simple. The numbers are scattered all over the ball, so they have to be located first and the best candidate for reading selected. The numbers are also painted on a sphere rather than a flat surface, sometimes deforming them to the point where their shape has to be recovered before they can be read. The reading angle isn’t fixed either, but can fall anywhere in a full 360° rotation. And then there’s the glare problem to boot, as Bingo balls are so shiny that every light source shows up as a saturated bright spot.
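To make the glare issue concrete, one common mitigation (not necessarily what’s used in this build) is simply to mask out near-saturated pixels so they can’t vote in any later matching step; a minimal sketch:

```python
import cv2
import numpy as np

def glare_mask(gray: np.ndarray, limit: int = 245) -> np.ndarray:
    """Return a mask that is 0 where the image is blown out by a
    specular reflection and 255 everywhere else."""
    _, saturated = cv2.threshold(gray, limit, 255, cv2.THRESH_BINARY)
    return cv2.bitwise_not(saturated)
```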
So, is that all of it? Well, almost. The task is supposed to be performed by an embedded microcontroller with limited speed and memory, yet the recognition process for one ball has to be fast: 500 ms at worst. And that’s just one part of the process. The project includes a pipelined mechanism that accepts the ball, transports it to be scanned by the OCR, and then shot by the public broadcast camera before it gets dumped. Finally, if the reading was not reliable enough, the ball has to be subtly rotated so that the numbers are repositioned for another reading attempt.
Despite these challenges I did manage to build this system. It’s fast and reliable, and I discovered some very interesting tricks along the way. Take a look at the quick demo video below to get a feel for the speed, and what the system “sees”. Then join me after the break to dive into the details of this interesting embedded build.