OCR Reads Old Newspapers So We Don’t Have To

Plenty of people don’t bother to read the current newspaper, let alone editions that were published over 100 years ago. But there’s a wealth of important historical information buried in these dusty old publications, assuming you can find a way to reliably digitize and index it all. You might think the solution is as simple as running images of the paper through optical character recognition (OCR) software, but as [John Scancella] explains, the problem is a bit more complicated than that.

Stretching the text vertically highlights the columns.

Ultimately, the issue largely comes down to formatting. The OCR software reasonably assumes all the text is in orderly horizontal lines, because in the vast majority of cases, it would be. That’s how you’re reading these words now. But as anyone who’s seen an old-time newspaper knows, that’s not necessarily how things were laid out back then. Pages consisted of multiple narrow columns of stories separated by vertical lines; if the OCR tries to read the page straight across from left to right, the resulting text is a mishmash of several unrelated topics.

The answer is to break all those articles into their own images, but doing that manually at any sort of scale simply isn’t an option. So [John] has been working on a system that uses OpenCV to identify the columns of text and isolate them. He details the multi-step process down in his write-up, and even provides the Python code should you want to give it a spin. But the short version is that the image is converted to grayscale and the OpenCV dilate function is used to stretch the text in the Y dimension. This produces big blobs of white that can easily be picked out with findContours() and snipped into individual images.

It’s not a perfect solution, and there are still a few pitfalls. For one, the name of the paper needs to be removed from the front page before the stretching operation happens. But it’s clearly a step in the right direction, and the results certainly look very promising. Anything that makes OCR more accurate or easier to implement is a win in our book, so we’re excited to see where [John] takes this concept.

Recreating Retrocomputers Hack Chat

Join us on Wednesday, August 12 at noon Pacific for the Recreating Retrocomputers Hack Chat with Mike Gardi!

Building the first commercial computers in the late 1950s and early 1960s was certainly a complex task, but building the computer industry was even harder. Sure, engineers were already getting on board with designing in silicon and germanium instead of glass and tungsten, and all digital circuits are really just abstractions of the analog designs most of them were already familiar with. But what about all the other people who would need to get up to speed on the workings of digital computers? What good is a tool if the only people who know how to use it are the ones who built it?

To make computers make money, companies needed legions of installers, operators, programmers, marketers, and salespeople, and all of them needed training. And so early computer companies put a lot of effort into building training devices to get people up to speed. These trainers helped teach everything from basic logic circuits and Boolean relationships to simple programming concepts, and each of them contributed in their own way to developing the computer industry that we know today.

Mike Gardi has a unique hobby: among other things, he builds faithful replicas of some of the nicer examples of these lost bits of computing history. His reproduction of Claude Shannon’s Minivac 601 trainer is a great example of the art, as is the DEC H-500 Computer Lab build he’s currently working on. Along the way, he’s explored some side alleys on the road to our computerized world, like Dr. Nim and the paperclip computer. All his builds are lovingly created from 3D prints and really capture the essence of the toys and tools of the time.

Join us as we take a trip inside this niche realm of retrocomputing and find out why Mike finds it fascinating enough to devote the time it obviously takes to build such exacting replicas. We’ll talk about what projects he’s got going on right now, what he has planned for the future, and maybe even dive into some of his secrets for such great looking 3D prints.

Our Hack Chats are live community events in the Hackaday.io Hack Chat group messaging. This week we’ll be sitting down on Wednesday, August 12 at 12:00 PM Pacific time. If time zones baffle you as much as they do us, we have a handy time zone converter.

The Hack Chat group lives on Hackaday.io, and you don’t have to wait until Wednesday; join whenever you want and you can see what the community is talking about.

Jan Czochralski And The Silicon Revolution

If you were to travel back in time to the turn of the previous century and try to convince the average person that the grains of sand on just about any beach would be the basis of an industry worth hundreds of billions of dollars within 100 years, they’d probably have thought you were crazy. Aside from being coarse, rough, and irritating, sand is everywhere, and convincing anyone of its value would be a hard sell, unless your interlocutor was a real estate visionary with an appreciation of the future value of seaside property and a lot of patience.

Fast forward to our time, and we all know the value of the material that comes from common quartz sand: silicon, specifically the ultra-purified crystals of silicon that end up as the wafers we depend on to build the circuitry of life. The trip from beach to chip foundry is a long and non-obvious one which would not have been possible without the insights of an undistinguished Polish student and one-time druggist who discovered the process that made the Information Age possible: Jan Czochralski.

Ham Radio Mobile Operations Circa 1919

You used to be able to tell a die-hard ham radio operator on the road by the number and length of antennas protruding porcupine-like from their vehicle. There are still some mobile high-frequency operators with respectable car-mounted antenna farms, but they have nothing on Alfred H. Grebe. In 1919, he fitted his car with a medium wave transmitter that operated around 2 MHz. Since it needed a very large antenna, Grebe rigged a wire antenna that looked like a clothesline strung between the two bumpers. Obviously, you had to stop, set up your antenna, and then operate; you couldn’t talk and drive. But this may have been the world’s first automotive radio setup for voice communication.
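
For a sense of scale, here is a quick back-of-the-envelope calculation, assuming an operating frequency of exactly 2 MHz and using the common quarter-wave antenna as a yardstick (Grebe’s actual antenna dimensions aren’t given here):

$$\lambda = \frac{c}{f} = \frac{3 \times 10^{8}\ \text{m/s}}{2 \times 10^{6}\ \text{Hz}} = 150\ \text{m}, \qquad \frac{\lambda}{4} = 37.5\ \text{m}$$

Even a quarter-wave wire at that frequency would be many times longer than the car itself.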

The car had a separate battery for the radio and a dynamotor to generate high voltage for the tubes. Although many radio enthusiasts found ways to add receivers to their cars in the 1920s, it would be 1930 before Motorola made radios especially for cars in production quantities.

Does PHP Have A Future, Or Are Twenty Five Years Enough?

In June, 1995, Rasmus Lerdorf made an announcement on a Usenet group. You can still read it.

Announcing the Personal Home Page Tools (PHP Tools) version 1.0.

These tools are a set of small tight cgi binaries written in C.

Today, twenty-five years on, PHP is about as ubiquitous as it could possibly have become. I’d be willing to bet that for the majority of readers of this article, their first forays into web programming involved PHP.

But no matter how rich its history or how wide its userbase, that’s no justification for using PHP in a landscape that is rapidly evolving. Whilst PHP will inevitably be around for years to come in existing applications, does it have a future in new sites?

How Can Heavy Metal Fly?

Scientists found a surprising amount of lead in a glacier. They were studying atmospheric pollution by sampling ice cores taken from Alpine glaciers. The surprising part is that they found more lead in strata from the late 13th century than in those deposited at the height of the Industrial Revolution. Surely mediaeval times were supposed to be more about knights in shining armour than dark satanic mills, so what on earth was going on? Why was the lead industry in overdrive in an age when a wooden water wheel represented high technology?

The answer lies in the lead smelting methods used a thousand miles away from that glacier, and in the martyrdom of a mediaeval saint.

BINA-VIEW: A Fascinating Mechanical Interference Display

[Fran Blanche] tears down this fascinating display in a video.

These displays can support up to 64 characters of the buyer’s choosing, selected by 6 bits, and surprisingly require only 128 mW per bit to control: pretty light on power for their day and age. Aside from alphanumeric characters, the display also supported “color plates”, which we found quite fascinating. The fully decked-out model would only cost you $1,206 per unit in today’s money, or five rolls of toilet paper at the latest street price. And that’s just one digit.

If you dig through the linked documents and watch her video, you can get an idea of how this display works. There are six solenoids attached to rods at the rear of the device. A lamp shines through a lens onto the back of a plate assembly. Each plate is a strategically perforated grid. When the solenoids activate, the selected plates tilt, interfering with a stationary grid so that the light is blocked only in certain regions.
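
The numbers above hang together neatly: six control bits give 2^6 = 64 possible characters, and at 128 mW per bit the worst case is all six solenoids energized. The sketch below assumes, hypothetically, that bit n of the character code drives solenoid n and that only energized bits draw power; the actual wiring and character assignment aren’t documented here.

```python
from __future__ import annotations

BITS = 6          # from the article: characters are selected by 6 bits
MW_PER_BIT = 128  # from the article: 128 mW per bit


def solenoid_states(code: int) -> list[int]:
    """On/off state of each solenoid for a character code.

    Hypothetical mapping: bit n of the code drives solenoid n.
    """
    if not 0 <= code < 2 ** BITS:
        raise ValueError(f"code must be in 0..{2 ** BITS - 1}, got {code}")
    return [(code >> n) & 1 for n in range(BITS)]


def power_mw(code: int) -> int:
    """Estimated drive power, assuming only energized bits draw current."""
    return sum(solenoid_states(code)) * MW_PER_BIT


if __name__ == "__main__":
    print(2 ** BITS)                # distinct characters: 64
    print(power_mw(2 ** BITS - 1))  # worst case, all six on: 768 mW
```

Under these assumptions a single digit never needs more than about 0.77 W to hold a character.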

It seems clear why this never took off. Aligning these seems like a production nightmare compared to things like flip displays and Nixie tubes. Still, the characters have quite a lot of charm to them. We wouldn’t mind seeing a 3D-printable or laser-cut version of this display type. Get working!