Archiving Data On Paper Using 2D Images

It seems like only yesterday we covered a project using QR codes to archive data on paper (OK, it was last Thursday), so here’s another way to do it, this time with a dedicated codec using the full page. Optar or OPTical ARchiver is a project capable of squeezing a whopping 200 Kb of data onto a single A4 sheet of paper, with writing and reading achieved with a standard laser printer and a scanner. It’s a bit harder than you might think to get that much data on the page, given that even a 600 DPI printer can’t reliably place every dot each time. Additionally, paper is rarely uniform at the microscopic scale, so Optar utilizes a forward error-correcting coding scheme to cater for a little irregularity in both printing and scanning.

The error-correcting scheme selected was an Extended Golay code (24, 12, 8),  which, interestingly, was also used for image transmission by the NASA Voyager 1 and 2 missions. In information theory terms, this scheme has a minimum Hamming Distance of 8, giving detection of up to seven bit errors. This Golay code implementation is capable of correcting three-bit errors in each 24-bit block, with 12 bits available for payload. That’s what the numbers in those brackets mean.

Another interesting problem is paper stretch during printing. A laser printer works by feeding the paper around rollers, some of which are heated. As a printer wears or gets dirty, the friction coefficient along the rollers can vary, leading to twisting and stretching of the paper during the printing process. Water absorbed by the paper can also lead to distortion. To compensate for these effects, Optar regularly inserts calibration targets throughout the bit image, which are used to locally resynchronize the decoding process as the image is processed. This is roughly similar to how the alignment patterns work within larger QR codes. Finally, similar to the position detection targets (those square bits) in QR codes, Optar uses a two-pixel-wide border around the bit image. The border is used to align to the corners well enough to locate the rows of bits to be decoded.

In the distant past of last week, we covered a similar project that uses QR codes. This got us thinking about how QR codes work, and even if encoding capacity can be increased using more colors than just black and white?

Thanks to [Petr] for the tip!

Wolverine Gives Your Python Scripts The Ability To Self-Heal

[BioBootloader] combined Python and a hefty dose of of AI for a fascinating proof of concept: self-healing Python scripts. He shows things working in a video, embedded below the break, but we’ll also describe what happens right here.

The demo Python script is a simple calculator that works from the command line, and [BioBootloader] introduces a few bugs to it. He misspells a variable used as a return value, and deletes the subtract_numbers(a, b) function entirely. Running this script by itself simply crashes, but using Wolverine on it has a very different outcome.

In a short time, error messages are analyzed, changes proposed, those same changes applied, and the script re-run.

Wolverine is a wrapper that runs the buggy script, captures any error messages, then sends those errors to GPT-4 to ask it what it thinks went wrong with the code. In the demo, GPT-4 correctly identifies the two bugs (even though only one of them directly led to the crash) but that’s not all! Wolverine actually applies the proposed changes to the buggy script, and re-runs it. This time around there is still an error… because GPT-4’s previous changes included an out of scope return statement. No problem, because Wolverine once again consults with GPT-4, creates and formats a change, applies it, and re-runs the modified script. This time the script runs successfully and Wolverine’s work is done.

LLMs (Large Language Models) like GPT-4 are “programmed” in natural language, and these instructions are referred to as prompts. A large chunk of what Wolverine does is thanks to a carefully-written prompt, and you can read it here to gain some insight into the process. Don’t forget to watch the video demonstration just below if you want to see it all in action.

While AI coding capabilities definitely have their limitations, some of the questions it raises are becoming more urgent. Heck, consider that GPT-4 is barely even four weeks old at this writing.

Continue reading “Wolverine Gives Your Python Scripts The Ability To Self-Heal”

ADSL Robustness Verified By Running Over Wet String

A core part of the hacker mentality is the desire to test limits: trying out ideas to see if something interesting, informative, and/or entertaining comes out of it. Some employees of Andrews & Arnold (a UK network provider) applied this mentality towards connecting their ADSL test equipment to some unlikely materials. The verdict of experiment: yes, ADSL works over wet string.

ADSL itself is something of an ingenious hack, carrying data over decades-old telephone wires designed only for voice. ADSL accomplished this in part through robust error correction measures keeping the bytes flowing through lines that were not originally designed for ADSL frequencies. The flow of bytes may slow over bad lines, but they will keep moving.

How bad? In this case, a pair of strings dampened with salty water. But there are limits: the same type of string dampened with just plain water was not enough to carry ADSL.

The pictures of the test setup also spoke volumes. They ran the wet string across a space that looked much like every hacker workspace, salt water dripping on the industrial carpet. Experimenting and learning right where you are, using what you have on hand, are hallmarks of hacker resourcefulness. Fancy laboratory not required.

Thanks to [chris] and [Spencer] for the tips.

Error Detection And Correction: Reed-Solomon, Convolution And Trellis Diagrams

Transferring data without error when there is a lot of background noise is a challenge. Dr. Claude E. Shannon in 1948 provided guidance with his theory addressing the capacity of a communications channel in the presence of noise. His work quickly spread beyond communications into other fields. Even other aspects of computer use were impacted. An example is the transfer of data from a storage medium, like a hard drive or CD-ROM. These media and their sensors are not 100% reliable so errors occur. Just as Shannon’s work defines communication channel capacity it defines the transfer rate from a media surface to the read head.

Shannon told us how much could be passed through a channel but didn’t say how. Since 1948 a lot of research effort went into accurately detecting errors and correcting them. Error correction codes (ECC) add extra bits to messages, but their cost pays off in their ability to work around errors. For instance, without ECC the two Voyager spacecraft, now leaving our solar system, would be unable to phone home with the information they’ve gathered because noise would overwhelm their signals. Typically in hardware, like memory, error correction is referred to as ECC. In communications, the term forward error correction (FEC) is used.

Robust communication, or data transfer, is a combination of fancy software and tricky signal processing. I’m going to focus on the software side in this article. You may find some of these techniques useful in communicating data among your devices. While I’ll be using the term communications keep in mind this is applicable to transferring data in general.

Continue reading “Error Detection And Correction: Reed-Solomon, Convolution And Trellis Diagrams”

Messages From Hell: Human Signal Processing

Despite the title, there’s no religious content in this post. The Hell in question is the German inventor [Rudolph Hell]. Although he had an impressive career, what most people remember him for is the Hellschreiber–a device I often mention when I’m trying to illustrate engineering elegance. What’s a Hellschreiber? And why is it elegant?

The first question is easy to answer: the Hellschreiber is almost like a teletype machine. It sends printed messages over the radio, but it works differently than conventional teletype. That’s where the elegance comes into play. To understand how, though, you need a little background.

Continue reading “Messages From Hell: Human Signal Processing”

Store Digital Files For Eons In Silica-Encased DNA

If there’s one downside to digital storage, it’s the short lifespan.  Despite technology’s best efforts, digital storage beyond 50 years is extremely difficult. [Robert Grass, et al.], researchers from the Swiss Federal Institute of Technology in Zurich, decided to address the issue with DNA.  The same stuff that makes you “You” can also be used to store your entire library, and then some.

As the existence of cancer shows, DNA is not always replicated perfectly. A single mismatch, addition, or omission of a base pair can wreak havoc on an organism. [Grass, et al.] realized that for long-term storage capability, error-correction was necessary. They decided to use Reed-Solomon codes, which have been utilized in error-correction for many storage formats from CDs to QR codes to satellite communication. Starting with uncompressed digital text files of the Swiss Federal Charter from 1291 and the English translation of the Archimedes Palimpsest, they mapped every two bytes to three elements in a Galois field. Each element was then encoded to a specific codon, a triplet of nucleotides. In addition, two levels of redundancy were employed, creating outer- and inner- codes for error recovery. Since long DNA is very difficult to synthesize (and pricier), the final product was 4991 DNA segments of 158 nucleotides each (39 codons plus primers).

Continue reading “Store Digital Files For Eons In Silica-Encased DNA”