Archiving Data On Paper Using 2D Images

It seems like only yesterday we covered a project using QR codes to archive data on paper (OK, it was last Thursday), so here’s another way to do it, this time with a dedicated codec using the full page. Optar or OPTical ARchiver is a project capable of squeezing a whopping 200 Kb of data onto a single A4 sheet of paper, with writing and reading achieved with a standard laser printer and a scanner. It’s a bit harder than you might think to get that much data on the page, given that even a 600 DPI printer can’t reliably place every dot each time. Additionally, paper is rarely uniform at the microscopic scale, so Optar utilizes a forward error-correcting coding scheme to cater for a little irregularity in both printing and scanning.

The error-correcting scheme selected was an Extended Golay code (24, 12, 8), which, interestingly, was also used for image transmission by the NASA Voyager 1 and 2 missions. The numbers in the brackets describe the code: each 24-bit block carries 12 bits of payload, and the minimum Hamming distance between codewords is 8. In information theory terms, that distance allows detection of up to seven bit errors, and this Golay code implementation can correct up to three bit errors in each 24-bit block.
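To make those numbers concrete, here is a minimal Python sketch of a (24, 12, 8) extended Golay encoder and decoder. It is not Optar's own code: the generator polynomial 0xC75, the bit layout (12 data bits, then 11 check bits, then one overall parity bit), and the names encode, syndrome, and decode are all assumptions chosen for illustration.

```python
from itertools import combinations

# Generator polynomial of the cyclic (23, 12) Golay code:
# x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1. Assumed here; Optar may differ.
GOLAY_POLY = 0xC75


def encode(data: int) -> int:
    """Encode 12 data bits into a 24-bit extended Golay codeword."""
    if not 0 <= data < (1 << 12):
        raise ValueError("data must fit in 12 bits")
    rem = data << 11
    # Polynomial long division over GF(2); the remainder becomes the check bits.
    for bit in range(22, 10, -1):
        if (rem >> bit) & 1:
            rem ^= GOLAY_POLY << (bit - 11)
    word23 = (data << 11) | rem
    # One extra parity bit extends the (23, 12, 7) code to (24, 12, 8).
    parity = bin(word23).count("1") & 1
    return (word23 << 1) | parity


def syndrome(word: int) -> int:
    """Compare the received check bits with those recomputed from the data bits."""
    return word ^ encode(word >> 12)


# Every error pattern of weight 0..3 has its own syndrome, so a lookup table
# is enough to correct up to three flipped bits per 24-bit block.
SYNDROME_TO_ERROR = {}
for weight in range(4):
    for positions in combinations(range(24), weight):
        error = sum(1 << p for p in positions)
        SYNDROME_TO_ERROR[syndrome(error)] = error


def decode(word: int):
    """Return the 12 data bits, or None if the block is not correctable."""
    error = SYNDROME_TO_ERROR.get(syndrome(word))
    if error is None:
        return None  # four errors always land here; heavier damage may miscorrect
    return (word ^ error) >> 12
```

Calling decode(encode(0xABC) ^ 0b111) flips three bits of a codeword and still recovers 0xABC, while flipping a fourth bit makes decode return None. The table-based decoder trades a little memory (2,325 entries) for simplicity; a real implementation might well use a faster algebraic decoder.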

Another interesting problem is paper stretch during printing. A laser printer works by feeding the paper around rollers, some of which are heated. As a printer wears or gets dirty, the friction coefficient along the rollers can vary, leading to twisting and stretching of the paper during the printing process. Water absorbed by the paper can also lead to distortion. To compensate for these effects, Optar regularly inserts calibration targets throughout the bit image, which are used to locally resynchronize the decoding process as the image is processed. This is roughly similar to how the alignment patterns work within larger QR codes. Finally, similar to the position detection targets (those square bits) in QR codes, Optar uses a two-pixel-wide border around the bit image. The border is used to align to the corners well enough to locate the rows of bits to be decoded.
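A rough sanity check on the capacity figure, as a Python sketch. It assumes, as a commenter below reports from the Optar documentation, that each data pixel is a 3×3 block of 600 DPI printer dots. It also assumes one bit per pixel and a fully printable A4 sheet, and every name in it is made up for this example.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)        # A4 width and height in millimetres
PRINTER_DPI = 600
DOTS_PER_PIXEL = 3        # assumed: 3x3 printer dots form one data pixel
PAYLOAD_BITS = 12         # Golay (24, 12, 8): 12 payload bits ...
BLOCK_BITS = 24           # ... in every 24-bit block


def upper_bound_bytes() -> int:
    """Payload bytes per page, ignoring margins, border and calibration targets."""
    pixels_per_inch = PRINTER_DPI // DOTS_PER_PIXEL
    width_px = int(A4_MM[0] / MM_PER_INCH * pixels_per_inch)
    height_px = int(A4_MM[1] / MM_PER_INCH * pixels_per_inch)
    raw_bits = width_px * height_px            # one bit per data pixel (assumed)
    payload_bits = raw_bits * PAYLOAD_BITS // BLOCK_BITS
    return payload_bits // 8


if __name__ == "__main__":
    print(upper_bound_bytes())
```

Under these assumptions the upper bound comes out a little over 240 kilobytes per page. Whatever the margins, border and calibration targets take up is not modelled here, so the real figure has to be lower, which is at least consistent with the quoted 200.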

In the distant past of last week, we covered a similar project that uses QR codes. This got us thinking about how QR codes work, and whether encoding capacity could be increased by using more colors than just black and white.

Thanks to [Petr] for the tip!

43 thoughts on “Archiving Data On Paper Using 2D Images”

  1. Hmm. Given that it takes 8 Mb to render a page at 300 dpi, 200 kb seems awfully poor efficiency.
    TFA says they combine the 600 dpi print resolution 3×3 to make 200 dpi, so 3.5 Mb of ‘real’ data.

    Ahh, TFA then says that converts to 200 kB of data after overhead, not 200 “Kb”. (presumed just a typo for “kb”). Case matters!

      1. If the paper is of archival quality and the toner is good, it is a way to store digital data for a very long time as long as you can keep it dry and out of the light. Of course, if you were storing digital data for the next civilization to use, you’d better include a lot of primers to explain what you’ve left behind and how to read it.

        It would make an interesting art project to develop a full set of primers and an archive of information worth shooting off into the future, figuring out how best to preserve the data over a thousand years or so while keeping the total size down to two reams of paper. That’s 500 pages of primer and 500 pages of glyphs, which roughly comes to 100 MB.

  2. Maybe you can get more layers with colour inkjet. If the ink doesn’t have pigments so you can layer one on top of the other. Probably only works with cyan and magenta, yellow is too weak a colour. There are also inks with multiple shades of grey so you could encode more bits per pixel.

    1. “If the ink doesn’t have pigments so you can layer one on top of the other. ”
      Which introduces an entire dimension of new issues; such as fading and color change.

      Pigment inks are generally the more stable ink-jet printer ‘ink’. Quality color image sensors should be able to easily discern the Yellow, Magenta, and Cyan of HP Smart Tank printers for several years if printed on quality acid-free paper. Pigment inks can last > 100 years.
      https://www.printtopeer.com/dye-vs-pigment-ink/

      Of course, color laser prints would present an interesting study.
      https://www.printing.org/docs/default-source/journal-docs/schonert-fighting-fade-graphics.pdf?sfvrsn=ddf10000_0

      Even with faded colors, smart software should be able to ferret out fade effects if designed to prescan and do an auto-calibration.

    1. An average page of text at a sane font size offers somewhere between 2-5kb iirc. So 100x is a pretty good improvement… But still not something you’d actually want to use for any real-life purpose beyond proving you can or as a conversation starter

        1. Many years ago (~2008?), the company where I worked (insurance company) had very large printers attached to our mainframe for printing out huge volumes of mail. One of the demos the printers used to show how sharp they could print very small font was to print the entire Bible on a single page 8.5″x11″ piece of paper. You had to use a magnifying glass to read it, but it was pretty impressive.

          1. The entire Bible seems like a stretch, unless you meant microscope instead of magnifying glass. Otherwise I’m thinking that they just printed the New Testament, which is way shorter than the Old Testament. Your standard Bible usually includes both. However, a “pocket” Bible is usually just the New Testament as well.

  3. Instead of paper and ink, how about anodizing a sheet of aluminum foil and engraving it with a laser engraver? I’ve laser engraved anodized aluminum business card blanks and the achievable detail is quite good. Aluminum foil would certainly hold up better over time than paper.

      1. Anodizing can last a very long time unless abraded away because unlike a lot of iron rust, aluminum anodizing has a similar coefficient of thermal expansion compared to aluminum itself. Virtually ANY aluminum you see that is colored has been anodized because paint doesn’t stick to aluminum very well. ChatGPT 4 says that anodized aluminum kept in a sealed case (such as you would probably need to do with paper and ink anyway) would likely last for centuries. The surface oxide itself would certainly last that long, and the limiting case apparently is whether or not UV light degrades the embedded dye.

      2. Hey, what if you vacuum sputtered aluminum on a surface, wrote to it with a laser, and sealed it inside a polycarbonate disc? Then you could spin it to read the data using reflected laser light! I’ll bet you could store at least a few MB on a disc that’s about 5″ in diameter. If you stored music that way you could have “perfect sound forever”!

    1. I really like this idea for a doomsday archive of information stuck in a box inside of a monument for the next civilization. You’re still going to need to include a rather expansive primer to explain how to read the scrolls. If the primer shows how to read the scrolls, and the scrolls contain information on how to read the data formats, all sorts of crazy stuff could be included. There are ultra-low bandwidth digital voice formats, compressed image formats. There’s all kinds of stuff people have cooked up for CETI (communication with extra-terrestrial intelligences), it would all make for a good method to talk to descendants who view us as legends at best.

  4. Aww, I liked the AI Atlas man trying to hold the big QR block. 😛

    It was a good placeholder at least but an image of the actual sheets is definitely better.

    I do think the AI complainers need to cool it a bit.

    On topic, I hope we see more development on this. Even if it’s not practical for most data it could be an interesting storage method for some things.

  5. So,
    With two recent blogs about data archival on paper, is the World as we know it going to end soon?
    I’d like to know as I have more room on my credit card.

  6. Microfilm and microfiche store whole pages in the form of images, offering much better compression than this form. Properly stored, they can last for ~500 years. Standard 35mm film has a practical resolution of about 21-24MPix. In theory one could use a focused laser beam to expose low ISO B&W film at micro scale to maximize contrast and achieve much higher resolution.

    Another option to consider is to use a photoresist on aluminum or other metal, expose it with laser, then engrave it chemically. Photolithography can achieve much higher resolutions, vide average CPU/GPU. Consider Rosetta Stone project, where pages of text are engraved on metal disk in multiple languages, using a simple visual cue to indicate “compression” method.

      1. The only semi-reliable method would be masked ROM. (E-)EPROM chips can lose data after 40-50 years. Still, laser engraving or photolithography can work better. Considering that the oldest surviving photograph is almost 200 years old, I think this method might be the most reliable way, that would also be cost-effective.

        Other option is core rope memory, which uses physical arrangement of wires to store information. Manufacturing it is not hard, but any mistake means starting from scratch. Another low tech option would be punched tape. Paper might be good enough, but metal or plastic might be much better and more stable.

      1. And then everything burns up in a fire, after which you upgrade to inorganic inks on paper stored in an air tight container with refractory lining. You find the x-ray tomography a bit too pricey for data recovery, so the next iteration uses ceramic fiber paper.
        Do I smell a THP category here? :)

  7. Many years ago I played with Paperbak, an open source application that does something similar. I still have the backups I printed out, but I haven’t had to use them yet. Link: github.com/Rupan/paperbak

  8. I like this idea but the implementation needs some OCR friendly metadata on the bottom so that you can pipeline the entire process from stack of random pages to set of extracted files. Thankfully it is open sourced so we can tweak it.
