Automatic Microfiche Scanner Digitizes Docs

While the concept might seem quaint to us today, microfiche was once a very compelling way to store and distribute documents. By optically shrinking them down to just a few percent of their original size, hundreds of pages could be stored on a piece of high-resolution film. A box of said films could store the equivalent of several gigabytes of text and images, and reading them back only required a relatively simple projection machine.

As [Joerg Hoppe] explains in the write-up for his automatic microfiche scanner, companies such as Digital Equipment Corporation (DEC) made extensive use of this technology to distribute manuals, schematics, and even source code to their service departments in the 70s and 80s. Luckily, that means hard copies of all this valuable information still exist in excellent condition decades after DEC published it. The downside, of course, is that microfiche viewers aren’t exactly something you can pick up at the local Big Box electronics store these days. To make this information accessible to current and future generations, it needs to be digitized.

The camera panning over a full DEC microfiche sheet.

[Joerg] notes there are commercial services that would do this for you, but the prices are too high to be practical for the hobbyist. The same goes for turn-key microfiche scanners, which is why he's developed this hardware and software system specifically to digitize DEC documents. The user enters the information written on the top of the microfiche into the software, and then places the fiche onto the machine itself, which is based on a cheap 3D printer.

The device moves a Canon DSLR camera and appropriate magnifying optics in two dimensions over the film, using the Z axis to fine-tune the focus, and then commands the camera to take an image of each page. These are then passed through various filters to clean up the image, and compiled into PDFs that can be easily viewed on modern hardware. The digital documents can be further run through optical character recognition (OCR) so the text can be easily searched and manipulated. In the video after the break you can see that the whole process is rather involved, but once he settled into the workflow, [Joerg] says his scanner can digitize 100 pages in around 10 minutes.
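For a sense of how such a rig is typically driven, here's a minimal sketch of a capture loop: it sends G-code to a 3D-printer-style gantry over a serial port and triggers a tethered camera with the gphoto2 CLI. This is not [Joerg]'s actual software; the port name, page grid, and feed rate are placeholder assumptions.

```python
# Hypothetical capture loop: G-code gantry moves plus tethered DSLR triggering.
import subprocess
import time

import serial  # pyserial

printer = serial.Serial("/dev/ttyUSB0", 115200, timeout=5)  # assumed port/baud

def gcode(cmd: str) -> None:
    """Send one G-code command and wait for the firmware's 'ok' reply."""
    printer.write((cmd + "\n").encode())
    printer.readline()

gcode("G28")  # home all axes
gcode("G90")  # absolute positioning

PITCH_X, PITCH_Y = 12.0, 13.0  # mm between page centres (made-up values)
for row in range(7):           # assumed 7 x 14 grid of pages on the fiche
    for col in range(14):
        gcode(f"G0 X{col * PITCH_X:.2f} Y{row * PITCH_Y:.2f} F3000")
        gcode("M400")          # block until the move has finished
        time.sleep(0.3)        # let vibration settle before the shot
        subprocess.run(["gphoto2", "--capture-image-and-download",
                        "--filename", f"page_r{row:02d}_c{col:02d}.jpg"],
                       check=True)
```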

A machine like this is invaluable if you’ve got a trove of microfiche documents to get through, but if you’ve just got a sheet or two you’d like to take a peek at, [CuriousMarc] put together a simple rig using a digital microscope and a salvaged light box that should work in a pinch.

18 thoughts on “Automatic Microfiche Scanner Digitizes Docs”

  1. I have code listings on fiche from programs I wrote in the early 80s. Each image is a standard computer printout page, which has 66 lines of 132 characters. IIRC there are about 80-100 pages per fiche.
    So if the fiche is 10 pages wide, there are 1,320 characters on a line, and that's ignoring the gaps between pages.
    So we're talking about a resolution well above 10,000 pixels wide if you want to capture the whole fiche in one go!
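    As a quick sanity check of that figure, here's the arithmetic in Python; the pixels-per-character value is an assumption, not something from the comment:

    ```python
    # Rough pixel budget for imaging a whole fiche in a single shot.
    chars_per_line = 132   # standard fanfold printout width
    pages_across = 10      # assumed pages per row on the fiche
    px_per_char = 8        # assumed minimum for a legible glyph

    chars_across = chars_per_line * pages_across  # 1,320 characters side by side
    total_px = chars_across * px_per_char         # 10,560 px, ignoring page gaps
    print(total_px)
    ```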

  2. Hi,
    I really thought a lot about flatbed scanners before making the effort to build this gear.
    I agree, you need at least 10,000 dpi for dense microfiches.
    There are scanners that claim to have “physical 9600 dpi”, but in independent tests the true optical resolution is only a fraction of this.

    Some calculations copied from my website:
    “A program listing page on a DEC fiche is about 6 mm wide. DEC imaged 132-column fanfold printouts here, so one character is 6/132 mm = 45 µm wide. If they printed it with a dot matrix, a character had 6 printer dots (quality is usually much better). Let's say we need 12 scan dots to image the character, so we need to scan at 45 µm / 12 = about 4 µm. The Nyquist sampling theorem requires double the sampling frequency, so we need to scan with at least 2 µm resolution.
    That's 25.4 mm / 0.002 mm = 12,700 dpi.
    So a flatbed scanner with 9600 dpi TRUE optical resolution in both directions should ALMOST do it. But this gives blurry letter shapes.

    In 2020, for example, the Epson Perfection V600 was rated to have a true 9600 dpi; tests only found 6400. And the true optical resolution of a CanoScan 9000F rated at “9600 dpi” was tested to be only 1200 dpi.

    And we know: if something works only *almost* in an ideal world, it will never work in the real world. We really need extra resolution to compensate for mechanical tolerances, sharpness problems, and marketing hype.

    It's clear that a flatbed scanner with, say, 2400 dpi can be used on fiches with bigger characters. That explains the positive results in some web reports: those guys just had bigger text on their fiches.

    In contrast: taking a photographic picture of a 6 mm microfiche page with a DSLR with, say, 4000 pixels horizontally gives 6 mm / 4000 = 1.5 µm per sensor pixel, with 30 pixels per letter. This is about 17,000 dpi.”
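    The figures in that quote check out; here is the same calculation as a small script (the 4000-pixel sensor width is the commenter's example, everything else follows from it):

    ```python
    # Resolution needed to scan a DEC fiche vs. what a DSLR shot of one page delivers.
    MM_PER_INCH = 25.4

    page_width_mm = 6.0
    columns = 132
    char_width_mm = page_width_mm / columns        # ~0.045 mm (45 um) per character

    dot_pitch_mm = char_width_mm / 12              # 12 scan dots per character
    sample_pitch_mm = dot_pitch_mm / 2             # Nyquist: sample twice per dot
    required_dpi = MM_PER_INCH / sample_pitch_mm   # ~13,400 dpi (12,700 with the rounded 2 um)

    sensor_px = 4000                               # horizontal pixels covering the page
    pixel_pitch_mm = page_width_mm / sensor_px     # 0.0015 mm (1.5 um)
    dslr_dpi = MM_PER_INCH / pixel_pitch_mm        # ~16,900 dpi
    px_per_char = char_width_mm / pixel_pitch_mm   # ~30 pixels per letter

    print(f"required ~{required_dpi:.0f} dpi, DSLR delivers ~{dslr_dpi:.0f} dpi "
          f"({px_per_char:.0f} px/char)")
    ```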

    1. For this reason I acquired an Agfa XY-15 A3+ flatbed scanner. It has an optical zoom up to 15k dpi and a built-in XY gantry, so you make a bed scan and from that take several previews, one for each micropage. The good thing is the settings files are plain text, so it's scriptable, and several microfiches can be placed on the bed to make one bigger scan. The drawback is the thing is huge and not that fast at 15k dpi; a scan takes a minute per micropage. Then again, you don't have to babysit the machine, so you can start a batch in the morning and come back from work with all the scans done.

      But this setup, maybe with the camera substituted by a hi-res Pi camera, is way more versatile and smaller.

      1. “A scan takes a minute per micropage” means a fiche with 100 pages takes 100 minutes?
        I've seen these slow scanning times on a public fiche scanner here at the university too: 1 minute per document page.
        The CanoScan 9000F was also tested to need about 30 minutes (!) for a full bed scan at best resolution... what are these devices doing all that time?

    2. A modified flatbed scanner /may/ have worked if you replaced the sensor and replaced the motors and/or geared them down. This is a big bother though and doesn’t seem like much fun.

      Personally, I prefer my cheapskate approach: since microfiche are actually 1-bit images, read it like a large CD! A $5 laser pickup, a crude Y axis gantry, tiny motor drivers, an ATtiny MCU to relay the data and control the motors, and a spinning platform could get the job done! This approach relies heavily on software processing, so it’s no fun if you aren’t an avid programmer. :P
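      Purely to illustrate the kind of software processing that idea would lean on, here is a hedged sketch (invented sample format, not a tested design) that maps intensity readings from a spinning platform back into a raster image:

      ```python
      # Turn (angle, radius, intensity) samples from a rotating fiche into a raster.
      import numpy as np

      def polar_to_raster(samples, size=4096):
          """samples: iterable of (theta_radians, radius_0_to_1, intensity_0_to_1)."""
          img = np.zeros((size, size), dtype=np.uint8)
          c = size / 2.0
          for theta, r, val in samples:
              x = int(c + r * c * np.cos(theta))
              y = int(c + r * c * np.sin(theta))
              if 0 <= x < size and 0 <= y < size:
                  img[y, x] = 255 if val > 0.5 else 0  # fiche is effectively 1-bit
          return img
      ```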

      In all cases, software processing is definitely the most important element!

      All that said, if it works, it works!

  3. >”To make this information accessible to current and future generations, it needs to be digitized.”

    We need to think this through to an end-use of 100 years, 1,000 years, and 10,000 years into a dubious and uncertain future.

    Storage media? M-Disc is oft cited as 1000-year media, but some test results from the military make me doubtful.

    What if we had reliable 1000-year media? What about data format? I recently made a pile of USD from a former client recovering code and data from 8-inch floppies. And I earned every freakin' penny (ha'pence for Jenny), mostly because I had to write several hundred lines of code to parse embedded data (geez, give me a break, I wrote the code almost thirty years ago).

    And what about the programming tools used to make the media and data? The aforementioned project had been done in Fortran, and I do not remember which version. I was able to find most of the source code, but it was not compilable (is that a word?) using GCC Fortran, no matter what incantations were used. So I sat down and spent a day looking at my old code while continually muttering “WTF?”. And the recovered embedded data format did not match what I saw in the code.

    So paper and microfiche are not looking too bad at this point in humanity’s short and miserable history.

    1. A wander down the archeological lane seems to suggest that stone may meet the 1000-year media test, depending on storage conditions... Of course, storing and searching the “media” may be a bit challenging. :)

      1. Density is kind of low.

        I seem to recall reading that microfilm has been put in time capsules. You just need a magnifier, not some fancy reader.

        That reminds me, I can picture a microfilm card supplied with a magnifying glass. But was it about astronaut survival, in case of landing in the wilderness, or did I see it in a catalog aimed at the public?

    2. >”So paper and microfiche are not looking too bad at this point in humanity’s short and miserable history.”

      My employer learned that sort of thing the hard way. We have a 25-year hole in our archives from when we were archiving exclusively in digital formats, before switching back to microfiche. Some progress has been made on recovering some of it, but management has decided to never risk it again. We lost out on a very lucrative contract because we couldn't recover some of our old research data (if we had our data, we would have been able to quote the client a much shorter time-to-production than our competitors).

      Since it is far cheaper than any other solution we've evaluated, we've decided to stick with microfiche. And to ensure longevity, some of the engineers built a couple of flat-pack microfiche machines, with instructions on assembling them printed on acid-free paper. The vast bulk of our data is stored on film, encoded into QR codes (version 40, error correction level M), producing A6-sized films that store 60,000 KB of data. Accompanying those films are ones that are just straight copies of documents describing how to read QR codes, text encodings, how to distinguish executable code from text and binary formats, the file formats used in storing the data, etc.
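      For the curious, a version 40 QR symbol at error correction level M holds 2,331 bytes in byte mode, so an encoder along these lines (a sketch, not the actual pipeline described above) would chunk files to that size:

      ```python
      # Split a file into version-40/level-M QR symbols, one PNG per chunk.
      import qrcode
      from qrcode.constants import ERROR_CORRECT_M

      V40_M_CAPACITY = 2331  # bytes per version-40, level-M symbol in byte mode

      def encode_file(path: str) -> None:
          with open(path, "rb") as f:
              data = f.read()
          for i in range(0, len(data), V40_M_CAPACITY):
              qr = qrcode.QRCode(version=40, error_correction=ERROR_CORRECT_M,
                                 box_size=2, border=4)
              qr.add_data(data[i:i + V40_M_CAPACITY])
              qr.make(fit=False)  # keep the version fixed at 40
              qr.make_image().save(f"{path}.qr{i // V40_M_CAPACITY:04d}.png")
      ```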

      We have been tinkering with some methods to extend the life of microfiche beyond the typical 500 years, like replacing the organic film with inorganic materials. This is mostly so that we can produce all the components in-house rather than relying on external suppliers (or at least only relying on external sources of basic materials rather than ready-to-use film). One current thought is to use a two-layer film, one layer transparent and the other opaque, then use lasers to ablate sections of the opaque layer before covering it with another transparent film layer and applying some heat to laminate the whole thing.

  4. Interesting project. Did you try using the camera's autofocus? If it works, it would reduce some of the setup time. You would still need to get the distance close to correct, but getting the last little bit of sharpness would be easy, and you wouldn't have to do multiple test focus locations.

    1. Hi Phil,
      “Autofocus” is an interesting and important point.
      I spent many thoughts on autofocus, and I'm still not sure whether it's possible.

      1. Feasibility and price of a microphotography autofocus system.
      Traditionally, autofocus is not used in the area of microphotography.
      Focus is done by moving the camera back and forth.
      Regular bellows extenders and enlarger lenses like my “Rogonar” don't support autofocus.
      There is a special Novoflex bellows with autofocus support: it alone costs €800.
      And I don't know if there are even enlarging lenses with autofocus.

      On the other side, my whole optical setup was around €100, as you can get very good
      reproduction lenses cheap today.

      2. Speed. I considered programming my own autofocus system,
      evaluating the sharpness of the DSLR image in a feedback loop controlling
      the camera's distance to the fiche surface. However, I estimated the time to iterate to
      a good focus point at 20+ seconds... per document page on the fiche.

      3. Reliability. The fiche images are extremely difficult to focus.
      Most fiches are unsharp from production, and often there are empty page areas with no structures to focus on.
      Often scratches and dirt have more contrast than the actual fiche content and would pull the autofocus away.
      I simply can't imagine good automatic results in many cases.
      And an autofocus must be 100% trustworthy, or else you end up manually checking each of the 10,000s of images made.

      Bottom line: I still think manual focus on 4 corner frames, with interpolation in between, is the best solution.
      You have a human-brain-controlled focus algorithm, and zero autofocus time.
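      That corner-based scheme reduces to a plain bilinear interpolation of the focus height; a minimal sketch, with made-up fiche dimensions and corner measurements:

      ```python
      # Interpolate the Z (focus) height anywhere on the fiche from 4 corner focuses.
      def focus_z(x, y, width, height, z_corners):
          """z_corners = (z_top_left, z_top_right, z_bottom_left, z_bottom_right)."""
          u, v = x / width, y / height           # normalised position on the fiche
          z_tl, z_tr, z_bl, z_br = z_corners
          top = z_tl * (1 - u) + z_tr * u        # blend along the top edge
          bottom = z_bl * (1 - u) + z_br * u     # blend along the bottom edge
          return top * (1 - v) + bottom * v      # blend between the two edges

      # Example: a 148 x 105 mm fiche whose corners focused at slightly different Z
      print(focus_z(74.0, 52.5, 148.0, 105.0, (10.02, 10.10, 9.95, 10.05)))
      ```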

  5. How about displaying each page with a projector and photographing each projection using a mounted camera with auto settings and a timer? We lose quality, but we gain speed in the digitization process. Better than 100 minutes per fiche using the fancy scanner. Probably good enough for A4 docs with text/pictures/diagrams (large-sheet technical drawings are more problematic).

      1. Great! Thanks for the link. Some institutions may lack the tech knowledge to use a “DSLR on a 3D printer”, and that solution may be very useful. Tons of microfilms lie forgotten in historical archives, the originals lost or by now badly deteriorated. The original projector machines are also ruined (some of them apparently could produce xerographic copies from film, a cool thing).
