Google Books Team Open Sources Their Book Scanner

November 16, 2012

It’s no secret that Google has been scanning hundreds of thousands of books in the hope of recreating the Library of Alexandria. Publishers and authors really didn’t like that idea, so the Google Books team is doing the next best thing: they’re releasing the plans for a very clever book scanner in the hope that others will pick up the torch of creating a digital library of every book ever written.

Unlike some other book scanners we’ve seen that rely on an operator manually flipping pages, this linear book scanner turns the pages automatically. Each page is passed over two image sensors taken from a desktop scanner, then turned with the help of a vacuum cleaner and a cleverly designed sheet metal structure.

The bill of materials comes in at around $1500, but according to the official design documents this includes a very expensive scanner, something that could be replaced in true hacker style with a few salvaged flatbed scanners.

After the break you can check out a Google Tech Talk presented by [Dany Qumsiyeh] going over the design and function of his DIY book scanner. There’s also a relatively thorough design document over on a Google Code page.

Video: https://www.youtube.com/watch?v=4JuoOaL11bw

39 thoughts on “Google Books Team Open Sources Their Book Scanner”

word clock says:

November 16, 2012 at 4:22 am

This is great, a proven design that really works.

henry says:

November 16, 2012 at 4:38 am

I love the fact it’s open source, and they’ve provided complete plans in the PDF to build the thing.

Fredrik says:

November 16, 2012 at 4:40 am

It should be possible to halve the scan time by adding another vacuum slot and scanner in the reverse direction, shouldn’t it?

1. targetdrone says:
  
  November 16, 2012 at 7:32 am
  
  @Fredrik,
  He mentions that the speed was limited by their particular stepper motor, which wasn’t powerful enough to go faster. That can be changed.
  
  Near the end of the presentation he talks about scaling it up. One idea is that a single operator could keep about 10 of these machines loaded and scanning, making the human more efficient.
  
  Another idea was a drawing of a stretched model with a dozen page flipping mechanisms and scanners, and a conveyor belt carrying several carriages with books one way over the scanners, a return belt path that included unfinished books, and two operators. The forward passes would each scan two dozen pages from each book, and it would carry multiple books at the same time. The far end operator would send back the books that hadn’t completed scanning.
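The number of trips a book makes through that stretched design can be estimated with a quick sketch. It assumes, hypothetically, that each of the dozen page-flipping stations turns one leaf per pass and images both sides of it, so one forward pass covers two dozen pages; the function name and parameters are illustrative and not taken from the design document.

```python
import math


def passes_needed(page_count, stations=12, pages_per_station=2):
    """Estimate how many forward passes one book needs on the conveyor.

    Hypothetical model: every station flips one leaf per pass and scans
    two pages, so a pass covers stations * pages_per_station pages.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    pages_per_pass = stations * pages_per_station
    # Ceiling division: a partially scanned book needs one more pass.
    return math.ceil(page_count / pages_per_pass)
```

Under these assumptions `passes_needed(300)` returns 13, so a 300-page book would be sent back by the far-end operator twelve times before it is finished.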
  
nes says:

November 16, 2012 at 5:31 am

If I were a librarian, I’d be horrified if someone turned up with this contraption. I guess Google did it all by hand at the libraries they were allowed to digitize.

Nice work, though, and with a bit more development it could be very good. If I were building one to the plans, I would use 3mm Foamex board instead of stainless steel. It’s as easy to work as cardboard but totally dimensionally stable and hard-wearing.

1. John says:
  
  November 16, 2012 at 12:02 pm
  
  Why are you horrified?
  
  1. Anto says:
    
    November 16, 2012 at 5:17 pm
    
    Probably the back and forth motion over stainless steel. It does look more than a little like a cross between a bucking bronco and a table saw.
    
    It’s a clever design, but compared to the earlier diybookscanner it’s harder on the books.
    
  2. Mojo says:
    
    November 19, 2012 at 1:51 pm
    
    “Hello, may I put all your books into my electric guillotine?”
    
Zee says:

November 16, 2012 at 5:38 am

That does not look like it’s gentle on the books at all.

1. targetdrone says:
  
  November 16, 2012 at 7:21 am
  
  @Zee,
  The presenter mentions that about 40% of books scanned have some kind of problem, either a folded or a torn page, but they’ve done some work on preventing problems.
  
  But he talked to an archivist, who pointed out that the risk of damage is always present for any library book, and that the risk of losing the information forever is higher than the risk of damage from scanning it once.
  
  And compare this device to the common book scanning method of sawing off the binding and feeding the pages through a sheet scanner: this is much less damaging.
  
  1. Zee says:
    
    November 16, 2012 at 10:11 am
    
    I beg to differ. The $100k scanners are certified to be gentle on the books. He says so himself in the presentation.
    
2. barry99705 says:
  
  November 16, 2012 at 1:53 pm
  
  Even if they’re “certified” if it grabs a damaged page, it can still tear it out. Like the saying goes, “shit happens”.
  
  1. Michael Jensen says:
    
    November 19, 2012 at 5:05 pm
    
    Yeah but guess what that guarantee does? If they wreck your book, they’ll buy you a new one up to a certain cost.
    
fartface says:

November 16, 2012 at 6:26 am

My problem with this is that opening the book FLAT is damaging to some rare books. The manual page-turn, V-shaped ones work best for rare, one-of-a-kind books that you don’t want to damage.

1. ZenoArrow says:
  
  November 16, 2012 at 6:46 am
  
  @fartface
  What are you talking about? The books aren’t opened flat; they’re moved over a V-shaped scanner.
  
  1. nes says:
    
    November 16, 2012 at 7:39 am
    
    Looks like someone might have hit the comments before watching the video or reading the linked article. :-)
    
RobinJood says:

November 16, 2012 at 6:37 am

It’s amazing that if I were to do this, the feds would swoop in and beat me down like Kim.com but since it’s Google there’s no problem.

1. twdarkflame says:
  
  November 16, 2012 at 1:44 pm
  
  Because Google is only dealing with out-of-copyright works, works with no copyright claimed, or ones they have obtained licenses for.
  The controversy over Google Books was whether anyone had the right to publish orphan works.
  
  This is quite different from Megaupload, which was hosting things with very well-known copyright owners.
  
  Both had some dodginess to the decisions, but there is a clear difference here.
  
XOIIO says:

November 16, 2012 at 6:43 am

I hardly think this needs to be this large. I’m sure a much smaller and still effective version could be made for much less. I don’t have enough interest in this to theorise; it just seems overdone. But the whole vacuum method of turning a page should be able to be shrunk, perhaps with a vacuum suction cup like those in automated assembly lines.

Crap, I did theorise. Oh well.

thx says:

November 16, 2012 at 7:58 am

I was wondering which scanner has two imaging sensors in it.

1. drdog09 says:
  
  November 16, 2012 at 8:31 am
  
  Practically any scanner or MFP printer that advertises single-pass duplex scanning has two scanner bars; in other words, probably about 40% of the market. There are probably also a lot of scanners that are not duplex capable but whose controller board is, i.e. one board used across an entire product line to reduce the manufacturer’s overall design count.
  
2. rasz says:
  
  November 16, 2012 at 8:36 am
  
  The $750 one; as the author mentioned, the scanner is half the cost.
  
3. jaap says:
  
  June 23, 2013 at 10:28 am
  
  If you are on a budget, you could build it with just one sensor: scan only one side, rotate the book 180 degrees, and then scan all the opposite pages.
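Putting the two single-sensor passes back into reading order is a simple interleave. This sketch assumes, hypothetically, that the first pass yields pages 1, 3, 5, … front to back and that, because the book has been turned around, the second pass yields the remaining pages back to front; any per-image rotation the second pass needs is left out, and the function name is illustrative. Page images are represented by labels.

```python
def interleave_passes(first_pass, second_pass, second_reversed=True):
    """Merge two one-sided scan passes into reading order.

    first_pass:  odd pages (1, 3, 5, ...) in front-to-back order.
    second_pass: even pages captured after rotating the book; assumed
                 to arrive back-to-front, hence the optional reversal.
    """
    evens = list(reversed(second_pass)) if second_reversed else list(second_pass)
    # A book has either as many even pages as odd ones, or one fewer.
    if len(first_pass) - len(evens) not in (0, 1):
        raise ValueError("pass lengths do not match a single book")
    merged = []
    for i, odd in enumerate(first_pass):
        merged.append(odd)
        if i < len(evens):
            merged.append(evens[i])
    return merged


print(interleave_passes(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
```

If the second pass turns out to come off the machine in front-to-back order after all, passing `second_reversed=False` skips the reversal.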
  
c0derage says:

November 16, 2012 at 9:10 am

I like the design but agree it could be done with a smaller footprint. He does mention some methods to address this and I understand the ‘let us get it working first then optimize’ approach.

To address the margins issue, the photo arrays could be mounted parallel to the wall, using a long, thin 90-degree prism to redirect the image into the correct plane. This would get them much closer to the edge of the margin. Of course the image would be mirrored, but that’s an easy post-processing fix.

Mark Scholes says:

November 16, 2012 at 9:31 am

That’s not Google’s book scanner; that’s some guy’s beta book scanner he brought to Google to demo.

1. Jesse says:
  
  November 17, 2012 at 2:07 pm
  
  He is a Google employee. He developed it during his 20% time.
  
spuder says:

November 17, 2012 at 3:32 pm

There have been ideas passed around to convert public libraries into places to use tools (kind of like a hacker space). I’d love for my local library to have one of these I could reserve to scan my own journals. Doesn’t make sense to build one myself to scan only a few thousand pages.
If anyone in Salt Lake City does build one, I’d love to use it for an afternoon :)

Frank Cohen says:

November 18, 2012 at 5:33 pm

A wonderful build and some very elegant solutions.

Reno says:

November 27, 2012 at 2:11 am

Is there a working link to the scanner software somewhere?

Dewey says:

November 27, 2012 at 8:42 am

There is a new scanner out of Tokyo or somewhere in Asia that can flip the pages and take pictures of each page. It uses lasers to correct the distortion in the images where the motion bent the page along the book’s normal curvature. It can go through an entire large book in about 10 seconds. I wonder what the benefits of one versus the other are.

1. willrandship says:
  
  December 1, 2012 at 3:04 am
  
  Cost. Definitely Cost.
  
james says:

November 29, 2012 at 5:46 pm

A quicker and dirtier way: cut the spine off the book using a bandsaw, then feed the pages through a photocopier’s scanner sheet feeder. Once done, comb-bind it or recycle the lot.

1. Travis Littlechilds says:
  
  December 6, 2012 at 4:33 am
  
  Bandsaw? Most printing places have a guillotine cutter that can cut up to 1000 pages cleanly in one cut. Much quicker, cleaner and easier than a bandsaw.
  
kamathln says:

December 14, 2012 at 4:01 am

I can understand this for the old books. But for recent books, wouldn’t it be easier to get the soft copy directly from the publishers? Do they do that for newer books?

HasXAhmd says:

February 10, 2013 at 11:20 pm

This is a nice project. I am planning to build the structure, but I am confused about the vacuum structure. Can you guide me on how to join the pieces you mentioned in your PDF file for the vacuum section?

HasXAhmd says:

May 2, 2013 at 12:13 am

How were you able to interface the scanner? A scanner usually calibrates whenever it is powered on, so how can you make it calibrate within this structure?

1. jaap says:
  
  June 23, 2013 at 10:34 am
  
  They attached the white calibration surface to the saddle that holds (and moves) the book. When powered up, the scanner will calibrate using that.
  
  1. Hasan Ahmed says:
    
    June 24, 2013 at 4:22 am
    
    I used a similar-sized glass with white strips, but my scanner didn’t work. It gives an error, but whenever I put all the components back in the casing, it works fine. It’s a Canon LiDE 110.
    
Marcello says:

February 22, 2017 at 3:42 am

Is the software they used (for capturing the images and compensating for curved pages) available?
