Google has acquired reCAPTCHA and plans to use the system for digitizing books. Wait… what? CAPTCHA is the method of requiring a user to type in a visually obscured word to prove they are human. How can this digitize books? The answer is a bit obscure and takes some time to discover, but you’ll have fun along the way.
The Google blog links to a Google TechTalk video on Human Computation as an example of how they plan to use their new acquisition. It’s embedded below, but at 51 minutes we figure most of you won’t watch it all, so we did it for you. This fascinating discussion looks at how people are already being tricked into solving CAPTCHA challenges, and shows several tested ways of getting people to supply the kind of cognitive data that computers cannot produce, all under the guise of playing games.
Spammers have to beat the CAPTCHA system in order to get large numbers of free email accounts. Software has overcome this test before, such as the Greasemonkey script that beat MegaUpload’s security, or the hack of Time Magazine’s poll. But, for the most part, only humans can pass the test. People seeking to bypass millions of CAPTCHA challenges either pay sweatshop laborers to solve them or, more creatively, get you to solve them while you’re cruising for porn. This is the proof of concept: we can use people to interpret words computers cannot, if we use the right carrot.
The ESP game, discussed in the video, was written to tag photographs correctly. Two players are paired up, shown the same picture, and asked to type what they see. The round keeps going until the two have typed the same word, and that word becomes a tag for the image. With a lot of players and proper safeguards, these tags are incredibly accurate. Furthermore, the game has been very popular and has the potential to accomplish Herculean feats in short amounts of time (namely, tagging every image in Google’s image search in just a few months).
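To make the matching rule concrete, here is a minimal Python sketch of one round. It is our own illustration, not code from the video or from the actual game: the function names `play_round` and `accepted_tags`, the `taboo` word list, and the `min_pairs` threshold are all assumptions standing in for the "proper safeguards" mentioned above.

```python
from collections import Counter
from itertools import zip_longest


def play_round(guesses_a, guesses_b, taboo=()):
    """Return the first word both players have typed, or None if they never agree.

    guesses_a / guesses_b are the words each player types, in order.
    taboo is a hypothetical safeguard: words that are not allowed as tags.
    """
    banned = {w.lower() for w in taboo}
    seen_a, seen_b = set(), set()
    # Interleave the two players' guesses to simulate them typing in turn.
    for a, b in zip_longest(guesses_a, guesses_b):
        for word, mine, theirs in ((a, seen_a, seen_b), (b, seen_b, seen_a)):
            if word is None:
                continue  # this player has run out of guesses
            word = word.strip().lower()
            if not word or word in banned:
                continue
            mine.add(word)
            if word in theirs:
                return word  # the partner already typed this word
    return None


def accepted_tags(agreed_words, min_pairs=2):
    """Keep only words that at least min_pairs independent pairs agreed on.

    min_pairs is an assumed threshold, not a figure from the talk.
    """
    counts = Counter(w for w in agreed_words if w is not None)
    return sorted(w for w, n in counts.items() if n >= min_pairs)
```

For example, `play_round(["dog", "grass", "puppy"], ["animal", "puppy", "dog"])` ends the round on "puppy", the first word that shows up in both players' lists. Requiring several independent pairs to agree before a tag is accepted is one plausible way to keep a single careless or malicious pair from polluting the results.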
It seems that Google plans to use these methods to digitize books that are otherwise very difficult to scan with Optical Character Recognition. According to the video, 9 billion human hours were spent playing solitaire in 2003. What if a small portion of that time could be diverted to playing games that add to the digitized knowledge cache? If the right type of verification game can be developed, it will allow Google to tap society as its typing minions. It’s an interesting proposition and, frankly, we hope to see it happen.