This talk from the 2012 LayerOne conference outlines how the team build Stiltwalker, a package that could beat audio reCAPTCHA. We’re all familiar with the obscured images of words that need to be typed in order to confirm that you’re human (in fact, there’s a cat and mouse game to crack that visual version). But you may not have noticed the option to have words read to you. That secondary option is where the toils of Stiltwalker were aimed, and at the time the team achieved 99% accurracy. We’d like to remind readers that audio is important as visual-only confirmations are a bane of visually impaired users.
This is all past-tense. In fact, about an hour before the talk (embedded after the break) Google upgraded the system, making it much more complex and breaking what these guys had accomplished. But it’s still really fun to hear about their exploit. There were only 58 words used in the system. The team found out that there’s a way to exploit the entry of those word, misspelling them just enough so that they would validate as any of up to three different words. Machine learning was used to improve the accuracy when parsing the audio, but it still required tens of thousands of human verifications before it was reliably running on its own.