Defeating Reddit’s CAPTCHA


Here’s something we’re sure SEO specialists, PR reps, and other marketeers already know: how to write a script to game reddit.

Upvotes and downvotes determine which submissions make it to the front page of reddit. These submissions are voted on by users, and new accounts must log in and complete a CAPTCHA to vote. [Ian] discovered that reddit’s CAPTCHA is not really state-of-the-art, and figured out how to get a bot to solve it.

The method exploits the 8-bit nature of the distorted grid in the CAPTCHA. Because this grid isn’t pure black or pure white, it’s at a lower intensity than the letters in the CAPTCHA. By putting the CAPTCHA through a threshold filter, deleting any blocks of pixels smaller than 20 pixels, and running the result through a classifier (PDF there), a bot can guess what the letters of the CAPTCHA should be.
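
For the curious, the first two steps (threshold, then drop small blobs) can be sketched in a few lines of plain Python. This is only an illustration, not [Ian]'s actual code: the function names, the cutoff of 128, the use of 4-connectivity, and the assumption that the letters are the darkest pixels in the image are all ours. Only the 20-pixel minimum comes from the write-up, and the classifier step is left out entirely.

```python
from collections import deque

def threshold(image, cutoff=128):
    """Binarize a grayscale image (rows of 0-255 ints).

    Assumption: letters are darker than the grid, so pixels below the
    (hypothetical) cutoff become 1 and everything else becomes 0.
    """
    return [[1 if px < cutoff else 0 for px in row] for row in image]

def remove_small_blobs(binary, min_size=20):
    """Erase 4-connected blobs of 1-pixels smaller than min_size pixels."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill to collect one connected blob.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Blobs under the size limit are treated as noise and erased.
            if len(blob) < min_size:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned
```

Whatever survives both passes is what would get cut into individual letters and handed to the classifier.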

Out of the 489 CAPTCHAs [Ian] fed into his algorithm, only 28 – or 5.73% – were guessed correctly. However, because he knows which CAPTCHAs had failed segmentation, ignoring those can increase the success rate to 10%. Theoretically, by requesting new CAPTCHAs, [Ian] can get the accuracy of his CAPTCHA bot up to about 30%.
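
To put those percentages in perspective, here is a quick back-of-the-envelope calculation. The `expected_attempts` helper is our own, and it assumes every attempt is independent with the same chance of success, which the write-up does not claim.

```python
def success_rate(correct, total):
    """Fraction of CAPTCHAs solved correctly."""
    return correct / total

def expected_attempts(rate):
    """Average number of tries per success, assuming independent attempts."""
    return 1 / rate

raw = success_rate(28, 489)
print(f"raw success rate: {raw:.2%}")

# Compare the raw rate with the 10% and 30% figures quoted above.
for label, rate in (("raw", raw), ("ignoring failed segmentation", 0.10), ("theoretical", 0.30)):
    print(f"{label}: about {expected_attempts(rate):.1f} attempts per solved CAPTCHA")
```

Even the raw figure is plenty for a bot, since a wrong guess costs it nothing but another request.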

Combine this with a brilliant auto voting script that only requires someone to enter CAPTCHAs, and you’ve got the recipe for getting anything you want directly to the front page of reddit. Of course you could do the same with a few memes and pictures of cats, but you knew that already.

14 thoughts on “Defeating Reddit’s CAPTCHA”

  1. Yay, more reasons for websites to create captchas that are barely readable for humans but get cracked by computers easily. Can someone please figure out a way to make captchas die already?

    1. On the upside, captcha created a game of cat and mouse that resulted in better OCR – lots of incremental improvements due to a challenge that grew steadily.

      The same mechanism is going to fuel the follow-up technique. Suppose you base it on natural language or context awareness … 3 years later you’re going to have impressive results in those fields.

      Anyone looking at replacing captchas: take a look at the computationally hard fields in computer science. Lots of those problems translate directly to real world examples. Let both sides benefit.

    2. I think it’s more of a reason to abandon upvote/downvote systems for causing discussion to devolve into a popularity contest, or at least to abandon the pretense that such systems are somehow “democratic.”

  2. the only reasons i can think of why you would want to vote up/down the posts is if you see a post whose headline contains a sexually or racially offensive word and you could get a bunch of accounts to vote up the other articles to get them to fill the recent news list or vote down the sexually or racially offensive posting to get it to fall off the recent news list.

    slyck.com uses a list like that on the front page

    1. People like me would just create a counter-script that automatically upvotes the same list! People who censor things simply for being offensive deserve emotional torment.
