A few days ago we saw a post from [samuirai] at the Shackspace hackerspace in Stuttgart on breaking the minteye captcha system. Like most other captcha crackers, [samuirai] went after the voice accessibility option, which provides an audio captcha for blind users. Attacking the accessibility option is a wonderful piece of work, but [Jack] came up with an even more elegant way to defeat the minteye captcha.
For those unfamiliar, the minteye captcha presents a picture run through a swirl filter, with a slider underneath. Move the slider left or right until the swirl disappears and you’ve passed the “are you human” test. Instead of looking for straight lines, [Jack] came up with a solution that easily defeats the minteye captcha in 23 lines of Python: just minimize the length of all the edges found in the pic.
The idea behind the crack is simple: the more you swirl an image, the longer the edges in the image become. Edge detection is a well-studied problem, so the only thing the minteye cracking script needs to do is move the captcha’s slider from left to right, measure the total length of the edges at each position, and pick the position where that total is smallest.
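As a rough illustration of that idea (not [Jack]'s actual script, which uses cv2), here is a minimal dependency-free sketch. It assumes each slider position has already been captured as a 2-D greyscale image stored as a list of rows; the names `edge_score` and `best_frame` are hypothetical, and the Sobel gradient sum is only one possible stand-in for "total edge length".

```python
import math


def edge_score(img):
    """Sum of Sobel gradient magnitudes over the interior pixels of a
    2-D greyscale image given as a list of equal-length rows."""
    height, width = len(img), len(img[0])
    total = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal and vertical Sobel responses at (x, y).
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            total += math.hypot(gx, gy)
    return total


def best_frame(frames):
    """Index of the frame (slider position) with the lowest edge score."""
    scores = [edge_score(frame) for frame in frames]
    return scores.index(min(scores))
```

Calling `best_frame` on the list of images grabbed at each slider position would return the position to submit, under the assumption that the unswirled picture really is the one with the least edge content.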
[Jack] included the code for the image processing part of his crack, fortunately leaving out the part where he returns an answer to the minteye captcha. For that, and a very elegant way to crack a captcha, we thank him.
poor minteye
I enjoy programming surprisingly robust information system administration tools in python. Just my basic automated DBA tool is 300 lines, but runs over 10,000 lines of python code and another several hundred SQL lines at minimum per iteration.
…23 lines. That is disingenuous. Here is why:
import cv2
import sys
import numpy as np
import os
import matplotlib.pyplot as plt
Each one of those lines adds hundreds to over ten thousand lines apiece. Not all of those methods are used, but this was not as simple as writing 23 lines of code. To get this hack to work required knowledge of those modules and the skills to know how to leverage them. There is no sense in discounting your own knowledge and work just because you were clever enough to find an elegant solution. Value yourself or no one else will.
Oh and as an aside, check that link and look at the very first comment. That seems to be disgustingly common. Someone off in India or China (just examples, you get more bad apples with 2 billion people) just expects you to do more work for them despite giving the original out freely. It seems this happens far more often if the code in question could be of benefit to nefarious characters.
I get your point (which others have made as well). But really, Sobel is very simple:
http://www.jwandrews.co.uk/2013/01/breaking-the-minteye-image-captcha-in-34-lines-of-python/
still dishonest
Writing in an HC12 environment, I’d never claim that I wrote the ADC functions that I call on, for example. Do you charge for inherent functions and called libraries that you didn’t write when you’re billing a client for software written?
So I guess minteye can just stop now?
Neat hack!
A slight clarification to the summary; it appears his solution doesn’t measure the “length” of the edges explicitly, but the total sharpness (or “edginess”?) of the image. The swirl filter stretches the edges, but also blurs the image as a side effect, so the minimum-edginess image is usually one click either side of the ‘correct’ image (which is close enough for the CAPTCHA).
I got bamboozled by a similar effect when writing an image derotation algorithm for homebrew optical pick & place… for test purposes, I had an upright image of a component, ran it through a series of random rotations and fed the result to the algorithm. I couldn’t believe how well it worked picking out the less-rotated ones, and it was so simple! Er… turns out it was basically just measuring the amount of blur added by the rotation filter, which correlated with the amount of rotation.
Minteye could thwart this hack for a little while by adding a random amount of blur/sharpening to each result image, but summing actual contour lengths in cv wouldn’t be much harder, and the arms race would continue…
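To make the distinction in this comment concrete, here is a small sketch contrasting summed gradient strength ("edginess") with a count of edge pixels as a crude proxy for edge length. It is an assumption-laden toy, not the cv contour approach the comment mentions: the function names and the threshold of 50 are hypothetical, and images are plain lists of rows.

```python
import math


def gradient_magnitudes(img):
    """Central-difference gradient magnitude for each interior pixel."""
    mags = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mags.append(math.hypot(gx, gy))
    return mags


def edginess(img):
    """Total gradient strength: drops when the image loses contrast or is blurred."""
    return sum(gradient_magnitudes(img))


def edge_length(img, threshold=50.0):
    """Number of pixels whose gradient exceeds the threshold: a rough
    stand-in for how much edge there is, regardless of its strength."""
    return sum(1 for m in gradient_magnitudes(img) if m > threshold)
```

For the same step edge drawn at full and at half contrast, `edginess` halves while `edge_length` stays the same, which is why a score based on summed strength can end up tracking blur rather than geometry.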
Hi Tim,
It is actually measuring the total length of edges. If it were sharpness, as you suggest, the graphs would all be inverted. The little spikes at the bottom of the graphs *are* a sudden increase in sharpness though, as explained.
Nice, small (code) is beautiful.
That’s sad. Do you know just how many captcha fraud spammers your code will put out of work? :-)
I don’t think people realize that Minteye is less about blocking spammers and more about forcing people to pay more attention to specific ads. As long as most people are still manually getting past the ad captcha, then they’re still in business.
Maybe it’s a good idea to use hue instead of swirl.
I find it interesting that to Register for this site, you need to type in an undecipherable CAPTCHA (at least I can’t read it). Isn’t the point of CAPTCHAs to let real humans in and keep bots/spiders out? http://forums.hackaday.com/ucp.php?mode=confirm&confirm_id=f370bab4b631c5b2acfaa1d9a0b55594&type=1
Another crap captcha to crack: swipeads.co
Cool. Personally I switch between Java and Python for image processing depending on the need. I find that cv2 and Pillow work most of the time, but nothing can beat some good old-fashioned Laplacian edge detection, filling, etc. done on my own.
Also of use:
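import math
import numpy as np

# Two methods from a larger captcha-solving class (the class itself was not posted).
# The comment form ate everything between "<" and ">", so the comparisons marked
# "reconstructed" below are a best guess from what survived.

def declutter(self, inarr):
    """Blank out dark horizontal runs whose width is far from the average run width."""
    print("Decluttering Captcha")
    print("Converting to Greyscale First")
    arr = self.greyscale(inarr)
    print("Decluttering Started...")

    # Pass 1: collect the width of every dark run ("object") in every row.
    print("Getting Averages")
    widths = []
    for row in arr:
        fo = False  # True while we are inside a run
        ws = 0      # column where the current run started
        for j, c in enumerate(row):
            if c < 128 and not fo:    # reconstructed: a dark pixel starts a run
                fo = True
                ws = j
            elif c > 128 and fo:      # a light pixel ends the run
                fo = False
                widths.append(j - ws)
    totobjs = len(widths)
    print("Tot: " + str(totobjs))
    if totobjs < 2:
        return arr  # too few runs to compute a standard deviation
    avg = sum(widths) / totobjs
    print("Avg: " + str(avg))

    # Sample standard deviation of the run widths.
    print("Finding SD")
    sxx = sum((w - avg) ** 2 for w in widths)
    sd = math.sqrt(sxx / (totobjs - 1))
    o = sd * 2.5  # allowed deviation from the average width

    # Pass 2: paint over runs whose width falls outside avg +/- o.
    print("Decluttering")
    for row in arr:
        fo = False
        ws = 0
        for j, c in enumerate(row):
            if c < 128 and not fo:
                fo = True
                ws = j
            elif c > 128 and fo:      # reconstructed
                fo = False
                if j - ws > (avg + o) or j - ws < (avg - o):
                    row[ws:j] = 255
    return arr

def greyscale(self, inarr):
    """Threshold to pure black and white (expects a 2-D greyscale array)."""
    arr = np.array(inarr)   # copy, so the caller's array is left untouched
    arr[arr < 128] = 0      # reconstructed
    arr[arr >= 128] = 255
    return arr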