MegaUpload captcha cracking in JavaScript


This was certainly the last thing we expected to see today. [ShaunF] has created a Greasemonkey script to bypass the captcha on file-hosting site Megaupload. It uses a neural network written in JavaScript to do all of the OCR work. It will auto-submit and start downloading too. It’s quite a clever hack and is certainly helped by the simple three-character captcha the site employs. Attempting to do the same thing with ReCAPTCHA has proven much more difficult.

UPDATE: [John Resig] explained how it works.

[via Waxy]


  1. johnny says:

    The funny thing is that ReCaptcha is actually piggybacking difficult OCR of old texts on top of a human test. So, if ReCaptcha is ever “broken”, they would be solving a significant machine learning problem that would help libraries and text archives worldwide.

  2. blimey says:

    jDownloader has this function as well along with other features

  3. blimey says:


  4. Lord Taco says:

    it is somewhat simpler to use less restrictive uploading sites. I never was as big of a fan of MegaUpload as, say, WillHostForFood.

  5. @johnny ReCaptcha shows you a word it knows and a word it doesn’t, so they’d only need to solve the known word. Zero gain for ReCaptcha.

  6. joe57005 says:

    They’ll just have to start using the Voight-Kampff test.

  7. realyst says:

    It’s an OCR neural network captcha decoder…in JavaScript…at 486 lines of code.

    I’m just now getting dialog boxes and stuff to draw out in JS and this guy’s building an effing Skynet with it in fewer lines of JS than you see in your average cheesy AJAX page.

  8. Tachikoma says:

    Megaupload’s captcha wasn’t particularly mind-blowing in terms of character obscurity in the first place. A normalised cross-correlation filter could do the job just as easily.

    I have to say, ShaunF’s little neural network code is pretty cool. However, I can see a couple of problems with the neural network approach.

    Neural networks need a training data set (e.g. Megaupload’s captcha images) in order to pre-calculate the weights required for image recognition.

    Its classification reliability will be heavily dependent on the choice of training data. Basically there is a danger of overtraining, where the neural net becomes too specialised for a particular training data set. In such cases, it would be easy to defeat the neural network by simply changing the CAPTCHA images in a significant way. Realistically speaking, it doesn’t take much effort to change a CAPTCHA font, for example.

    Also, neural networks trained with a much broader data set will have more false positives and false negatives during recognition. Very fiddly.

    Anyone hoping to break Recaptcha in a similar way will have to wait for a few more decades, I’m afraid.
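
    As a rough illustration of the normalised cross-correlation approach mentioned in this comment, the sketch below scores a glyph against a set of templates and picks the best match. It assumes the captcha has already been segmented into equally sized greyscale glyphs; the function names and the tiny 3x3 templates are hypothetical, and none of this is taken from ShaunF’s script.

    ```javascript
    // Normalised cross-correlation between two equally sized greyscale glyphs,
    // each given as a flat array of pixel intensities. Result lies in [-1, 1].
    function ncc(a, b) {
      if (a.length !== b.length || a.length === 0) {
        throw new Error("glyphs must be non-empty and the same size");
      }
      const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;
      const ma = mean(a);
      const mb = mean(b);
      let num = 0, da = 0, db = 0;
      for (let i = 0; i < a.length; i++) {
        const x = a[i] - ma;
        const y = b[i] - mb;
        num += x * y;
        da += x * x;
        db += y * y;
      }
      const denom = Math.sqrt(da * db);
      // A perfectly flat image carries no shape information; score it as 0.
      return denom === 0 ? 0 : num / denom;
    }

    // Return the template label with the highest correlation score.
    function classify(glyph, templates) {
      let best = { label: null, score: -Infinity };
      for (const [label, tpl] of Object.entries(templates)) {
        const score = ncc(glyph, tpl);
        if (score > best.score) best = { label, score };
      }
      return best;
    }

    // Hypothetical 3x3 templates (1 = ink, 0 = background), for illustration only.
    const templates = {
      I: [0, 1, 0, 0, 1, 0, 0, 1, 0],
      L: [1, 0, 0, 1, 0, 0, 1, 1, 1],
    };

    // A dimmer, slightly noisy "L".
    const noisyL = [0.8, 0, 0, 0.9, 0.1, 0, 0.8, 0.7, 0.9];
    console.log(classify(noisyL, templates).label);
    ```

    Because the score subtracts the mean and divides by the energy of both images, it is insensitive to overall brightness and contrast, but it still depends on the templates matching the captcha’s font, which is the same weakness described above for an over-specialised network.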

  9. Timothy says:

    It uses a neural network, huh? That’s some impressive stuff

  10. Chuck Norris says:


    I was trying to bypass the 40 seconds in megaupload.

    Apparently it’s a variable that holds the seconds left and changes name on each page load (it looks like x2850, x45698, x76954, …), so I made a Greasemonkey script to automatically find this variable’s name, which it does, but I can’t change its value …

    If someone thinks they can help me or just wants to check out the source :

    $bad_english = true;

  11. cde says:

    Chuck, that timer is just a client side refresh. The real timer is server side, and can’t be changed.

  12. Chuck Norris says:

    ha ok … too bad …

    but anyway, why can’t I access its value?

    alert(" end => " + end + "\r\n this[end] => " + this[end]);

    this returns

    end => x5258
    this[end] => undefined

    And also I read the source of the captcha thing … it’s mad!

    And THX

  13. cde says:

    Using javascript? Try this.end

  14. Chuck Norris says:

    returns the same.

    but that might be because Greasemonkey executes somewhere other than where the var I’m trying to change lives.

    Anyway thx ! And too bad for the server countdown !
    Does anyone know if this is bypassable? Like an algorithm from page ID to file ID, or something like on YouTube & co.
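
    The undefined result is consistent with that guess: Greasemonkey runs user scripts in a sandbox whose this is not the page’s window, and page globals are normally reached through unsafeWindow instead. A minimal sketch, assuming the counter is a numeric global whose name follows the x-plus-digits pattern quoted earlier in the thread (findCounterName and the stub object are hypothetical):

    ```javascript
    // Find a global whose name looks like the randomised counter (e.g. "x5258")
    // and whose value is a number, on the given window-like object.
    function findCounterName(pageWindow) {
      return Object.keys(pageWindow).find(
        (name) => /^x\d+$/.test(name) && typeof pageWindow[name] === "number"
      );
    }

    // Inside Greasemonkey the page's globals live on unsafeWindow, not on the
    // sandbox's own `this`. The stub lets the sketch run outside a user script.
    const pageWindow =
      typeof unsafeWindow !== "undefined"
        ? unsafeWindow
        : { x5258: 40, other: "n/a" };

    const end = findCounterName(pageWindow);
    if (end !== undefined) {
      console.log("before:", pageWindow[end]);
      pageWindow[end] = 0; // changes only the countdown shown in the page
      console.log("after:", pageWindow[end]);
    }
    ```

    Even if the write succeeds, it only affects the client-side display; as cde notes above, the real timer is kept on the server.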

  15. Skyler Orlando says:

    @chuck: It sure is. Just sign up for a premium account. ;)

  16. falcolas says:

    @eliot True, but words which were unknowns to OCR are then later used as known samples once enough users identify the same unknown word. Hence, their method is still fairly secure.
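
    The promotion step described here can be sketched as a simple vote counter. This is only an illustration: the threshold, the normalisation of answers, and the names makeConsensus and submit are assumptions, not details of how ReCaptcha actually works.

    ```javascript
    // Promote an unknown word to "known" once enough users agree on one reading.
    // The default threshold of 3 is an illustrative assumption.
    function makeConsensus(threshold = 3) {
      const votes = new Map(); // wordId -> Map(reading -> count)
      const known = new Map(); // wordId -> agreed reading
      return {
        // Record one user's reading; returns the agreed reading or null.
        submit(wordId, reading) {
          if (known.has(wordId)) return known.get(wordId);
          const key = reading.trim().toLowerCase();
          const tally = votes.get(wordId) || new Map();
          tally.set(key, (tally.get(key) || 0) + 1);
          votes.set(wordId, tally);
          if (tally.get(key) >= threshold) {
            known.set(wordId, key); // now usable as a control word
            votes.delete(wordId);
            return key;
          }
          return null; // still unknown
        },
        isKnown: (wordId) => known.has(wordId),
      };
    }

    const consensus = makeConsensus(2);
    consensus.submit("scan-17", "Morning");
    consensus.submit("scan-17", "moming"); // a misreading does not count
    console.log(consensus.submit("scan-17", "morning "));
    ```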

  17. hack says:

    small easy exe hack to update your info, to seem as if you are a premium account user. New, beta, version 1.5

  18. amjadk says:

    you know they changed how the captchas are now since this came out :P

  19. Kate says:

    I am not much of a Megaupload fan, and it seems to me that at RapidShare in general, and at one of its search engines in particular, there are no such bugs.

