Shmoocon 2006: Covert Crawling: A Wolf Among Lambs

Billy Hoffman has built a site crawler that can hide its activity within normal web traffic. Crawling a website is one of the easiest ways to find exploitable pages, but the systematic nature of a crawl makes it stand out in server logs. Billy set out to design a crawler that behaves like a normal web browser. It follows the more popular links first (think “news”, not “legal notice”), and it doesn’t hit deep-linked pages directly without first creating an appropriate Google referrer. Many other tricks go into making the crawler look “human”; you’ll find them in Billy’s slides over at SPI Labs. You can also read about the talk on Wired News.
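
To make the two behaviours described above concrete, here is a minimal Python sketch, not Billy’s actual code. The keyword weights in `POPULARITY_HINTS`, the function names, and the shape of the search URL are all assumptions made for illustration; the post does not say how the real crawler scores links or builds its referrers.

```python
import heapq
from urllib.parse import quote_plus

# Hypothetical keyword weights (an assumption, not taken from the talk):
# links a typical visitor clicks score high, fine-print links score low.
POPULARITY_HINTS = {"news": 3, "products": 2, "about": 1, "legal": -2, "privacy": -2}


def popularity_score(link_text):
    """Sum the weights of every hint keyword found in the link text."""
    text = link_text.lower()
    return sum(weight for key, weight in POPULARITY_HINTS.items() if key in text)


def order_links(links):
    """Return URLs from (link_text, url) pairs, most popular first.

    Ties keep their original page order via the index in the heap entry.
    """
    heap = [(-popularity_score(text), i, url) for i, (text, url) in enumerate(links)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


def referer_for(parent_url, search_terms):
    """Pick a Referer value for the next request.

    If the page was reached by following a link, use the linking page.
    Otherwise (a deep link hit directly) build a search-engine style
    referrer; the exact URL format here is an assumption.
    """
    if parent_url is not None:
        return parent_url
    return "https://www.google.com/search?q=" + quote_plus(search_terms)


if __name__ == "__main__":
    page_links = [("Legal Notice", "/legal"), ("News", "/news"), ("About us", "/about")]
    print(order_links(page_links))
    print(referer_for(None, "example widgets"))
```

A priority queue keeps the ordering cheap as newly discovered links are added, which is why the sketch uses `heapq` rather than re-sorting a list on every page.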

5 thoughts on “Shmoocon 2006: Covert Crawling: A Wolf Among Lambs”

  1. Looks to me like the “most commented on (past 60 days)” list isn’t working properly. As of today, it’s been 4 months since the PSP 2.0 to 1.5 downgrade was posted, and no one has commented on it since Oct 16th, 2005.

    Also, very interesting article!

  2. Is there a legitimate use for this?

    I am not usually one of those people that criticises stuff like this, but do we really want to make email address harvesting and exploit finding easier?

  3. A lot of people have legitimate needs to crawl a site. Think about a site that carries the text of a book but has a strict “no spiders” policy (so they can shut you off when you stop paying, for example.) If you’re a legitimate user but need an offline copy of the book (for field work or whatever), you’re out of luck. Their server will spot a spider instantly, and shut you down.

    But if you have a smart spider that skips around, reads chapters here and pages there, then they’re not likely to notice you or ban you. And you can still get the text of the book you need. Just make sure you delete your local copies of the book once your subscription has ended.
