Billy Hoffman has built a site crawler that can hide its activity within normal web traffic. Crawling a website is one of the easiest ways to find exploitable pages, but the systematic nature of the crawl makes it stand out in logs. Billy set out to design a crawler that would behave like a normal web browser. It follows more popular links first (think “news”, not “legal notice”) and it doesn’t hit deep-linked pages directly without first creating an appropriate Google referrer. There are tons of other tricks involved in making the crawler look “human”, which you’ll find in Billy’s slides over at SPI Labs. You can also read about the talk on Wired News.
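To give a feel for what that kind of traffic shaping involves, here is a rough Python sketch of a crawler that tries to look like a person: it weights content-like links over boilerplate ones, fakes a Google search referrer before visiting deep pages, and waits irregular amounts of time between requests. This is only an illustration of the ideas described above, not Billy's actual code; the `popularity()` heuristic, the headers, and every URL in it are assumptions made up for the example.

```python
# Illustrative sketch only; not Billy Hoffman's implementation.
import random
import time
from html.parser import HTMLParser
from urllib.parse import urljoin, quote_plus

import requests

HEADERS = {
    # Present a realistic browser identity instead of a bot User-Agent.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml",
}

class LinkParser(HTMLParser):
    """Collect (anchor text, href) pairs from a page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._href:
            self.links.append((data.strip(), self._href))
            self._href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

def popularity(text):
    """Crude stand-in for 'follow popular links first': prefer content-like
    anchors over boilerplate like 'legal notice' or 'privacy policy'."""
    boring = ("legal", "privacy", "terms", "sitemap")
    return 1 if any(word in text.lower() for word in boring) else 10

def fetch(url, referer=None):
    headers = dict(HEADERS)
    if referer:
        headers["Referer"] = referer
    return requests.get(url, headers=headers, timeout=10)

def crawl(start_url, max_pages=10):
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        # Deep links get a plausible Google search as their referrer,
        # as if the visitor arrived from a search result.
        referer = None
        if url != start_url:
            referer = "https://www.google.com/search?q=" + quote_plus(url)
        resp = fetch(url, referer=referer)
        seen.add(url)
        parser = LinkParser()
        parser.feed(resp.text)
        candidates = [(text, urljoin(url, href)) for text, href in parser.links if href]
        if candidates:
            texts, urls = zip(*candidates)
            weights = [popularity(t) for t in texts]
            # Weighted random choice: popular-looking links tend to go first.
            queue.extend(random.choices(urls, weights=weights, k=min(3, len(urls))))
        # Pause a human-ish, variable amount of time between requests.
        time.sleep(random.uniform(2, 15))
    return seen
```

A real tool would obviously need robots.txt handling, error handling, and a far better popularity model; the point is only that nothing in the request stream screams "spider".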
Looks to me like the “most commented on (past 60 days)” list isn’t working properly. As of today, it’s been 4 months since the PSP 2.0 to 1.5 downgrade was posted, and no one has commented on it since Oct 16th, 2005.
Also, very interesting article!
No, it isn’t broken, because I see at least two comments show up in my email every day. What is broken is the post not showing more than 250 comments. Sounds like a good enough excuse to me to lock the thread.
Is there a legitimate use for this?
I am not usually one of those people who criticise stuff like this, but do we really want to make email address harvesting and exploit finding easier?
cuba: No point worrying about that now, is there?
A lot of people have legitimate needs to crawl a site. Think about a site that carries the text of a book but has a strict “no spiders” policy (so they can shut you off when you stop paying, for example). If you’re a legitimate user but need an offline copy of the book (for field work or whatever), you’re out of luck. Their server will spot a spider instantly and shut you down.
But if you have a smart spider that skips around, reads chapters here and pages there, then they’re not likely to notice you or ban you. And you can still get the text of the book you need. Just make sure you delete your local copies of the book once your subscription has ended.
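A minimal sketch of that kind of “reading pattern” crawl, assuming an invented example.com chapter URL scheme, might look something like this:

```python
# Purely illustrative: fetch chapter pages in a shuffled order with long,
# irregular pauses instead of marching through them sequentially.
# The site and URL pattern are made up for the example.
import random
import time

import requests

chapter_urls = [f"https://example.com/book/chapter-{n}" for n in range(1, 21)]
random.shuffle(chapter_urls)  # no tell-tale sequential access pattern

for url in chapter_urls:
    page = requests.get(url, timeout=10)
    filename = url.rsplit("/", 1)[-1] + ".html"
    with open(filename, "w", encoding="utf-8") as f:
        f.write(page.text)
    time.sleep(random.uniform(60, 600))  # minutes between pages, like a reader
```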