How Hacker News Page Rankings Really Work

Page rankings are the secret sauce of websites that automatically aggregate user submissions. The basic formula used by Hacker News was published a few years back, but several pieces of the puzzle are missing from that specification. [Ken Shirriff] recently published an analysis that digs deeper to expose the article penalization system used by Hacker News’ ranking engine.
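For reference, the published formula is usually summarized as the vote count (minus the submitter's own vote) raised to the power 0.8, divided by the story's age in hours plus two, raised to the power 1.8. Here is a minimal Python sketch of that summary. The function and parameter names are our own, the constants are the commonly cited ones rather than anything confirmed by the site, and clamping the vote count at zero is a simplification.

```python
def base_score(points, age_hours, gravity=1.8, vote_exponent=0.8):
    """Commonly cited base ranking score, before any penalties.

    points     -- displayed points, including the submitter's own vote
    age_hours  -- hours since submission
    gravity, vote_exponent -- commonly cited constants (assumptions here)
    """
    votes = max(points - 1, 0)  # discount the submitter's own vote
    return votes ** vote_exponent / (age_hours + 2) ** gravity


if __name__ == "__main__":
    # The same story scores lower as it ages.
    for hours in (1, 4, 12):
        print(hours, round(base_score(100, hours), 3))
```

The large exponent on age is why even a heavily upvoted story eventually slides down the page: the denominator keeps growing while votes taper off.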

One might assume that user votes alone determine a page’s lifespan on the front page. But it turns out that a complex penalization system makes a huge difference. It takes into account keywords and domain names, and it also weighs controversy. It’s a bit amusing to note that this article on the topic was itself penalized, knocking it off of the front page.

You can get the full details of the system from his post, but we found his investigation methods to be equally interesting. He scraped two pages of the news feed every minute using Python and the Beautiful Soup package (a pretty common scraping practice). This data set allowed him to compare the known algorithm with actual results. What was left was a set of anomalies that made enough sense for him to reverse engineer the unpublished formulas being used.
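The comparison step can be illustrated with a short Python sketch. It is not [Ken]'s actual code: the Story record, the sample numbers, and the flagging rule are all hypothetical, and the score uses the commonly cited constants (0.8 and 1.8). The idea is to walk one scraped snapshot in observed rank order and flag any story whose computed score is higher than that of a story ranked above it, since the published formula alone cannot explain that placement.

```python
from typing import NamedTuple


class Story(NamedTuple):
    # Hypothetical record for one scraped front-page entry.
    title: str
    rank: int         # observed position, 1 = top of the page
    points: int
    age_hours: float


def expected_score(story, gravity=1.8, vote_exponent=0.8):
    """Score predicted by the commonly cited formula, with no penalties."""
    votes = max(story.points - 1, 0)
    return votes ** vote_exponent / (story.age_hours + 2) ** gravity


def flag_anomalies(snapshot):
    """Return titles of stories ranked lower than their score predicts."""
    flagged = []
    lowest_above = float("inf")  # smallest score among higher-ranked stories
    for story in sorted(snapshot, key=lambda s: s.rank):
        score = expected_score(story)
        if score > lowest_above:
            flagged.append(story.title)
        lowest_above = min(lowest_above, score)
    return flagged


if __name__ == "__main__":
    # Made-up sample data, not real Hacker News measurements.
    sample = [
        Story("A", 1, 120, 3.0),
        Story("B", 2, 60, 2.0),
        Story("C", 3, 30, 1.5),
        Story("D", 4, 200, 2.5),
    ]
    print(flag_anomalies(sample))
```

A flag from a single snapshot is only a hint. This rule cannot tell a penalized story from a boosted neighbor, which is why repeated samples over time matter.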

Comments

  1. rnj says:

    did the same, but with 9gag. the curves over there look quite different, and that’s why i think they are not user generated: http://i.imgur.com/FF7i3.png you see nearly no impact of the frontpage, and the position on the frontpage is also quite linear. whole writeup: http://www.reddit.com/r/9gag/comments/zmeqy (they even change the user(?) submitted images and remove watermarks, or search for versions with bigger resolution)

    my setup was a python script, too, but i used http://scrapy.org/ which is a very potent tool, that makes it quite easy to follow links and scrape a complete page. in my case i have to work around the feature, because i wanted to poll for certain pages only.

  2. polytechnick says:

    What’s a Hacker News and why would anyone be so concerned with ranking there as to create a scientific study like this?

    • vonskippy says:

      It’s a misnomer – it has nothing to do about “hacking”. It’s a circle jerk club of wanna be investors and their ass kissing startup followers drooling over the latest blog post about GSD (get shit done) or how to refine your “elevator pitch”. Just like the Digg of old days (back before Rose ran it into the ground) the site is gamed to no end by a few “elites”.
