Page rankings are the secret sauce of websites that automatically aggregate user submissions. The basic formula used by Hacker News was published a few years back. But there are several pieces of the puzzle that are missing from that specification. [Ken Shirriff] recently published an analysis that digs deeper to expose the article penalization system used by Hacker News’ ranking engine.
One might assume that users' up and down votes are what determine a page’s lifespan on the front page. But it turns out that a complex penalization system makes a huge difference. It takes into account keywords and domain names, and it also weighs controversy. It’s a bit amusing to note that this article on the topic was itself penalized, knocking it off of the front page.
You can get the full details of the system from his post, but we found his investigation methods to be equally interesting. He scraped two pages of the news feed every minute using Python and the Beautiful Soup package (a pretty common scraping practice). This data set allowed him to compare the known algorithm with actual results. What was left was a set of anomalies that made enough sense for him to reverse engineer the unpublished formulas being used.
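To make the comparison step concrete, here is a minimal sketch of how scraped data might be checked against the published formula. It is not [Ken Shirriff]'s code. The formula below is the commonly cited form of the published ranking, and its constants (an exponent of 0.8 on the votes and a gravity of 1.8 on the age) are assumptions, as are the `Story` fields and the `find_anomalies` helper. The idea is simply that any story sitting below another story with a lower computed score is a candidate for having been penalized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Story:
    title: str
    points: int       # vote count shown on the scraped page
    age_hours: float  # hours since submission
    position: int     # observed rank on the page (1 = top)


def base_score(points: int, age_hours: float,
               exponent: float = 0.8, gravity: float = 1.8) -> float:
    """Unpenalized score; the constants are assumed, not taken from the post."""
    return max(points - 1, 0) ** exponent / (age_hours + 2) ** gravity


def find_anomalies(stories: List[Story]) -> List[Story]:
    """Return stories ranked below at least one story with a lower base score."""
    anomalies = []
    for story in stories:
        score = base_score(story.points, story.age_hours)
        for other in stories:
            ranked_above = other.position < story.position
            if ranked_above and base_score(other.points, other.age_hours) < score:
                anomalies.append(story)
                break
    return anomalies


# Made-up sample data: "C" has the most votes for its age but sits in third place.
sample = [
    Story("A", points=100, age_hours=2.0, position=1),
    Story("B", points=50, age_hours=3.0, position=2),
    Story("C", points=300, age_hours=2.0, position=3),
]

for story in find_anomalies(sample):
    print(story.title, "ranks lower than the base formula predicts")
```

With one snapshot like this taken every minute, the ratio between the observed and predicted scores of a flagged story gives a rough estimate of the size of its penalty.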
6 thoughts on “How Hacker News Page Rankings Really Work”
did the same, but with 9gag. the curves over there look quite different, and that’s why i think they are not user generated: http://i.imgur.com/FF7i3.png you see nearly no impact of the frontpage, and the position on the frontpage is also quite linear. whole writeup: http://www.reddit.com/r/9gag/comments/zmeqy (they even change the user(?) submitted images and remove watermarks, or search for versions with bigger resolution)
my setup was a python script, too, but i used http://scrapy.org/ which is a very potent tool, that makes it quite easy to follow links and scrape a complete page. in my case i have to work around the feature, because i wanted to poll for certain pages only.
What’s a Hacker News and why would anyone be so concerned with ranking there as to create a scientific study like this?
It’s a misnomer – it has nothing to do with “hacking”. It’s a circle jerk club of wannabe investors and their ass kissing startup followers drooling over the latest blog post about GSD (get shit done) or how to refine your “elevator pitch”. Just like the Digg of old days (back before Rose ran it into the ground) the site is gamed to no end by a few “elites”.
Totally agree about the circle jerk. Also the comments section, like you insinuated with the comparison to Digg, turned into the equivalent of YouTube comments. The comments section used to contain intelligent conversations that actually added to the articles.
Vonskippy, thanks for your input. I have to admit, your emotionally charged response left me even more confused than I was before. I guess I have to see for myself. But the most basic question is: is this https://news.ycombinator.com/ what we are talking about? The way the URL was left unspoken has me feeling like I was sleeping under a rock for 10 yrs – everyone talks about it like they know what / where that is!
Yes, that is the site.