How Hacker News Page Rankings Really Work


Page rankings are the secret sauce of websites that automatically aggregate user submissions. The basic formula used by Hacker News was published a few years back. But there are several pieces of the puzzle that are missing from that specification. [Ken Shirriff] recently published an analysis that digs deeper to expose the article penalization system used by Hacker News’ ranking engine.

One might assume that users’ up and down votes alone determine a page’s lifespan on the front page. But it turns out that a complex penalization system makes a huge difference. It takes into account keywords and domain names, and also weighs controversy. It’s a bit amusing to note that this article on the topic was itself penalized, knocking it off of the front page.

You can get the full details of the system from his post, but we found his investigation methods to be equally interesting. He scraped two pages of the news feed every minute using Python and the Beautiful Soup package (a pretty common scraping practice). This data set allowed him to compare the known algorithm with actual results. What was left was a set of anomalies with enough of a pattern for him to reverse engineer the unpublished formulas being used.
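To give a feel for that comparison step, here is a minimal sketch in Python. It assumes the commonly cited form of the published formula, score = (votes - 1)^0.8 / (age_hours + 2)^1.8, and the function names, dictionary keys, and sample stories are all made up for illustration; this is not [Ken]'s actual code. The idea is to compute the score the published formula predicts for each story in the observed page order, then flag any story that sits below one it should have beaten.

```python
def published_score(votes, age_hours, gravity=1.8, vote_exponent=0.8):
    """Score under the commonly cited published formula (no penalties).

    The exponents are assumptions taken from the public description of
    the algorithm, not values confirmed by this article.
    """
    base = max(votes - 1, 0)  # the submitter's own vote is not counted
    return base ** vote_exponent / (age_hours + 2) ** gravity


def find_anomalies(stories):
    """Flag stories ranked below a story they should outscore.

    `stories` is a list of dicts in observed page order (top first).
    Returns (title, implied_factor) pairs, where implied_factor is an
    upper bound on the multiplier that would explain the lower rank.
    """
    anomalies = []
    for above, story in zip(stories, stories[1:]):
        expected_above = published_score(above["votes"], above["age_hours"])
        expected_here = published_score(story["votes"], story["age_hours"])
        if expected_here > expected_above:
            anomalies.append((story["title"], expected_above / expected_here))
    return anomalies


# Hypothetical snapshot of a front page, in the order it was displayed.
snapshot = [
    {"title": "Story A", "votes": 50, "age_hours": 3},
    {"title": "Story B", "votes": 30, "age_hours": 4},
    {"title": "Story C", "votes": 200, "age_hours": 2},
]

for title, factor in find_anomalies(snapshot):
    print(f"{title} looks penalized (implied factor <= {factor:.2f})")
```

Run against one snapshot per minute, a check like this would leave a pile of out-of-place stories, which is the kind of leftover data the analysis then had to explain.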

[Ken Shirriff] completely reverse engineers the 1974 Sinclair Scientific calculator


Wow. Seriously… Wow! The work [Ken Shirriff] put into reverse engineering the Sinclair Scientific is just amazing. He covers so much: the market forces that led [Clive Sinclair] to design the device with an under-powered chip, how the code actually fits in a minuscule amount of space, and an in-depth look at the silicon itself. Stop what you’re doing and read it right now!

This calculator shoehorned itself into the market when the HP-35 was king at a sticker price of $395 (around $1800 in today’s money). The goal was to undercut HP, a target that was reached with a $120 launch price. They managed this by using a Texas Instruments chip that had only three storage registers, paired with a ROM totaling 320 words. The calculator worked, but it was slow and inaccurate. Want to see how inaccurate? Included in the write-up is a browser-based simulator built from the reverse engineering work. Give it a try and let us know what you think.

Now [Ken] didn’t do all this work on his own. Scroll down to the bottom of his post to see the long list of contributors that helped bring this fantastic piece together. Thanks everyone!

[Thanks Ed]