Scraping blogs for fun and profit

Sometimes when you’re working on a problem, a solution is thrown right at your face. We found ourselves in this exact situation a few days ago while putting together Hackaday’s new retro edition; a way to select a random Hackaday article was needed and [Alexander van Teijlingen] of codepanel.net just handed us the solution.

To grab every Hackaday URL ever, [Alex] wrote a small Python script using the Beautiful Soup screen scraping library. The program starts on Hackaday’s main page and grabs every link to a Hackaday post before going to the next page. It’s not a terribly complex build, but we’re gobsmacked a solution to a problem we’re working on would magically show up in our inbox.

Thanks to [Alex], writing a cron job to automatically update our new retro edition just got a whole lot easier. If you’d like to check out a list of every Hackaday post ever (or at least through two days ago), you can grab 10,693 line text file here.

Follow

Get every new post delivered to your Inbox.

Join 96,656 other followers