Scraping blogs for fun and profit

Sometimes when you’re working on a problem, a solution is dropped right in your lap. We found ourselves in exactly that situation a few days ago while putting together Hackaday’s new retro edition: we needed a way to select a random Hackaday article, and [Alexander van Teijlingen] of codepanel.net just handed us the solution.

To grab every Hackaday URL ever, [Alex] wrote a small Python script using the Beautiful Soup screen scraping library. The program starts on Hackaday’s main page, grabs every link to a Hackaday post, and then moves on to the next page. It’s not a terribly complex build, but we’re still gobsmacked that a solution to a problem we were working on would magically show up in our inbox.
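For the curious, here is a minimal sketch of the same crawl-and-collect idea. It is not [Alex]’s script: to stay dependency-free it swaps Beautiful Soup for Python’s built-in html.parser, and it assumes (our guess, not something taken from his code) that post URLs follow a /YYYY/MM/DD/slug/ pattern and that older posts live at WordPress-style /page/N/ URLs. The names POST_RE, post_links, fetch_page, and crawl are hypothetical.

```python
import re
import time
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen

# Assumption: individual posts live at https://hackaday.com/YYYY/MM/DD/slug/
POST_RE = re.compile(r"^https?://hackaday\.com/\d{4}/\d{2}/\d{2}/[^/]+/?$")


class LinkCollector(HTMLParser):
    """Collect the absolute href of every <a> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop #fragments (e.g. #comments).
                url, _ = urldefrag(urljoin(self.base_url, href))
                self.links.append(url)


def post_links(html, base_url):
    """Return the post URLs on one page, de-duplicated, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(u for u in parser.links if POST_RE.match(u)))


def fetch_page(url):
    """Download one page as text (the User-Agent string is hypothetical)."""
    req = Request(url, headers={"User-Agent": "archive-sketch/0.1"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_page, delay=1.0, max_pages=5000):
    """Visit start_url, then page/2/, page/3/, ... until no new posts appear.

    Assumption: older posts are paginated WordPress-style at <start_url>page/N/.
    """
    seen = set()
    posts = []
    for n in range(1, max_pages + 1):
        url = start_url if n == 1 else urljoin(start_url, f"page/{n}/")
        try:
            html = fetch(url)
        except HTTPError:  # e.g. a 404 once we run past the last page
            break
        new = [p for p in post_links(html, url) if p not in seen]
        if not new:
            break
        seen.update(new)
        posts.extend(new)
        time.sleep(delay)  # be polite to the server between requests
    return posts
```

Calling `crawl("https://hackaday.com/")` would walk the archive one page at a time and return post URLs in the order they were found; joining them with newlines gives a one-URL-per-line text file. The `fetch` parameter exists so the parsing logic can be tried offline against canned HTML instead of the live site.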

Thanks to [Alex], writing a cron job to automatically update our new retro edition just got a whole lot easier. If you’d like to check out a list of every Hackaday post ever (or at least every post up to two days ago), you can grab the 10,693-line text file here.
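With the list in hand, picking the random article for the retro edition is the easy part. A sketch, assuming the list is a plain text file with one URL per line (the file name hackaday_posts.txt and the helper names below are hypothetical), refreshed periodically by a cron job that reruns the crawl:

```python
import random


def load_urls(lines):
    """Strip whitespace and skip blank lines; accepts an open file or a list of strings."""
    return [line.strip() for line in lines if line.strip()]


def random_post(urls, rng=random):
    """Pick one URL uniformly at random from the list."""
    if not urls:
        raise ValueError("URL list is empty")
    return rng.choice(urls)
```

The page generator could then open hackaday_posts.txt and call `random_post(load_urls(f))` on the file object each time it needs an article. Passing a seeded `random.Random` as `rng` makes the choice reproducible for testing.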