Scraping blogs for fun and profit

Sometimes when you’re working on a problem, a solution is thrown right at your face. We found ourselves in this exact situation a few days ago while putting together Hackaday’s new retro edition; a way to select a random Hackaday article was needed and [Alexander van Teijlingen] of codepanel.net just handed us the solution.

To grab every Hackaday URL ever, [Alex] wrote a small Python script using the Beautiful Soup screen scraping library. The program starts on Hackaday’s main page and grabs every link to a Hackaday post before going to the next page. It’s not a terribly complex build, but we’re gobsmacked a solution to a problem we’re working on would magically show up in our inbox.
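For the curious, here is a minimal sketch of the crawl described above. It is not [Alex]'s script: to keep it dependency-free we swapped Beautiful Soup for the standard library's html.parser, and the permalink pattern (/YYYY/MM/DD/slug/) and the /page/N/ pagination scheme are our assumptions. The names extract_post_links and crawl are made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: post permalinks end in /YYYY/MM/DD/slug/ (not confirmed by the source).
POST_PATTERN = re.compile(r"/\d{4}/\d{2}/\d{2}/[^/]+/?$")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_post_links(html, base_url):
    """Return the unique post URLs found in one page, in page order."""
    parser = LinkCollector()
    parser.feed(html)
    posts = []
    for href in parser.links:
        url = urljoin(base_url, href)
        if POST_PATTERN.search(url) and url not in posts:
            posts.append(url)
    return posts


def fetch(url):
    """Download a page and decode it as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(base_url, max_pages, fetch_page=fetch):
    """Walk the main page, then /page/2/, /page/3/, ... collecting post links."""
    all_posts = []
    for page in range(1, max_pages + 1):
        # Assumption: page N of the blog lives at <base_url>page/N/.
        page_url = base_url if page == 1 else urljoin(base_url, f"page/{page}/")
        links = extract_post_links(fetch_page(page_url), base_url)
        if not links:
            break  # an empty page means we ran off the end of the archive
        for link in links:
            if link not in all_posts:
                all_posts.append(link)
    return all_posts
```

With the list in hand, picking a random article is a single call to random.choice(crawl(base_url, max_pages)). If you try this against a real site, keep max_pages small and be polite about request rates.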

Thanks to [Alex], writing a cron job to automatically update our new retro edition just got a whole lot easier. If you’d like to check out a list of every Hackaday post ever (or at least through two days ago), you can grab the 10,693-line text file here.

Are you human? Resistor edition

[PT] tipped us off about a new way to screen bots from automatically leaving comments. Resisty is like CAPTCHA, but it requires you to decipher the color bands on a resistor instead of mangled text. This won’t do much for the cause of digitizing books, but if you can never remember your color codes, this is a good way to practice. Resisty comes as a plug-in for WordPress; add it to your blog for a geek cred +1.
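If you want to check your answers, the standard color code is easy to sketch in a few lines. This is not Resisty's code, just an illustration of the usual digit-digit-multiplier reading; the function name resistance_ohms is made up, and the tolerance band and the gold and silver multipliers are deliberately left out.

```python
# Standard resistor color code: the list index is the digit each color stands for.
DIGITS = ["black", "brown", "red", "orange", "yellow",
          "green", "blue", "violet", "grey", "white"]


def resistance_ohms(bands):
    """Decode digit bands followed by one multiplier band into ohms.

    Pass the bands without the tolerance band, e.g. three colors for a
    common four-band resistor.
    """
    *digit_bands, multiplier_band = [band.lower() for band in bands]
    value = 0
    for band in digit_bands:
        value = value * 10 + DIGITS.index(band)
    return value * 10 ** DIGITS.index(multiplier_band)
```

For example, resistance_ohms(["yellow", "violet", "red"]) reads as 4, 7, then two zeros, which is a 4.7k resistor.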

WordPress 2.7 upgrade in one line

WordPress 2.7 has just been released and features a complete interface overhaul. Hack a Day runs on WordPress MU hosted by WordPress.com, so we got this update last week. We run standard WordPress.org on all of our personal blogs, though. We recommend it because it’s free, has a massive userbase, and if you host it yourself, you can do whatever you want with it.

To make the upgrade process as simple as possible (and for the sheer rush of ‘rm -rf’), we use a one-line command.

$ curl http://wordpress.org/latest.zip -o "wp.zip" && unzip wp.zip && rm -rf ./wordpress/wp-content/ && cp -r ./wordpress/* ~/www/

curl downloads the latest version from wordpress.org. unzip unpacks all of the files into a directory called ‘wordpress’. rm -rf removes the ‘wp-content’ directory from that freshly unpacked copy; otherwise, the copy step would overwrite your images, themes, and plugins. cp -r copies everything else to your HTTP document root, overwriting the previous install.

Naturally, you should back up your current install and database beforehand, even though we tend to use the one-liner with reckless abandon. If you’re wondering about the terseness, it was designed to fit inside Twitter’s 140-character limit.
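For the less reckless, here is a rough Python sketch of the same steps with a file backup bolted on. It assumes the archive has already been downloaded and unpacked, it does not touch the database (dump that separately), and the function name upgrade_in_place and its three path arguments are hypothetical.

```python
import shutil
from pathlib import Path


def upgrade_in_place(unpacked: Path, docroot: Path, backup: Path) -> None:
    """Copy a freshly unpacked WordPress tree over a live install.

    unpacked: the 'wordpress' directory produced by unzip
    docroot:  the live install (~/www/ in the one-liner)
    backup:   where to copy the live install first; must not exist yet
    """
    # Back up the live files before touching anything.
    shutil.copytree(docroot, backup)
    # Drop the stock wp-content so your own themes, plugins, and uploads
    # are not overwritten by the copy below (the 'rm -rf' step).
    shutil.rmtree(unpacked / "wp-content", ignore_errors=True)
    # Overwrite the install with the new files (the 'cp -r' step).
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(unpacked, docroot, dirs_exist_ok=True)
```

A call might look like upgrade_in_place(Path("wordpress"), Path.home() / "www", Path.home() / "www-backup"). It is considerably longer than 140 characters, which is the price of the safety net.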

[Thanks, Chris Finke]

Hack a Day 2: Electric Boogaloo

Well, that was fun… no, not really, but we’re back from the dead like Steve Jobs. We’ve been getting DDoS’d since essentially the first day we originally came back. After killing a 1G connection, we decided to find a different solution. Since the world didn’t end this week, we brought the site back using WordPress.com as the new host. We now return to our regular blog shenanigans. Here’s to another four years of beta!

Five plugins and tips to secure your WordPress blog


How do you protect your own blog from getting hacked? There’s never a foolproof answer, but with some added tools and caution, you can make your website a little safer. Cats Who Code has five plug-ins and tips you can use to protect your WordPress install. Some of the tips are common-sense advice that applies to anything related to technology, such as making backups often and using strong passwords. Others include suggested plugins that can help you check whether your WordPress install has any security holes, or small tricks to hide the version of WordPress you’re using. Do you have any useful plugins or tricks to share for keeping your blog safe from hackers?

[via Digg]