HOPE 2008: Wikiscanner 2.0


[Virgil] presented the next version of Wikiscanner at The Last HOPE today. To build the original Wikiscanner, he scanned the monthly database dump of anonymous edits and compared it against a purchased list of known company IP addresses. Those 34.5 million anonymous edits account for nearly 21% of all edits. The idea was to unearth businesses and groups whitewashing critical pages. This only handles anonymous edits, though: users could log in to avoid having their IP address traced back to them.
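The matching step described above can be sketched roughly as follows. This is only an illustration, not [Virgil]'s actual code: the organization names, the CIDR blocks (taken from reserved documentation ranges), and the function names are all made up for the example.

```python
import ipaddress

# Hypothetical sample data: (organization, CIDR block) pairs standing in
# for the purchased list of company IP ranges.
ORG_RANGES = [
    ("Example Corp", "192.0.2.0/24"),
    ("Example University", "198.51.100.0/24"),
]


def build_index(org_ranges):
    """Parse each CIDR string once so lookups do not re-parse it."""
    return [(ipaddress.ip_network(cidr), org) for org, cidr in org_ranges]


def attribute_edit(ip, index):
    """Return the organization whose range contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, org in index:
        if addr in network:
            return org
    return None


def attribute_edits(edits, org_ranges):
    """edits is an iterable of (page_title, ip) for anonymous edits.

    Returns (page_title, ip, organization) for every edit that falls
    inside a known range; unmatched edits are dropped.
    """
    index = build_index(org_ranges)
    matched = []
    for page, ip in edits:
        org = attribute_edit(ip, index)
        if org is not None:
            matched.append((page, ip, org))
    return matched
```

A linear scan over the ranges is fine for a toy list; with tens of millions of edits, a sorted interval lookup or a database join would be the more likely choice.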

In the new version, [Virgil]’s team developed a Poor Man’s CheckUser. If you spend too long editing a talk page, your session can expire, and when you hit save the edit is attributed to your IP address instead of your username. Most regular users will then log in and replace the IP with their username. The team found 13,000 username/IP address pairs by searching for IPs being removed and replaced with usernames. These are some of the most active users. Using this list, they could potentially uncover sockpuppets or collusion among top editors.
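As a rough sketch of that search, assume each revision of a talk page is available as an (editor, text) pair in chronological order. The data layout, the IPv4-only regular expression, and the function name here are assumptions made for illustration; the talk did not describe the real implementation at this level of detail.

```python
import re

# Loose IPv4 pattern; it does not validate octet ranges and ignores IPv6.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_username_ip_pairs(revisions):
    """Find (username, ip) pairs from consecutive talk-page revisions.

    revisions: list of (editor, text) tuples in chronological order,
    where editor is an IP string for anonymous edits.
    A pair is recorded when a logged-in editor saves a revision in which
    an IP from the previous revision has disappeared and the editor's
    own username has newly appeared.
    """
    pairs = set()
    for (_, prev_text), (editor, text) in zip(revisions, revisions[1:]):
        if IPV4_RE.fullmatch(editor):
            continue  # anonymous edit, nothing to link
        removed_ips = set(IPV4_RE.findall(prev_text)) - set(IPV4_RE.findall(text))
        username_added = editor in text and editor not in prev_text
        if username_added:
            for ip in removed_ips:
                pairs.add((editor, ip))
    return pairs
```

This heuristic will produce false positives, for example when an editor removes someone else's IP signature while adding their own comment, so any real list would need further filtering.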

Another update to the Wikiscanner is based on the trademark database: in 2.0, companies are associated with edits to pages about their own products. Link distance is also taken into account, so pages that link to a corporation's page are tracked as well.
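One plausible way to compute link distance is a breadth-first search over reversed page links, starting from the corporation's page. The post does not say how Wikiscanner 2.0 actually does it, so the graph representation, the function name, and the distance cutoff below are all assumptions.

```python
from collections import deque


def pages_within_distance(links, target, max_distance):
    """Return {page: distance} for pages that reach target via links.

    links: dict mapping a page title to the set of titles it links to.
    Distance 1 means the page links directly to target, distance 2 means
    it links to a page that links to target, and so on.
    """
    # Invert the graph so we can walk from the target back to its linkers.
    reverse = {}
    for src, dsts in links.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)

    distances = {target: 0}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        if distances[page] == max_distance:
            continue  # do not expand beyond the cutoff
        for src in reverse.get(page, ()):
            if src not in distances:
                distances[src] = distances[page] + 1
                queue.append(src)
    return distances
```

Edits from a company's IP range to any page in the returned set could then be flagged, with smaller distances treated as stronger signals.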

As a joke, [Virgil] matched edits against a list of IP addresses specific to each MIT building. You can see exactly which building was editing the “tentacle rape” page. No, really, that’s a real example.

Finally, [Virgil] showed some wiki tools that others have built: one for building graphs from arbitrary data, one that shows the age of text as a measure of credibility, and one that gives an overview of all edits to a page.
