Detecting ASCII art across the Internet

As a web developer and designer, [Victor] has a habit of putting a very nice ASCII signature in an HTML comment at the top of every web page he designs. He picked up the habit from seeing others do it, which made him curious about who else was doing the same. His idea was to scan through a chunk of the Internet and see what other web pages had ASCII signatures in an HTML comment. With a lot of very clever work, [Victor] managed to grab some interesting ASCII art that would have been missed without looking at the source of millions of web pages.

After gathering Alexa's list of the top million domains, [Victor] wrote a script to download the HTML for all the pages in parallel. After that, it was just a matter of detecting the ASCII art in all the HTML files. There were a few earlier ASCII art detection algorithms, but nothing that suited [Victor]’s use case. The best results came from looking only at the first comment on each page (a signature is meant to be spotted with a quick glance at the source, so it tends to sit at the top) and keeping only comments that were at least 3 lines long and 40 characters wide. After discarding everything with HTML tags in it, [Victor] had an awesome gallery of the ASCII art from webpages all around the Internet.
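
The filtering rules described above are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not [Victor]’s actual code: the function name `find_ascii_signature`, the two regular expressions, and the reading of "40 characters wide" as "the longest line has at least 40 characters" are all guesses made for the example.

```python
import re

# Thresholds taken from the figures quoted in the article.
MIN_LINES = 3
MIN_WIDTH = 40

# Hypothetical patterns: a non-greedy match for an HTML comment, and a
# rough test for anything that looks like an HTML tag.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def find_ascii_signature(html):
    """Return the first HTML comment if it looks like ASCII art, else None."""
    match = COMMENT_RE.search(html)  # only the first comment is considered
    if match is None:
        return None
    body = match.group(1)

    # Commented-out markup is not a signature, so drop anything with tags.
    if TAG_RE.search(body):
        return None

    # Ignore blank lines and trailing whitespace when measuring the art.
    lines = [line.rstrip() for line in body.splitlines() if line.strip()]
    if len(lines) < MIN_LINES:
        return None
    if max(len(line) for line in lines) < MIN_WIDTH:
        return None
    return "\n".join(lines)
```

Calling `find_ascii_signature(page_source)` on each downloaded page would return either the candidate art or `None`. A tag test this crude will also throw away genuine art that happens to contain something like `<o>`, so a real run would likely need tuning.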

What did he find? Well, there are far too many ASCII signatures for [Victor] to put up on his webpage, but he did provide a nice sample of what he found. They’re mostly logos, although there is a Hypnotoad and an Aperture Science sentry turret in there.

If you’d like to try out [Victor]’s script, he made everything available on GitHub.

