Detecting ASCII art across the Internet

As a web developer and designer, [Victor] has a habit of putting a very nice ASCII signature in an HTML comment at the top of every web page he designs. He picked up the habit from seeing others do it, which piqued his curiosity about who else was doing the same. His idea was to scan through a chunk of the Internet and see what other web pages had ASCII signatures in an HTML comment. With a lot of very clever work, [Victor] managed to grab some interesting ASCII art that would have been missed without looking at the source of millions of web pages.

After gathering Alexa's list of the top million domains, [Victor] wrote a script to download the HTML for all the pages in parallel. After that, it was just a matter of detecting the ASCII art in all the HTML files. There were a few earlier ASCII art detection algorithms, but nothing that suited [Victor]’s use case. The best results came from looking only at the first comment on each page (a signature is meant to be found with a quick glance at the source, so it tends to sit at the top) and keeping only comments at least 3 lines long and 40 characters wide. After discarding everything with HTML tags in it, [Victor] had an awesome gallery of ASCII art from webpages all around the Internet.
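
The filtering rules described above are simple enough to sketch in a few lines of Python. This is only an illustration of the heuristic as summarized here, not [Victor]’s actual code: the function name `find_ascii_signature`, the regular expressions, and the reading of "40 characters wide" as "the widest line has at least 40 characters" are all assumptions.

```python
import re

# Hypothetical helpers for illustration; not taken from [Victor]'s script.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)  # first HTML comment
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")            # anything that looks like a tag

MIN_LINES = 3    # thresholds quoted in the article
MIN_WIDTH = 40   # assumed to apply to the widest line


def find_ascii_signature(html):
    """Return the first HTML comment if it passes the heuristic, else None."""
    match = COMMENT_RE.search(html)
    if match is None:
        return None  # page has no comments at all

    body = match.group(1)
    if TAG_RE.search(body):
        return None  # commented-out markup, not a signature

    # Ignore blank lines and trailing whitespace when measuring the comment.
    lines = [line.rstrip() for line in body.splitlines() if line.strip()]
    if len(lines) < MIN_LINES:
        return None
    if max(len(line) for line in lines) < MIN_WIDTH:
        return None
    return "\n".join(lines)
```

Running this over each downloaded page and keeping the non-`None` results would give a rough candidate gallery. A heuristic this loose will also let through wide multi-line license or copyright notices, so some manual review is likely still needed.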

What did he find? Well, there are far too many ASCII signatures for [Victor] to put up on his webpage, but he did provide a nice sample of what he found. They’re mostly logos, although there is a Hypnotoad and an Aperture Science sentry turret in there.

If you’d like to try out [Victor]’s script, he made everything available on GitHub.