Steganography involves hiding data in something else — for example, encoding data in a picture. [David Buchanan] used polyglot files not to hide data, but to send a large amount of data in a single Twitter post. We don’t think it quite qualifies as steganography because the image has a giant red UNZIP ME printed across it. But without it, you might not think to run a JPG image through your unzip program. If you did, though, you’d wind up with a bunch of RAR files that you could unrar and get the complete works of the Immortal Bard in a single Tweet. You can also find the source code — where else — on Twitter as another image.
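What’s a polyglot file? It’s a file that is valid in more than one format at once. Here, the same bytes are both a JPEG image and a ZIP archive. JPEG images can carry an ICC (International Color Consortium) profile that describes how to reproduce the image’s colors. Twitter strips a lot of things out of images, but it leaves the ICC profile alone, and that profile can contain almost anything. It is stored in APP2 segments of up to 64 kB each, and a single profile can be split across as many as 255 segments, for a total of roughly 16 MB.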
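The 64 kB and 16 MB figures follow from how the ICC specification says to embed a profile in a JPEG. Each APP2 segment has a 16-bit length field that counts itself, a 12-byte `ICC_PROFILE\0` identifier, a one-byte sequence number, and a one-byte segment count. The Python sketch below is illustrative, not taken from Buchanan’s code. It splits an arbitrary payload into such segments and reassembles it. `icc_app2_segments` and `icc_payload_from_segments` are hypothetical helper names.

```python
import math
import struct

ICC_MARKER = b"\xff\xe2"        # JPEG APP2 marker
ICC_TAG = b"ICC_PROFILE\x00"    # 12-byte identifier at the start of each ICC segment
MAX_SEGMENT_LEN = 0xFFFF        # 16-bit length field, which includes its own 2 bytes
# Room left for profile data: minus length field (2), identifier (12), seq/count bytes (2).
CHUNK_DATA_MAX = MAX_SEGMENT_LEN - 2 - len(ICC_TAG) - 2
MAX_CHUNKS = 255                # the segment count is a single byte


def icc_app2_segments(payload: bytes) -> list[bytes]:
    """Split payload into APP2 ICC_PROFILE segments, marker bytes included."""
    count = max(1, math.ceil(len(payload) / CHUNK_DATA_MAX))
    if count > MAX_CHUNKS:
        raise ValueError(f"payload of {len(payload)} bytes needs {count} segments")
    segments = []
    for i in range(count):
        chunk = payload[i * CHUNK_DATA_MAX:(i + 1) * CHUNK_DATA_MAX]
        body = ICC_TAG + bytes([i + 1, count]) + chunk  # sequence numbers start at 1
        segments.append(ICC_MARKER + struct.pack(">H", 2 + len(body)) + body)
    return segments


def icc_payload_from_segments(segments: list[bytes]) -> bytes:
    """Reassemble the payload, ordering segments by their sequence number."""
    parsed = []
    for seg in segments:
        if seg[:2] != ICC_MARKER or seg[4:16] != ICC_TAG:
            raise ValueError("not an APP2 ICC_PROFILE segment")
        parsed.append((seg[16], seg[18:]))  # (sequence number, profile data)
    return b"".join(chunk for _, chunk in sorted(parsed))
```

Each segment can hold at most 65,519 bytes of data, so 255 segments top out at 16,707,345 bytes. That is where the "roughly 16 MB" comes from. A 200,000-byte payload, for example, needs four segments, because three hold only 196,557 bytes.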
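The ZIP format is also very flexible. A zip reader starts from the end of the file, where an end-of-central-directory record holds a pointer to the central directory. The central directory in turn points to each stored file. Since those pointers can point anywhere, it is trivial to create a zip file with extraneous data just about anywhere in the file.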
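The reader’s side can be sketched in Python using only the standard library. `find_eocd` is a hypothetical helper. It scans backward for the end-of-central-directory (EOCD) signature, roughly as zip readers do, and reports where the archive says its central directory lives. The demo glues unrelated bytes in front of an ordinary archive. Python’s `zipfile`, like many readers, compensates for the shift, and the same tolerance is what lets self-extracting archives work.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature (0x06054b50, little-endian)
EOCD_LEN = 22             # fixed part of the record; an optional comment may follow it


def find_eocd(data: bytes) -> dict:
    """Locate the EOCD record by scanning backward from the end of the file."""
    # The trailing comment can be up to 65535 bytes, so search only that window.
    # Simplification: a comment that itself contains the signature would fool rfind.
    start = max(0, len(data) - EOCD_LEN - 0xFFFF)
    pos = data.rfind(EOCD_SIG, start)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    (_sig, _disk, _cd_disk, _n_here, n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack("<4sHHHHIIH", data[pos:pos + EOCD_LEN])
    return {"eocd_pos": pos, "entries": n_total, "cd_size": cd_size, "cd_offset": cd_offset}


# Build an ordinary archive in memory, then put unrelated bytes in front of it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello from inside the zip")
junk = b"\xff\xd8" + b"not zip data " * 100  # stand-in for an image's bytes
polyglot = junk + buf.getvalue()

info = find_eocd(polyglot)
# The recorded offset is relative to where the archive began, not to the file start...
print(info["cd_offset"], polyglot.find(b"PK\x01\x02"))
# ...yet a reader that works backward from the EOCD still finds every member.
print(zipfile.ZipFile(io.BytesIO(polyglot)).read("hello.txt"))
```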
So the scheme is to use rar to break up the payload into chunks of 64 kB or smaller, so that each piece fits inside a single ICC segment. Then replace the image’s ICC section with a zip of the rar files, and put the zip’s central directory at the end of the file, where unzip programs look for it. A program reading the image will just see a garbage ICC section and some extra bytes after the end of the image. A zip program will find the zip file and extract the rar files.
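The scheme can be made concrete with a simplified Python sketch of how such a polyglot might be assembled. It is not Buchanan’s code, and several details are assumptions. `build_polyglot` is a hypothetical helper. Each rar volume is stored uncompressed as its own zip entry inside its own APP2 segment, so JPEG segment headers never interrupt an entry. The input JPEG is assumed to have no ICC profile of its own, and the segments do not hold a real color profile.

```python
import io
import struct
import zipfile
import zlib

ICC_TAG = b"ICC_PROFILE\x00"
DOS_DATE = ((2020 - 1980) << 9) | (1 << 5) | 1  # 2020-01-01, an arbitrary placeholder date


def build_polyglot(jpeg: bytes, files: list[tuple[str, bytes]]) -> bytes:
    """Hide stored (uncompressed) zip entries in APP2 ICC segments of a JPEG.

    Assumes the JPEG has no ICC profile yet and that every entry fits in one
    segment, which is why the payload is pre-split (e.g. into rar volumes).
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    if len(files) > 255:
        raise ValueError("at most 255 ICC segments are allowed")
    insert_at = 2
    if jpeg[2:4] == b"\xff\xe0":  # keep a JFIF APP0 segment directly after SOI
        insert_at = 4 + struct.unpack(">H", jpeg[4:6])[0]
    out = bytearray(jpeg[:insert_at])
    entries = []  # (name, crc, size, absolute offset of the local file header)
    for seq, (name, data) in enumerate(files, start=1):
        fname = name.encode("ascii")
        crc = zlib.crc32(data)
        # Local file header: version 2.0, no flags, method 0 (stored), time 0.
        local = struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, 0, 0, 0, DOS_DATE,
                            crc, len(data), len(data), len(fname), 0) + fname + data
        body = ICC_TAG + bytes([seq, len(files)]) + local
        if 2 + len(body) > 0xFFFF:
            raise ValueError(f"{name} does not fit in a single APP2 segment")
        seg_start = len(out)
        out += b"\xff\xe2" + struct.pack(">H", 2 + len(body)) + body
        # The local header follows marker (2), length (2), identifier (12), seq/count (2).
        entries.append((fname, crc, len(data), seg_start + 18))
    out += jpeg[insert_at:]  # the rest of the image, including its EOI marker
    cd_offset = len(out)     # central directory goes after the end of the image
    for fname, crc, size, offset in entries:
        out += struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20, 0, 0, 0, DOS_DATE,
                           crc, size, size, len(fname), 0, 0, 0, 0, 0, offset) + fname
    cd_size = len(out) - cd_offset
    out += struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, len(entries), len(entries),
                       cd_size, cd_offset, 0)
    return bytes(out)
```

Because the central directory sits after the JPEG’s end-of-image marker, an image decoder typically stops before reaching it. A zip reader, starting from the end, follows the stored offsets back into the APP2 segments. Any file names passed in, such as `works.part1.rar`, are placeholders.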
Interestingly, even creating a thumbnail usually keeps the color profile data, so a Twitter thumbnail will still retain the payload. Of course, any service that strips out the ICC data will break the method. Some services do and some don’t, and any of them could start removing the data at any time.