Steganography In Xkcd Comics Without The Img Alt Tag

Inspired by a recent Hackaday post, [austin] decided to try his hand at steganography. Steganography, or ‘concealed writing’, has come a long way from ancient Greek slaves and couriers shaving their heads, tattooing a message on their scalps, and regrowing their hair. We recently saw a music file masquerading as a picture of a kitten, but that method of hiding data required running a Ruby script. [austin] thought steganography would be a great way to hone his JavaScript skills, so he made an image encoder and decoder purely in JS and HTML.

Like the previous incarnation, [austin]’s work takes a regular .PNG image file and hides a file in the pixel data. A few of the lower bits of each pixel are modified (three bits each from the red and blue channels, two bits from the green – a good choice, since the human eye is very sensitive to green), which gives one byte of hidden data per pixel.
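The bit split described above can be sketched in a few lines. This is not [austin]’s code (his tool is JavaScript); it is a minimal Python illustration that assumes pixels are plain `(r, g, b)` tuples and that the payload byte is split high bits first (3 red, 2 green, 3 blue). The function names and the bit ordering are assumptions made for the example.

```python
def embed_byte(pixel, byte):
    """Hide one byte in the low bits of an (r, g, b) pixel: 3 red, 2 green, 3 blue."""
    r, g, b = pixel
    r = (r & ~0b111) | (byte >> 5)          # top 3 bits of the byte go into red
    g = (g & ~0b11) | ((byte >> 3) & 0b11)  # middle 2 bits go into green
    b = (b & ~0b111) | (byte & 0b111)       # low 3 bits go into blue
    return (r, g, b)


def extract_byte(pixel):
    """Reassemble the hidden byte from the low bits of one pixel."""
    r, g, b = pixel
    return ((r & 0b111) << 5) | ((g & 0b11) << 3) | (b & 0b111)


def embed(pixels, payload):
    """Write payload into the first len(payload) pixels, one byte per pixel."""
    if len(payload) > len(pixels):
        raise ValueError("payload does not fit: one byte per pixel")
    out = list(pixels)
    for i, byte in enumerate(payload):
        out[i] = embed_byte(out[i], byte)
    return out


def extract(pixels, length):
    """Read back `length` hidden bytes from the start of the pixel list."""
    return bytes(extract_byte(p) for p in pixels[:length])


cover = [(255, 255, 255)] * 8   # a tiny all-white "image"
stego = embed(cover, b"hi")
assert extract(stego, 2) == b"hi"
```

Each channel changes by at most 7 (red, blue) or 3 (green), which is why the result is hard to spot by eye in a busy image.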

As an example, [austin] embedded some stuff inside the xkcd comic underneath this post’s title. Even though the image is mostly white, we can’t see anything wrong with the colors. If you’d like to decode the message, [austin] put his encoder and decoder up on GitHub. Feel free to take a shot at it.

28 thoughts on “Steganography In Xkcd Comics Without The Img Alt Tag”

  1. Is there something that prevents using the Alt-text tags in HTML after doing the stego conversion?

    I was under the impression alt text had nothing to do with the image itself and was just an HTML feature.

    1. You’re right, the image has nothing to do with the HTML. The title of this submission is intentionally misleading as XKCD has absolutely nothing to do with this project, except that Tus used an XKCD image as demonstration of what he did. Brian’s analogy that hiding data in XKCD images is somehow comparable to hiding text in HTML IMG ALT properties is laughable.

        1. I’m sorry, Brian, I’m allergic to using this kind of reference to attract readers. I think I may have overdosed on the Internet… After following this blog for a while, I understand that it’s not common to do that here, but I felt like I had to say something about it. Sorry if I sounded like a dick, keep up the good work!

  2. It should be noted that stego is very easy to detect in drawings (like the XKCD comic), because it is often obvious that some pixels should be white but aren’t. Yes, the password does encrypt the data, and this is an amazingly simple, working and praiseworthy example made by Tuseroni, but anybody who wants to use steganography seriously should read up on which images to use and how secure and detectable the method is (Wikipedia is a good place to start).

    Wonderful job, Tus!

    1. You could probably hide the data for the picture itself inside the picture. It’s black and white — one bit per pixel. And you’re taking eight bits per pixel to hide your data. Hell, you could maybe even store a LARGER version of the same image.

      1. as an example, in one case i encoded a text file into an image, put the image, the encoder, the decoder, and a text file with the password to that image in a zip, and encoded THAT into another image.

        it has to do largely with how many pixels are in the cover image. so if an image had 1024×768 resolution it could hold (1024*768) bytes, or 786,432 bytes (~786 kilobytes). the upper bound for storing data depends on how many pixels the image has.
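The arithmetic in the comment above can be checked with a short sketch. It assumes the 8 hidden bits per pixel described in the post; `capacity_bytes` is a hypothetical helper name, and any header or length field the real tool stores would reduce the usable space.

```python
def capacity_bytes(width, height, bits_per_pixel=8):
    """Upper bound on hidden payload size for a cover image, in bytes."""
    return width * height * bits_per_pixel // 8


print(capacity_bytes(1024, 768))  # 786432
```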

        encoding in audio should allow for bigger files, and video bigger still (but i don’t know if i can do video, since there doesn’t seem to be an api for writing to video tags like there is for audio. the processing of video would be simpler though, because it would be just like processing an image, only with a lot more of them.)

    2. it’s just bytes. You could hide anything. More realistically, if you were trying to be sneaky, you wouldn’t hide the actual data there. You’d hide a URL (probably a shortened one) to the actual data you wanted to hide (which would of course be encrypted).

      For example, a typical “bit.ly” shortened URL is http://bit.ly/xxxxxxxx where the x’s represent a hash used to reference the actual URL. If you wanted to be sneaky, your algorithm could specify that it’s always a bit.ly URL, so you could leave off the “http://bit.ly/” part. Now, you’re only having to store 6 or so bytes of data across the entire image, plus whatever header data you need to identify the image as having your stuff hidden in it and what version of hiding system you used. Figure maybe 10-12 bytes of data would do it.

      Of course, you wouldn’t use bit.ly. You’d make and host your own shortener, and it need not be a short domain name because that’s not stored in the image. You could even ROTATE the domain name regularly so that only people getting updates on a regular basis would be able to have their software follow the path to the actual hidden data.
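To make the "10-12 bytes" estimate concrete, here is one possible layout, sketched in Python. Everything in it is an assumption for illustration: the two-byte marker `SG`, the version number, the one-byte length field, and the placeholder base URL are all made up, and none of it comes from [austin]’s tool.

```python
import struct

MAGIC = b"SG"   # hypothetical 2-byte marker meaning "something is hidden here"
VERSION = 1     # hypothetical version of the hiding scheme
HEADER = "!2sBB"  # marker, version, length of the short code


def pack_pointer(code: str) -> bytes:
    """Build the bytes to hide: a 4-byte header followed by the short code only."""
    data = code.encode("ascii")
    return struct.pack(HEADER, MAGIC, VERSION, len(data)) + data


def unpack_pointer(blob: bytes, base_url: str) -> str:
    """Validate the header and rebuild the full URL from the agreed base."""
    magic, version, length = struct.unpack(HEADER, blob[:4])
    if magic != MAGIC or version != VERSION:
        raise ValueError("no recognised header")
    return base_url + blob[4:4 + length].decode("ascii")


blob = pack_pointer("a1B2c3")
print(len(blob))  # 10
```

The base URL never appears in the image, so rotating the domain only requires updating the decoder, which is the point the comment makes.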

  3. Sometimes, more is learned in the creation of an excellent hack such as this, than is first appreciated. The same sense of adding a whole realm and its derivatives to our Grimoire is to be found by studying how they did it. Points to austin for gifting Hackerdom with a touchstone primer.

    When any of us master replicating someone else’s implementation, the experience points we gain are priceless. If nothing else, for the addictive buzz in our Egoboost. It’s that way for me. Might not we call that Egoboo affect alteration a mindhacking multiplier?

    I’d even go so far as to make it a challenge to us at replicating some featured hacks.. possibly as a cloning to cement the “how-to” in our minds? Or better still- OUR personal best at going further on their path..

    Now I’ve got to block out some time for replicating how this Hack is best applied in my uses for it:}

  4. I love the way he’s implemented it with JavaScript, because it means you could have a very simple plugin for most browsers that would overlay the hidden message across the image as you viewed websites. You could even have a JavaScript file hosted on a secured server somewhere and reference it with a resource tag on the web page, so that only users with proper access (a token cookie or a certificate, for example) would load the JavaScript to reveal the text, while everyone else would get an essentially empty JavaScript file.

    This would be an ideal way to have a “hidden” web underneath the current web without necessarily having to hide the servers. I think if you were to do it for real, you’d want to obscure things a bit more, like having the identifier that indicates content exists (and where it starts, and which algorithm to use for decryption) be located at an offset byte computed with a salted hash of the image itself, so that without the salt you couldn’t even programmatically identify which images on the net contained the hidden content.
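One wrinkle with hashing "the image itself" is that embedding changes the image, so the decoder would compute a different hash. A sketch of one way around that, assuming the 3-2-3 bit split from the post: hash only the high bits that embedding leaves alone. The function names, the use of HMAC-SHA256, and the pixel-list representation are all assumptions for this example.

```python
import hashlib
import hmac


def stable_bytes(pixels):
    """Keep only the high bits that a 3-2-3 low-bit embed never touches."""
    return bytes(v for r, g, b in pixels for v in (r & 0xF8, g & 0xFC, b & 0xF8))


def header_offset(pixels, salt: bytes, header_len: int) -> int:
    """Pick the pixel index where the header starts, keyed by a secret salt."""
    slots = len(pixels) - header_len + 1
    if slots <= 0:
        raise ValueError("image too small for the header")
    digest = hmac.new(salt, stable_bytes(pixels), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % slots


cover = [(200 + i % 50, 100 + i % 30, 50 + i % 20) for i in range(256)]
offset = header_offset(cover, b"shared-secret", header_len=12)
assert 0 <= offset <= len(cover) - 12
```

Because only the untouched bits feed the hash, encoder and decoder arrive at the same offset, while someone without the salt has no cheap way to tell where (or whether) a header sits.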

    @M4CGYV3R – I think the reference to alt text is simply a way of saying that he doesn’t mean the pseudo-steganographic (oh yeah, I made up that word) way he uses alt tags to add to the humor or message in his comics. He’s saying this is “real” steganography.

  5. This is what it looks like with auto gamma adjusted:

    I realize that true stealth wasn’t the author’s intention. It’s interesting as a proof of concept and will certainly fool most occasional, unaware viewers easily, but not even the simplest systematic scan.
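A "simplest systematic scan" of the kind mentioned above might look like the following sketch. It assumes `(r, g, b)` tuples and the 3-2-3 bit split from the post; both function names and the tolerance of 7 are illustrative choices, and clean anti-aliased artwork can also contain near-white pixels, so the score is a hint rather than proof.

```python
WHITE = (255, 255, 255)


def amplify_low_bits(pixels):
    """Stretch the low bits (3 red, 2 green, 3 blue) to full range so they show up."""
    return [
        ((r & 0b111) * 255 // 7, (g & 0b11) * 255 // 3, (b & 0b111) * 255 // 7)
        for r, g, b in pixels
    ]


def near_white_fraction(pixels, tolerance=7):
    """Fraction of pixels that are almost, but not exactly, white."""
    suspicious = sum(
        1 for p in pixels if p != WHITE and min(p) >= 255 - tolerance
    )
    return suspicious / len(pixels)


clean = [WHITE] * 10
stego = [(250, 253, 249)] * 5 + [WHITE] * 5
print(near_white_fraction(clean))  # 0.0
print(near_white_fraction(stego))  # 0.5
```

In a line drawing that should be mostly pure white, a large near-white fraction is a strong sign that the low bits carry something.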

    1. As you say, he wasn’t looking to be super secret with it. As a proof of concept it’s awesome. If you were doing it in a way you wanted real secrecy, you’d likely use far more complex images, and your algorithm would be set up to skip “pure” color pixels (let’s say any of the named colors in HTML), especially white and black. You probably also want to avoid some ranges common in flesh tones.

      1. The error is least noticeable in pure white and black because displays tend to “clip” the black end, and the relative error to a full white is small.

        The signal is most noticeable in the midtones, where it looks like extra noise that shouldn’t be there.

    2. hmm interesting. however there is one saving grace: they do still have to get the password, and they have to use the same random number generator… but i always figured if they found the data it wouldn’t be much work to figure out the order just by looking for the header of the underlying data (they could probably just try every header, there’s not that many)

      but you are correct there is clearly more work to be done here
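The password and random number generator mentioned above suggest a scheme where the password decides which pixel holds which byte. The sketch below shows that idea only; it is an assumption about the general approach, not a port of the actual JavaScript tool. Python's `random.Random` is not a cryptographic generator, and `pixel_order` is a hypothetical name.

```python
import hashlib
import random


def pixel_order(password: str, pixel_count: int):
    """Derive a repeatable pseudo-random visiting order of pixel indices."""
    # Hash the password so any string maps to a large integer seed.
    seed = int.from_bytes(hashlib.sha256(password.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    order = list(range(pixel_count))
    rng.shuffle(order)
    return order


order = pixel_order("correct horse", 100)
assert order == pixel_order("correct horse", 100)  # same password, same order
assert sorted(order) == list(range(100))           # every pixel visited once
```

The encoder writes byte `i` into pixel `order[i]` and the decoder reads it back the same way, which only works if both sides run the same generator, exactly the dependency the comment points out.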

  6. I’ve been working on something like this for a client, that uses the canvas element to pull stego’d js from an image, bypassing the need to compromise the script scanner on the target device.

    1. hmm curious what the reason is for that.
      it would allow you to put a JS file on someone’s server, but you couldn’t link straight to it, nor would you be able to use it without having a javascript script running on the page to decode it from the image. also, the canvas cannot read images cross origin without permission from the server.

      however you could make a polyglot gif that is simultaneously a gif and a js file, that gif could then be directly linked to and used with
      (http://en.wikipedia.org/wiki/Polyglot_%28computing%29)

  7. It made me laugh when I missed that the data was also compressed, probably the default for a Blender3D file. I really do love the UNIX “file” command and its associated magic library.

    $ file vziYDD7k.part
    vziYDD7k.part: gzip compressed data, from NTFS filesystem (NT)
    $ zcat vziYDD7k.part > vziYDD7k
    $ file vziYDD7k
    vziYDD7k: Blender3D, saved as 32-bits little endian with version 2.52.0005
    $
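The same check that `file` and `zcat` perform in the transcript above can be sketched in Python with the standard `gzip` module. The helper names are made up for the example; the sketch relies on two magic values: gzip streams start with the bytes `1f 8b`, and .blend files start with the ASCII text `BLENDER`.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def unwrap(data: bytes) -> bytes:
    """Decompress the data if it starts with the gzip magic, else return it as is."""
    return gzip.decompress(data) if data[:2] == GZIP_MAGIC else data


def looks_like_blender(data: bytes) -> bool:
    """Check for the ASCII marker at the start of a .blend file."""
    return data.startswith(b"BLENDER")


# Stand-in for an extracted payload: a fake .blend header, gzip-compressed.
extracted = gzip.compress(b"BLENDER" + b"\x00" * 16)
assert not looks_like_blender(extracted)       # compressed, so the marker is hidden
assert looks_like_blender(unwrap(extracted))   # visible after decompression
```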
