This Image Contains A Hidden Audio Track

This image contains a hidden audio track which you’re very familiar with. Well, it used to. We’d bet we messed up the careful encoding that [Chris McKenzie] used to hide data within an image when we resized the original.

He’s using a method called steganography to hide a message in plain sight. Since digital images use millions of colors, you can mess with that color data just a bit and the eye will not really be able to pick up any difference. Each pixel has had the eight least significant bits swapped out for the data [Chris] is hiding. Since the image uses 24-bit color, the largest possible change (going from 0 to 255) in those bottom eight bits will only result in a color change of about 0.15%. And that’s only for one pixel; in most cases the change will be much less.

He shows his work, both decoding and encoding using Ruby, and even provides a one-liner which lets you play back the audio without downloading anything (just make sure you’ve got all of the dependencies installed). Never gonna give, you, up…

[via Reddit]

60 thoughts on “This Image Contains A Hidden Audio Track”

    1. The rar archives found in nefarious pages around the web are typically made by simply concatenating the contents of the encoded JPG with the contents of the encoded RAR. This works because JPG will ignore bytes tacked onto it and RAR will ignore everything before its magic number “Rar!”.

      The technique used here is modifying the image’s data, encoding hidden data in the pixel values, rather than simply sending it in the same stream as the image. This way, the image can be sent by any lossless means and still have the hidden data recoverable.

    2. If I understand correctly, the rar-in-jpeg approach just appends all the rar data after the jpeg data. Because of the way the rar and jpeg formats work, the jpeg data is ignored as junk by rar extractor, and the rar data is ignored as junk by the jpeg decoder.

      In this case, though, the audio data is actually part of the image itself (specifically, the last 8 bits of each pixel).
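The append-and-ignore trick described in the two comments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the byte strings are stand-ins rather than real files, and `append_archive` and `find_archive` are hypothetical helper names. A real image could also contain the bytes “Rar!” by coincidence, so a plain search like this is not a reliable detector.

```python
from typing import Optional

JPEG_EOI = b"\xff\xd9"   # JPEG end-of-image marker
RAR_MAGIC = b"Rar!"      # first four bytes of the RAR signature


def append_archive(jpeg_bytes: bytes, rar_bytes: bytes) -> bytes:
    """Concatenate the archive after the image data, as the comments describe."""
    return jpeg_bytes + rar_bytes


def find_archive(blob: bytes) -> Optional[bytes]:
    """Return everything from the first 'Rar!' onward, or None if absent."""
    offset = blob.find(RAR_MAGIC)
    return None if offset < 0 else blob[offset:]


# Stand-in data (assumption): not a valid JPEG or RAR, just the right markers.
fake_jpeg = b"\xff\xd8" + b"image data" + JPEG_EOI
fake_rar = RAR_MAGIC + b"\x1a\x07\x00" + b"archive data"

combined = append_archive(fake_jpeg, fake_rar)
recovered = find_archive(combined)
```

Nothing inside the image data is touched here, which is the contrast with the pixel-level method the post is about.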

  1. But can you detect it? I’m betting yes. There may be an expected value for that bit that you are changing, statistically, based on the surrounding pixels. Also, given that there is a header scheme, that may be the easiest place to “scan” for encoded messages on the web. Better perhaps to do a plain sight (some riddle on the image) for an encrypted message that is hidden beneath a not-so-well hidden message. Not that this is your aim, but it is hard to hide things with definitive value in plain sight when a fast algorithm can be run on many images to see if it isn’t quite like the rest. Just a thought.

    1. This is why you should cipher the data first in a way that makes it look appropriately random. The closer you can get to a properly generated One Time Pad, the more random it’s going to look.

      Once it looks random enough, you’ll be looking at something that looks just like “natural” noise (e.g. a camera’s CMOS sensor noise).
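A minimal sketch of the idea in the comment above, assuming a pad of truly random bytes that is as long as the message and never reused. The `xor_bytes` helper is a hypothetical name, and how the pad would be shared with the recipient is not covered here.

```python
import secrets


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR each data byte with the matching pad byte."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"never gonna give you up"
pad = secrets.token_bytes(len(message))   # random pad, used once only

ciphertext = xor_bytes(message, pad)      # this is what would go into the pixels
decoded = xor_bytes(ciphertext, pad)      # XOR with the same pad undoes it
```

Because XOR is its own inverse, the same function serves for both directions.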

      1. The only giveaway then is the fact that you have to save it as a PNG file, because it’s lossless, and people will wonder if you’re an idiot or just deliberately obtuse to not use a proper format for a photograph.

  2. “Since the image uses 24-bit color, the largest possible change (going from 0 to 255) in those bottom eight bits will only result in a color change of about 0.15%.”

    Actually, that’s nonsensical. Each pixel in a 24-bit image has 8+8+8 bits. You could change the bottom 2 bits out of a RGB+Alpha PNG file and get 8 bits of data per pixel stored, which would result in an error of 1.5% and not 0.15%.

    If you only have the RGB channels to work with, two of them must use the bottom 3 bits, preferably not the green one, which gives a total error of 2.6%

    Both cases are clearly visible to the naked eye if you know to look for them, especially on the smooth background bits of the image. If you use only the least significant bits of each color, then you get 0.7% error.

    1. If you really wanted to hide a message in a picture so that it survives even moderate JPEG compression and doesn’t look fishy to a person who’s wondering why a PNG image looks like it has severe JPEG artifacts, you could use a low resolution pattern, something like a QR code, and overlay that on the image so that it affects the least significant bits of each color.

      To decode it, you need to have the original picture, which will then reveal the coded message as the difference of the two images. Without the original, you can’t know what’s in there because the signal is much weaker than the “noise”.
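The decode-by-difference step described above might look roughly like this. It is a toy sketch, not the commenter’s method: the image is a flat list of 8-bit values, each pattern cell is stretched over four neighbouring values to stand in for a “low resolution” pattern, and the overlay simply flips the least significant bit. Whether such a pattern would really survive JPEG compression is not tested here.

```python
def overlay(original, pattern):
    """Flip the least significant bit wherever the pattern holds a 1."""
    return [value ^ bit for value, bit in zip(original, pattern)]


def reveal(original, suspect):
    """Recover the pattern as the bitwise difference of the two images."""
    return [a ^ b for a, b in zip(original, suspect)]


# Hypothetical data: eight pixel values and a two-cell pattern.
original = [145, 146, 144, 145, 200, 201, 199, 200]
cells = [1, 0]
pattern = [bit for bit in cells for _ in range(4)]   # each cell covers 4 values

stego = overlay(original, pattern)
recovered = reveal(original, stego)
```

Without `original`, the single flipped bit per value is hard to tell apart from ordinary image noise, which is the point the comment makes.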

      1. I agree. I didn’t dig into the code to see which bits were being flipped, but if I wrote the code I would make sure it’s the ones that would have the least total effect on the pixel. So the math I used is this:
        (2^8*100)/2^24
        Which gives 0.001526

        Which makes me think it’s actually a 0.0015% change in the worst case. Is this not so?

      2. That’s the error in the total amount of bits, but the brightness values of each color are still only 8 bits wide, so two bits of difference puts your color value 1.5% off the mark.

        And you really have to compare it to the amplitude of the image signal. Suppose the picture at a certain point has an area of blue color at a value of 145,0,0 and you add two bits of error in each color. That’s going to show up really badly.
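The two ways of measuring the error discussed in this thread can be put side by side. The first function reproduces the formula quoted above, (2^8*100)/2^24. The second measures the largest step against a single 8-bit channel’s full scale of 255, which is one possible convention and an assumption here; the percentages quoted elsewhere in the comments may have been computed differently.

```python
def percent_of_pixel_range(bits_changed, total_bits=24):
    """Change relative to the whole 24-bit pixel value, per the quoted formula."""
    return (2 ** bits_changed) * 100 / (2 ** total_bits)


def percent_of_channel_range(bits_changed, channel_bits=8):
    """Largest possible step relative to one channel's full scale (255)."""
    max_step = 2 ** bits_changed - 1
    full_scale = 2 ** channel_bits - 1
    return max_step * 100 / full_scale


whole_pixel = percent_of_pixel_range(8)    # 0.00152587890625
one_bit = percent_of_channel_range(1)      # about 0.39
two_bits = percent_of_channel_range(2)     # about 1.18
three_bits = percent_of_channel_range(3)   # about 2.75
```

The gap between the first number and the others shows why the thread disagrees: the result depends on whether the change is compared with the whole pixel or with one channel.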

  3. Like the author said, the answer is the likelihood that it is now in common practice… The problem is, however, that it took effort and access to the images in days past, unlike in the current connected digital world in which nearly all images can be scanned and are accessible with ease and little effort. This changes the game entirely. Furthermore, the simple act of encrypting messages in email evokes suspicion, so that is not a realistic alternative for private message passing unless it was adopted by a sufficient number of email users. Digital “drops” would therefore be much safer for private digital communication if you wanted to pass a message that couldn’t be tied back to an originator or receiver. However, these drops are not hidden, but in plain sight (by analogy).

    So the real issue is whether this kind of steganography can be accomplished without leaving a signature. This is further complicated by the question of where the image came from. If it is unique, it may give something away by its unique nature. If it is not unique–say some common image on the internet–then you have other images with which to compare–increasing the signal to noise ratio of whatever signature may be present in the altered image. I think it’s very relevant, and I can imagine that it’s being used widely–but most likely for nefarious purposes. I would love to see someone write an open-sourced algorithm for detecting steganographs, test it out, then post results before posting code–but that would only work for detection temporarily.

    Of course, the alternate use of images to pass a code could be to simply use “drops” that are URLs, passing an image, which in turn is a reference to a “one-time-key.” In this way, two people could pass messages to each other in plain sight without access to the same physical location–but a virtual one. The one-time-key would have to be memorized and destroyed to have full plausible deniability. In this case, no one would ever know except the message passers. It would require IP correlation (which a careful communicator would spoof anyway) and knowledge of the drop site… any algorithm doing this would cause more false positives than negatives and the O(m,n) complexity for the problem would be sufficiently huge such that there would be no point in attempting it. This, of course, requires proper prior communication, which in and of itself is not a deal-breaker. Orson Scott Card does a good job of explaining both throughout his Ender’s Shadow books. The documentation for TrueCrypt is also an interesting place to read about these concerns.

    Pardon my extensive post. The “so what?” comment got me going. In a world (not the whole world) where digital communication is essentially required to survive, and within which our privacy rights are nearly non-existent, I would say this is a big deal. It’s too bad that the few that are employing such techniques are likely not doing so for good purposes. How nice it would be to find a cache of love letters stashed away in the steganography! BTW, it is a cute kitten ;) And I promise to no longer spam this post.

      1. To quote my earlier comment “I think it’s very relevant, and I can imagine that it’s being used widely–but most likely for nefarious purposes.” I believe that it is plausible that image sharing sites are being used for just such purposes. The question, of course, is how often and by whom?

  4. Wouldn’t it increase the size of the picture file? I think that would be an obvious tell if that were the case. I suppose a hi-res file could be used, but the extra data might be easier to see in the image. What would be required to hide, say, a 6 gig disk image file?

    1. No, because they’re not adding data.

      In JPEGs each pixel is stored as 24 bits, 8 bits per channel, but they aren’t just stored like:

      RED:8
      Green:8
      Blue:8

      but instead it could be something like
      R1:3
      G1:3
      B1:3
      R2:2
      G2:3
      B2:2
      R3:3
      G3:2
      B3:3

      The way it’s split isn’t realistic, but I am simplifying. When the JPEG image is read, they put R1R2R3 together, G1G2G3, etc. What he is doing is replacing R3G3B3 with one byte (3+2+3 = 8 bits) of the song. The red colour is now R1R2, green is G1G2 and blue is B1B2.

      Since those are the LSBs, you aren’t changing its value too drastically — Only +-7/256 for red, +-3/256 for green, and +-7/256 for blue.
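The 3+2+3 split described in this comment can be sketched as follows. This follows the commenter’s simplified layout only; it has not been checked against [Chris]’s Ruby code, and the function names are made up for illustration. Pixels are plain `(r, g, b)` tuples so that no image library is needed.

```python
def embed_byte(pixel, value):
    """Hide one byte in a pixel: 3 bits in red, 2 in green, 3 in blue."""
    r, g, b = pixel
    r = (r & ~0b111) | (value >> 5)            # top 3 bits of the byte
    g = (g & ~0b11) | ((value >> 3) & 0b11)    # middle 2 bits
    b = (b & ~0b111) | (value & 0b111)         # bottom 3 bits
    return (r, g, b)


def extract_byte(pixel):
    """Reassemble the hidden byte from the low bits of each channel."""
    r, g, b = pixel
    return ((r & 0b111) << 5) | ((g & 0b11) << 3) | (b & 0b111)


def embed(pixels, payload):
    """Hide one payload byte per pixel, leaving any extra pixels untouched."""
    if len(payload) > len(pixels):
        raise ValueError("payload needs one pixel per byte")
    stego = list(pixels)
    for i, value in enumerate(payload):
        stego[i] = embed_byte(pixels[i], value)
    return stego


def extract(pixels, length):
    """Read back the first `length` hidden bytes."""
    return bytes(extract_byte(p) for p in pixels[:length])


# Hypothetical cover image: four identical pixels.
cover = [(200, 180, 90)] * 4
stego = embed(cover, b"Rick")
recovered = extract(stego, 4)
```

Each channel moves by at most 7, 3 and 7 respectively, matching the ± figures in the comment, and the number of pixels does not change.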

        1. The size of the image doesn’t matter as much as the resolution. How much you can store depends on how much distortion to the image you can tolerate. More data = more distortion.

    2. Yes, it does increase the file size because the PNG format cannot deal with what appears as noise to the compression algorithm. An otherwise small file can suddenly become megabytes.

      This method of storing data in a file cannot survive a JPEG compression, because it throws away the high frequency signals in the least significant bits and subsamples the colors, so the data gets scrambled.

  5. Here’s a question: won’t JPEG lossy compression, no matter how high the quality is, still corrupt the data within? But then again since this is an audio file, lossy compression won’t affect it too much. But I guess one couldn’t use this method to store things that can’t be stored in a lossy manner (at least, not with JPG)

    1. @Mohammed,

      You can apply the steganography to the image after it’s been compressed.

      Once it’s compressed, it isn’t generally further compressed using a lossy algorithm. It might be gzipped into a file or through a stream, but that’s a lossless transmission trick.

      But it all depends, of course. Someone could try transforming the image further, and corrupt the data without knowing it.

  6. As far as I remember, JPEGs aren’t exactly encoded as byte(s) per pixel; they are more like a transformation of chunks of bits (Discrete Cosine Transform?), so the least significant bit doesn’t mean “less color weight” in the visualization. It’s not a very practical method, because a lot of pages resize your pics or add watermarks that corrupt your original hidden file!

    1. Jpeg works in 8×8 pixel blocks. It calculates the DC offset of each block and saves this as a low resolution version of the image.

      Then it does a discrete cosine transformation to the remaining signal that is quantized according to a compression table that tells it what information to toss out. That results in an approximation of the waveform, much like what MP3 does to sound. Then it uses a lossless compression algorithm to wrap things up. At decoding, you decompress the file and reverse the DCT to get the smaller details to add on top of the low-resolution version.

      If you want to add information to the JPEG file this way, you have to fiddle with the DCT data so that it would get reversed into the exact pixels you want. That’s problematic because the reverse process is an approximation as well as the decoder tries to interpret what the original signal was.

      It would probably result in tremendously large JPEG files as well because the tampered DCT data doesn’t compress well anymore, making it look suspicious when a small image of a kitten weighs megabytes.
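To see why data in the low bits does not survive the transform-and-quantize step described above, here is a heavily simplified sketch. It runs a one-dimensional 8-sample DCT and a single uniform quantization step, whereas real JPEG works on 8×8 blocks with a per-frequency table and a color-space conversion. The sample values and the deliberately coarse step of 40 are arbitrary choices for illustration.

```python
import math

N = 8  # samples per block


def scale(k):
    """Normalisation factor for an orthonormal DCT."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(samples):
    """Forward DCT-II of one block."""
    return [
        scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                       for n, x in enumerate(samples))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse transform, turning coefficients back into samples."""
    return [
        sum(scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]


def roundtrip(samples, step=None):
    """Transform, optionally quantize with a uniform step, and transform back."""
    coeffs = dct(samples)
    if step is not None:
        coeffs = [round(c / step) * step for c in coeffs]  # the lossy part
    return [min(255, max(0, round(v))) for v in idct(coeffs)]


samples = [52, 55, 61, 66, 70, 61, 64, 73]
exact = roundtrip(samples)            # no quantization: values come back intact
lossy = roundtrip(samples, step=40)   # coarse quantization: values shift
```

Once the coefficients are rounded, the reconstructed samples no longer match the originals, so anything stored in their least significant bits is scrambled.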

  7. Awesome work! Really it’s great to see this kind of stuff still interests people. It’s even better to see the topic on such a popular site. Recalling my early years on the internet, I used to do this kind of thing all the time. Every time I see something like this I get excited. To those of you interested there’s a few great websites you could find to learn more about this such as 356* or you could use google if you’d rather. Btw with all the new technology and redundant data I’m sure there are millions of new possible ways to stego messages ect. all over the place. A great experience I once had was when some one encoded both a program, a music file and source code into one image and the goal was to come to one word. True, seams like a lot of work. Very few people see the point in a good stego anymore when there’s plenty of great encryptions around to keep your stuff secret. If you want to that is. Really the only thing I see stego any more as amusement only. Lets be honest here, which is more useful, this or tripleDES? but bias aside this is really great. hope to see more stuff like this on hackaday. Happy hacking. Cheers!

    the above paragraph has a stego’ed message sorry for the rant.

  8. I thought about doing this a few years ago for sending encrypted data/messages to friends. I was thinking that it would only work with lossless formats like GIF, and that lossy formats wouldn’t really work.

  9. this has inspired me to make a script monkey script to decode images encoded in this fashion (encrypted data still to come)
    its mostly just a translation of the ruby script he uses into javascript with some tags added to the pages and context menu added to images so you can attempt to decode any image encoded in this fashion with just a right click (the decoded file is downloaded by adding an iframe to the page with a data uri of the content of the decoded file, this prompts a save as dialogue unless you have disabled that then it just saves to the default location)

    still working on a javascript based encoder (should be just doing the same thing backwards)

    1. as an update i found his method to have some flaws (if you encoded data that was much smaller than the image you got a greenish box that was very visible). the encoder btw went swimmingly.
      i changed his method a bit and used a PRNG to put the data at pseudo-random pixels based on a hash of a password. the resulting image is indistinguishable from its original except through a diff (it does seem to affect the file size though, i think it has to do with the compression algorithm, the minor pseudo-randomly changed bits must break the compression a bit. the difference is minor though).
      overall it’s been very fun, im happy for this article.
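The password-seeded scattering described in this comment could be sketched like this. The commenter’s JavaScript is not shown, so every detail is an assumption: SHA-256 as the hash, Python’s `random.Random` as the PRNG (which is not cryptographically secure), one bit per value, and a bit count that the decoder already knows.

```python
import hashlib
import random


def pixel_order(password, pixel_count, needed):
    """Derive a repeatable pseudo-random choice of positions from a password."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return rng.sample(range(pixel_count), needed)


def embed_bits(values, bits, password):
    """Write each bit into the LSB of a password-selected value."""
    positions = pixel_order(password, len(values), len(bits))
    stego = list(values)
    for pos, bit in zip(positions, bits):
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def extract_bits(values, count, password):
    """Read the LSBs back in the same password-derived order."""
    positions = pixel_order(password, len(values), count)
    return [values[pos] & 1 for pos in positions]


# Hypothetical cover data: 64 channel values and an 8-bit payload.
cover = list(range(100, 164))
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_bits(cover, bits, "hunter2")
recovered = extract_bits(stego, len(bits), "hunter2")
```

Because the changed positions are spread across the image instead of packed at the start, a short payload no longer shows up as one visibly altered region.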

  10. I recently uploaded a suspicious jpg to VirusTotal and received a response stating that the image contained 25% mp3 audio in it. Would that be indicating the image had something similar to what you’ve shown here??
