Defcon day 1 – Lost in Translation – Christian Grothoff

Steganography is the art of hiding things in plain sight. When done correctly, an observer shouldn’t be able to tell that there is a hidden message at all, unlike cryptography, where it is obvious that something is hidden. To do this with text you usually need a large piece of source material, say the complete works of Shakespeare. Since such works are well known, steg built on them can usually be broken using statistical analysis.

Christian’s solution is to use machine-translated (MT) texts as the source material. Making a computer generate semantically and rhetorically correct text that consistently mimics the original is very difficult. The technique presented here sidesteps that problem by using MT texts, because translation errors are expected and common.

The source text does not even have to be secret for this technique to work. It begins by running the text through several MT engines, e.g. Babelfish. To increase the number of possible translations, each result is then run through another algorithm that creates more permutations using word replacement and other techniques. These texts are then checked sentence by sentence to determine whether they are still statistically close to the original translation, which ensures that the translation still appears probable.
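The candidate-generation step can be sketched in a few lines of Python. This is not the actual implementation: the synonym table `SYNONYMS` is a hypothetical stand-in for the word-replacement step, and word-level overlap from `difflib.SequenceMatcher` is an assumed stand-in for whatever statistical model judges whether a variant is still close to the baseline translation. The function names and the 0.7 threshold are likewise hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical synonym table standing in for the word-replacement step;
# a real system would derive alternatives from the MT engines themselves.
SYNONYMS = {"big": ["big", "large"], "house": ["house", "home"]}


def variants(translation: str) -> set[str]:
    """Every sentence reachable by swapping words for their listed alternatives."""
    options = [SYNONYMS.get(word, [word]) for word in translation.split()]
    return {" ".join(combo) for combo in product(*options)}


def closeness(candidate: str, baseline: str) -> float:
    """Word-level similarity in [0, 1]; an assumed stand-in for a real language model."""
    return SequenceMatcher(None, candidate.split(), baseline.split()).ratio()


def candidate_pool(engine_outputs: list[str], threshold: float = 0.7) -> list[str]:
    """Expand each engine's output, then keep variants close to the first engine's output."""
    baseline = engine_outputs[0]
    pool: set[str] = set()
    for output in engine_outputs:
        pool |= variants(output)
    return sorted(c for c in pool if closeness(c, baseline) >= threshold)
```

For example, `candidate_pool(["the big house is old", "the large house is old"])` keeps three variants and drops "the large home is old", which changes too many words relative to the baseline. Every surviving candidate is a possible carrier for hidden bits, so a looser threshold raises capacity at the cost of less plausible output.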

At this point the message is encoded using Huffman tree encoding. Once this is complete, some post-processing error insertion can be applied. This takes advantage of errors that usually appear in MT: misused articles and prepositions, and less commonly known words left untranslated. There’s even a technique called “semantic substitution”. Here’s an example: translate a word from English (EN) to German (DE), translate that German word back to EN, and then translate the result back to DE. If this new DE word is a possible translation of the original EN word, they’ll use it. This roundabout translation is harder for statistical analysis to spot than a one-to-one substitution.
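One way to picture the Huffman step, assuming each sentence already has a set of candidate translations with probability-like scores, and that the receiver can regenerate exactly the same candidates and scores from the source text: the message bits walk a Huffman tree built from the scores, so likelier translations get shorter codes and are chosen more often. This is a minimal sketch with hypothetical function names; the real encoder may differ in details such as how it pads or terminates the message.

```python
import heapq
from itertools import count


def huffman_codes(weights: dict[str, float]) -> dict[str, str]:
    """Assign each candidate translation a prefix-free bit string.

    More probable candidates get shorter codes, so they are picked more often.
    """
    tiebreak = count()  # unique ints keep the heap from ever comparing dicts
    # Sorting the candidates makes sender and receiver build the identical tree.
    heap = [(w, next(tiebreak), {cand: ""}) for cand, w in sorted(weights.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


def embed(bits: str, sentences: list[dict[str, float]]) -> tuple[list[str], int]:
    """Sender: let the message bits walk each sentence's tree to one translation."""
    cover, pos = [], 0
    for weights in sentences:
        codes = huffman_codes(weights)
        # Pad with zeros so the tail of the message still reaches a leaf.
        window = bits[pos:] + "0" * max(len(code) for code in codes.values())
        # Huffman codes are prefix-free and complete, so exactly one code matches.
        pick = next(c for c, code in codes.items() if window.startswith(code))
        cover.append(pick)
        pos += len(codes[pick])
    return cover, pos


def extract(cover: list[str], sentences: list[dict[str, float]]) -> str:
    """Receiver: rebuild the same trees and read back each chosen code."""
    return "".join(huffman_codes(w)[s] for s, w in zip(cover, sentences))
```

With weights `{"A1": 0.5, "A2": 0.25, "A3": 0.25}`, A1 gets the one-bit code `0` while A2 and A3 get `10` and `11`. A sentence with a single candidate carries no bits, which is one reason the bitrate is low. Because the last sentences may consume padding, `embed` reports how many bits it used, and the receiver needs to know the message length to discard the padding.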

There are a couple of disadvantages to this method of steg: the low bitrate, and the fact that you have to transmit both the source and the translated text. There are also some attacks that can expose it. If the same sentence appears twice in a text and is translated two different ways, that would raise a red flag. So would inconsistent machine mistakes, such as using the word “foots” in one place and “feets” in another. If someone developed a large statistical model of all MT systems, it would be easy to see that the steg doesn’t fit the mold, but the steg could also use that model to make sure it fits (an arms race).
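The repeated-sentence check is easy to automate. Since both the source and the translation are transmitted, an observer can align them sentence by sentence and look for a source sentence that was rendered more than one way. A minimal sketch, assuming the input is already a list of aligned (source, translation) pairs; the function name is hypothetical:

```python
from collections import defaultdict


def inconsistent_translations(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Flag source sentences that occur more than once but were translated differently.

    A deterministic MT engine renders identical input identically, so several
    renderings of one sentence hint that something else chose the output.
    """
    renderings = defaultdict(set)
    for source, translation in pairs:
        renderings[source].add(translation)
    return {src: outs for src, outs in renderings.items() if len(outs) > 1}
```

This only catches verbatim repeats of whole sentences; spotting inconsistent word-level mistakes like “foots”/“feets” would need a model of which errors a given engine actually makes.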

The website has a generator on it if you want to play around.

6 thoughts on “Defcon day 1 – Lost in Translation – Christian Grothoff”

  1. not to offend anyone ;) but “stenography” and “steganography” are two different things.

    “stenography” means a method of writing rapidly (shorthand) or the act or art of or writing in shorthand.

    “steganography” means, as the text mentions, hiding a secret piece of information inside a separate piece of information so as to conceal the secret information.

  2. You do realize that there is something stegoed into that post, right? RIGHT?

    I mean, you don’t become a popular blogger with sentences like “It is hard to make a computer generate consistant semantically and rhetorically correct texts that mimic the original is very difficult” (Welcome to the department of redundancy department).

    And the post is about stegoing texts into other, not very long texts.

    First one to crack it wins big accolades.
