In a world with finite storage and an infinite need for more storage space, data compression becomes a very necessary problem. Several algorithms for data compression may be more familiar – Huffman coding, LZW compression – and some a bit more arcane.
Steganography refers to the method of concealing messages or files within another file, coming from the Greek words steganos for “covered or concealed” and graphe for “writing”. The practice has been around for ages, from writing in invisible ink to storing messages in moon cakes. The methods used range from hiding messages in images to evade censorship to hiding viruses in files to cause mayhem.
The developer explains that since every file is just a bit sequence, observing files leads to the realization that a majority of bits will be equal on the same places. Rather than storing all of the bits of a file, making modifications to the hard drive at certain locations can save storage space. What is important to avoid, however, is lossy file compression that can wreak havoc on quality during the compression stage.
The compression technique they ended up implementing is based on the F5 algorithm that embeds binary data into JPEG files to reduce total space in the memory. The compression uses libjpeg for JPEG decoding and encoding, pcre for POSIX regular expressions support, and tinydir for platform-independent filesystem traversal. One of the major modifications was to save computation resources by disabling a password-based permutative straddling that uniformly spreads data among multiple files.
One caveat – changing even one bit of the compressed file could lead to total corruption of all of the data stored, so use with caution!
Some sentences have more than meets the eye, and we’re not talking about interpretive nonsense. Rather, some sentences may contain up to four paragraphs’ worth of hidden text, invisible to readers.
Thanks to Zero Width Obfuscation, it is possible to use Zero Width Characters – Unicode characters that are invisible even when you try to highlight them. They’re typically used for abstract foreign languages that require separators that don’t take up an entire space. In this case, they’re used to obfuscate and de-obfuscate hidden messages sent through text.
[inzerosight] published a browser extension that identifies, de-obfuscates, and obfuscates these messages for you on the web. It does this by querying each page for the Unicode of the Zero Width Characters (U+FEFF, U+200C, U+200D, U+200E, U+2060, U+180E) and highlighting where they’ve been spotted. The encoding replaces each Unicode character with a permutation of two of the Zero Width Characters, essentially doing a find and replace across the text message.
I’m just waiting to see how long it takes for Zero Width Obfuscation to become the next Konami Code Easter Egg.
In a move guaranteed to send audiophiles recoiling back into their sonically pristine caves, two doctoral students at ETH Zurich have come up with an interesting way to embed information into music. What sounds crazy about this is that they’re hiding data firmly in the audible spectrum from 9.8 kHz to 10 kHz. The question is, does it actually sound crazy? Not to our ears, playback remains surprisingly ok.
You can listen to a clip with and without the data on ETH’s site and see for yourself. As a brief example, here’s twelve seconds of the audio presenting two versions of the same clip. The first riff has no data, and the second riff has the encoded data.
You can probably convince yourself that there’s a difference, but it’s negligible. Even if we use a janky bandpass filter over the 8 kHz -10 kHz range to make the differences stand out, it’s not easy to differentiate what you’re hearing:
We think of Morse code in terms of dots and dashes, but really it’s a kind of binary code. Those symbols might as well be 0s and 1s or any other pair of characters. That attribute is exactly what led to a sting operation a music lyric site called Genius.com pulled on Google. At issue was a case of song lyrics that had allegedly been stolen by the search giant.
Song lyric sites — just like Google — depend on page views to make revenue. The problem is that in a Google search the lyrics appear on the search page, so there is no longer much incentive to continue to the song lyric site. That’s free enterprise for you, right? It is, but there was a problem. It appears that Google — or, according to Google, one of their partners — was simply copying Genius.com’s lyrics. How does Genius know the song lyrics were copied? According to news reports in the Wall Street Journal and other sources, they used Morse code.
Hackers and makers see the desktop 3D printer as something close to a dream come true, a device that enables automated small-scale manufacturing for a few hundred dollars. But it’s not unreasonable to say that most of us are idealists; we see the rise of 3D printing as a positive development because we have positive intentions for the technology. But what of those who would use 3D printers to produce objects of more questionable intent?
We’ve already seen 3D printed credit card skimmers in the wild, and if you have a clear enough picture of a key its been demonstrated that you can print a functional copy. Following this logic, it’s reasonable to conclude that the forensic identification of 3D printed objects could one day become a valuable tool for law enforcement. If a printed credit card skimmer is recovered by authorities, being able to tell how and when it was printed could provide valuable clues as to who put it there.
This precise line of thinking is how the paper “PrinTracker: Fingerprinting 3D Printers using Commodity Scanners” (PDF link) came to be. This research, led by the University at Buffalo, aims to develop a system which would allow investigators to scan a 3D printed object recovered from a crime scene and identify which printer was used to produce it. The document claims that microscopic inconsistencies in the object are distinctive enough that they’re analogous to the human fingerprint.
But like many of you, I had considerable doubts about this proposal when it was recently featured here on Hackaday. Those of us who use 3D printers on a regular basis know how many variables are involved in getting consistent prints, and how introducing even the smallest change can have a huge impact on the final product. The idea that a visual inspection could make any useful identification with all of these parameters in play was exceptionally difficult to believe.
In light of my own doubts, and some of the excellent points brought up by reader comments, I thought a closer examination of the PrinTracker concept was in order. How exactly is this identification system supposed to work? How well does it adapt to the highly dynamic nature of 3D printing? But perhaps most importantly, could these techniques really be trusted in a criminal investigation?
AI today is like a super fast kid going through school whose teachers need to be smarter than if not as quick. In an astonishing turn of events, a (satelite)image-to-(map)image conversion algorithm was found hiding a cheat-sheet of sorts while generating maps to appear as it if had ‘learned’ do the opposite effectively[PDF].
The CycleGAN is a network that excels at learning how to map image transformations such as converting any old photo into one that looks like a Van Gogh or Picasso. Another example would be to be able to take the image of a horse and add stripes to make it look like a zebra. The CycleGAN once trained can do the reverse as well, such as an example of taking a map and convert it into a satellite image. There are a number of ways this can be very useful but it was in this task that an experiment at Google went wrong.
A mapping system started to perform too well and it was found that the system was not only able to regenerate images from maps but also add details like exhaust vents and skylights that would be impossible to predict from just a map. Upon inspection, it was found that the algorithm had learned to satisfy its learning parameters by hiding the image data into the generated map. This was invisible to the naked eye since the data was in the form of small color changes that would only be detected by a machine. How cool is that?!
This is similar to something called an ‘Adversarial Attack‘ where tiny amounts of hidden data in an image or other data-set will cause an AI to produce erroneous output. Small numbers of pixels could cause an AI to interpret a Panda as a Gibbon or the ocean as an open highway. Fortunately there are strategies to thwart such attacks but nothing is perfect.
Steganography involves hiding data in something else — for example, encoding data in a picture. [David Buchanan] used polyglot files not to hide data, but to send a large amount of data in a single Twitter post. We don’t think it quite qualifies as steganography because the image has a giant red UNZIP ME printed across it. But without it, you might not think to run a JPG image through your unzip program. If you did, though, you’d wind up with a bunch of RAR files that you could unrar and get the complete works of the Immortal Bard in a single Tweet. You can also find the source code — where else — on Twitter as another image.
What’s a polyglot file? Jpeg images have an ICC (International Color Consortium) section that defines color profiles. While Twitter strips a lot of things out of images, it doesn’t take out the ICC section. However, the ICC section can contain almost anything that fits in 64 kB up to a limit of 16 MB total.
The ZIP format is also very flexible. The pointer to the central directory is at the end of the file. Since that pointer can point anywhere, it is trivial to create a zip file with extraneous data just about anywhere in the file.