Huffman Coding

This is the first we’ve heard of Parametric Press, a digital magazine that takes deep dives into a variety of subjects (particle physics and “big data”, for example), with interactive elements or simulations embedded within each story.

The first one that sprang up in our news feed is a piece by [Omar Shehata] on the humble JPEG image format. In it, he explains the how and why of the JPEG encoding process, letting the reader play with the various concepts along the way, in real time, within the browser.

Subsampling in the RGB colour space doesn’t affect each component to the same degree, because of the response of the human eye’s cone cells. Subsampling the chroma components is also much less noticeable than subsampling the luminance.

For those not familiar with the format, the first step in JPEG encoding (which is actually optional) is to transform the image from the RGB color space into a YCbCr (luminance, chrominance) color space. Since the human eye is far more sensitive to differences in luminance (brightness) than to differences in Cb (chroma relative blueness) and Cr (chroma relative redness), these latter two components can be subsampled by storing only a single value of each for every 2×2 block of pixels. JPEG allows other block sizes, but 2×2 is the most common.
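As a rough sketch of that first step, the snippet below converts a pixel from RGB to YCbCr using the full-range BT.601 weights found in JFIF files, then reduces each 2×2 block of a chroma plane to one value. The function names are our own, averaging is only one way an encoder might pick the stored value, and the planes are assumed to have even dimensions.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF / BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(plane):
    """Replace each 2x2 block of a chroma plane with the average of its values.

    `plane` is a list of rows; width and height are assumed to be even.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]
```

The luminance plane is left at full resolution, so a 2×2 subsampled image stores half as many samples in total: one Y per pixel, plus one Cb and one Cr per four pixels.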

This sets the scene for the clever bit that comes next, which allows more of that harder-to-perceive chroma information to be discarded. It’s fun to play with the chroma subsampling slider and see how the different colours are not equally affected, due to the relative sensitivities of the human eye’s cone cells.

Next, the three YCbCr components are each put through a discrete cosine transform (DCT) and quantization, independently of one another. The DCT turns each 8×8 pixel block into 64 discrete spatial-frequency coefficients. The JPEG compression level (which you can change) sets how coarsely those coefficients are quantized, and thus how many of the upper-frequency components get discarded and how much of the fine spatial detail is lost. This is the main source of JPEG image quality loss. The quantized blocks are then delta encoded: the DC (average) coefficient of each block is coded as the difference from that of the previous block, while the remaining coefficients, now mostly zeros, are run-length encoded. Like the colour-space transform, delta encoding doesn’t offer any compression on its own, but it makes the entropy coding that follows more effective, giving more (lossless) compression. Finally, the whole lot is Huffman compressed, with the code tables stored in the JPEG header. Want to play with JPEGs some more? Here’s the GitHub source.
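To make those stages a little more concrete, here is a minimal, unoptimized sketch of the DCT, quantization, DC delta coding and zero run-length steps for 8×8 blocks. It is not the article’s code. The quantization table from `make_table` is a made-up one whose step size simply grows with frequency (real encoders use tuned tables), and the run-length coder ignores details of baseline JPEG such as the 15-zero run limit.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_2d(block):
    """Naive 2-D DCT-II of an 8x8 block of level-shifted samples."""
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out


def make_table(base):
    """Hypothetical quantization table: coarser steps at higher frequencies."""
    return [[base * (1 + u + v) for v in range(N)] for u in range(N)]


def quantize(coeffs, table):
    """Divide each coefficient by its step and round; this is the lossy part."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)]
            for u in range(N)]


def delta_encode_dc(dc_values):
    """Code each block's DC term as the difference from the previous block's."""
    previous, deltas = 0, []
    for dc in dc_values:
        deltas.append(dc - previous)
        previous = dc
    return deltas


def run_length_zeros(ac_values):
    """Turn a list of AC coefficients into (zero_run, value) pairs."""
    pairs, run = [], 0
    for value in ac_values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))  # end-of-block marker; trailing zeros are implied
    return pairs
```

A perfectly flat block ends up as a single non-zero DC value followed by 63 zeros, which is exactly the kind of output that the run-length and Huffman stages squeeze down well.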

If all of this theory is of little use to you and you just want to decode some JPEGs, here is a speedy library for just that.

The team over at NerdKits decided they needed to do something for Halloween. Only on Halloween is scaring small children an admirable goal, so they demoed a way to play creepy sounds after a door has been opened.

To trigger the sound, a magnetic reed switch from an alarm system is attached to a front door. This triggers the microcontroller and, after a short delay, some creepy audio is played on a pair of speakers. The team decided to store all the audio data in the flash memory of their ATmega328p, but that wouldn’t allow for a very long scream. To extend the length of the wails of the damned, the NerdKits team decided to use Huffman coded audio.

Because Huffman coding works by assigning the shortest codes to the most common values, the team used a bit of Python and C magic to work out the optimal code table for their audio file. After the evil laugh was sufficiently compressed, the microcontroller was programmed to decode the audio and send it to a pair of speakers. The team made all the software for their project available here for your perusal.
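For a feel of how such a code table can be derived, here is a minimal sketch of textbook Huffman coding in Python. It is not the NerdKits code: the function names are our own, the codes are kept as strings of "0" and "1" for readability rather than packed into bytes, and the sample values in the usage note below are invented for illustration.

```python
import heapq
from collections import Counter


def build_huffman_codes(samples):
    """Return {symbol: bit string}, with shorter codes for commoner symbols."""
    freq = Counter(samples)
    if len(freq) == 1:
        # A lone symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (count, tie_breaker, {symbol: code so far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)  # two rarest subtrees
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes_a.items()}
        merged.update({sym: "1" + code for sym, code in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


def encode(samples, codes):
    """Concatenate the code for each sample into one bit string."""
    return "".join(codes[s] for s in samples)


def decode(bits, codes):
    """Walk the bit string, emitting a symbol whenever a full code is seen."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return out
```

With an invented stream of sixteen samples, `[0] * 8 + [1] * 4 + [2] * 2 + [3] * 2`, the most common value gets a one-bit code and the whole stream encodes to 28 bits instead of the 32 that a fixed two bits per sample would need. Decoding works bit by bit because no code is a prefix of another, which is what makes it practical on a small microcontroller.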

Although this project could be thrown together in an hour with an Arduino and an MP3 shield, the NerdKits team wants to get kids to learn how things work, also an admirable goal. [Humberto] from NerdKits put a video up explaining the theory of the project. Check it out after the break.
