[Ottverse] has an interesting series in progress to demystify video compression. The latest installment promises to explain discrete cosine transforms as though you were five years old.
We’ll be honest. At five, we probably didn’t know how to interpret this sentence:
…the Discrete Cosine Transform takes a set of N correlated (similar) data-points and returns N de-correlated (dis-similar) data-points (coefficients) in such a way that the energy is compacted in only a few of the coefficients M where M << N.
Still, the explanation is pretty clear and we really liked the analogy with the spheres and the stars in a constellation.
The example Matlab code is probably also lost on a five-year-old, but we liked it. Anyone, we think, can understand the practical result: removing too much data (high compression) produces a poor quality image, but the image quality is still pretty good even when 75% of the data vanishes.
So while you might not want to show this to your five-year-old, you might enjoy it and even learn something. The rest of the series is pretty good, too. There are discussions of data compression, codecs, and encoders. We are sure there’s more to come, also.
If you are not into Matlab, you could probably do the same trick with SciPy pretty easily. Or, try Octave, one of several open-source projects that are similar to Matlab.
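For the curious, here is a minimal sketch of the same trick in SciPy. The synthetic test image and the keep-the-biggest-25%-of-coefficients cutoff are our own choices for illustration; they are not taken from the article's Matlab example.

# Sketch: 2-D DCT an image, drop the smallest 75% of coefficients, transform back.
# The synthetic 64x64 "image" and the 75% cutoff are illustrative assumptions.
import numpy as np
from scipy.fft import dctn, idctn

x = np.linspace(0, np.pi, 64)
image = 128 + 100 * np.outer(np.sin(x), np.cos(2 * x))  # smooth stand-in for a photo

coeffs = dctn(image, norm='ortho')               # forward 2-D DCT

# Zero every coefficient whose magnitude is below the 75th percentile,
# i.e. keep only the largest 25% of coefficients.
threshold = np.percentile(np.abs(coeffs), 75)
compressed = np.where(np.abs(coeffs) >= threshold, coeffs, 0)

reconstructed = idctn(compressed, norm='ortho')  # inverse 2-D DCT

kept = np.count_nonzero(compressed) / compressed.size
error = np.abs(image - reconstructed).mean()
print(f"kept {kept:.0%} of coefficients, mean pixel error {error:.2f}")

For a smooth image like this one, the mean pixel error should stay small even though three quarters of the coefficients are gone, which is the energy-compaction idea from the quote above.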
Hey guys – author of that blog post here. My colleague submitted it to HN and I got a lot of positive comments and constructive feedback (the way I like to look at it). I guess my core audience got it immediately, but, when exposed to HN, I learned a lot more about explaining it right.
I’ve expanded on the article using a new example of how “20 Questions” and the kids’ game “I spy with my little eye” are in fact a pretty good approximation of the DCT. Hope this helps everyone interested in dipping their toes into the amazing world of image & video processing, compression, and delivery.
Hope this helps, and thanks for sharing and the summarization!
HN?
Ah, Hacker News. Sorry. Always Google first.
I’m afraid I still don’t know what DCT is. I get that it’s used in compression, but since I’m not familiar with Matlab, the functional code example in the linked article reads like:
inputPixels = ones(8,8) * 255;
dctCoeffs = MAGIC(inputPixels);
reconstructedPixels = TADA(dctCoeffs);
Images of the DCT helped me to understand: https://www.youtube.com/watch?v=Q2aEzeMDHMA
DCT is some spiffy linear algebra, short for Discrete Cosine Transform. Transform means it just changes the way the data is represented, but (theoretically) without changing the actual data. If you feed it the graph of a sound wave over a short period of time, with a point for the amplitude at each discrete moment in time (called samples), it will spit out a graph with the intensity of each sound frequency possible in that time slice. You can reverse the transform by taking each frequency intensity, making a cosine wave of that amplitude (hence the name), and adding them all together. The useful thing about the DCT is that if you remove some of the smaller intensities (called coefficients) and just set them to zero, you still get a pretty close approximation of the original wave. The neat thing is that it extends to any dimensional space you need, like 2D for images.
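If you want to see that in code, here is a quick sketch using SciPy; the test waveform and the keep-the-8-largest-coefficients cutoff are arbitrary choices, just to show the idea.

# Sketch of the 1-D case: sample a short waveform, DCT it, zero the small
# coefficients, and invert the transform. Signal and cutoff are illustrative.
import numpy as np
from scipy.fft import dct, idct

t = np.linspace(0, 1, 64, endpoint=False)
samples = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 11 * t)

coeffs = dct(samples, norm='ortho')          # amplitudes of the cosine components

# Zero everything except the 8 largest-magnitude coefficients.
keep = 8
small = np.argsort(np.abs(coeffs))[:-keep]
coeffs[small] = 0

approx = idct(coeffs, norm='ortho')          # add the kept cosines back together
print("max error:", np.max(np.abs(samples - approx)))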
Simplest version:
You can take a row of pixels and imagine that this row repeats forever. In this way, you’ve transformed a line of the image into something that resembles a continuous tone. The repeating pixel pattern contains frequencies that shape that tone, so you find which individual frequencies are present in the data and mark down the coefficient of each cosine (or sine) wave: its amplitude, frequency, and relative phase.
When you sum all of the found cosine (or sine) waves together, they should form the exact same pattern. If you’ve summed enough waves, then when you round the result to the resolution and bit depth of the original image, you should get the exact same pixels back.
The compression happens when you order the coefficients from largest to smallest, and discard the smallest coefficients that have the least effect on the end result. That way you end up with an approximation of the original data.
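A minimal sketch of that ordering-and-discarding step, assuming SciPy and a made-up row of pixel values:

# Order the DCT coefficients of one row of "pixels" by magnitude, keep only
# the biggest few, and see how close the reconstruction gets.
# The ramp-plus-ripple test row is an illustrative stand-in for a real image line.
import numpy as np
from scipy.fft import dct, idct

row = np.arange(64, 192, 2) + 5 * np.cos(np.linspace(0, 6 * np.pi, 64))

coeffs = dct(row, norm='ortho')
order = np.argsort(np.abs(coeffs))[::-1]     # largest magnitude first

for keep in (4, 8, 16, 64):
    kept = np.zeros_like(coeffs)
    kept[order[:keep]] = coeffs[order[:keep]]
    approx = idct(kept, norm='ortho')
    print(f"keep {keep:2d}/64 coefficients -> max pixel error {np.max(np.abs(row - approx)):.2f}")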
There are multiple advantages to representing the picture this way. One is that you don’t have to draw the picture at the same resolution, because the waves are continuous, so you can freely scale up and down. Another is that you can transmit over noisy channels by sending the largest coefficients with more redundancy, so if the receiver can’t get all of the data, they’re likely to get at least a blurry version of the image instead of nothing at all.
Of course you don’t have to take an individual line. You can put all the lines of the image one after the other and interpret that as the repeating pattern.
Or, like you have in JPEG and MPEG compression, you take 8×8 pixel blocks and calculate the DCT over each of those. That way you retain a low-resolution version of the original image that you can draw, like a 1/8th-scale thumbnail of the picture, and add detail as needed.
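And a sketch of the blockwise version, again assuming SciPy and a synthetic 64×64 image. The only point it demonstrates is that the DC (top-left) coefficient of each 8×8 block is, up to scaling, the block average, i.e. a 1/8-scale thumbnail:

# Split a synthetic 64x64 image into 8x8 blocks, DCT each block, and collect
# the DC coefficients into a 1/8-resolution thumbnail.
import numpy as np
from scipy.fft import dctn

x = np.linspace(0, np.pi, 64)
image = 128 + 100 * np.outer(np.sin(x), np.cos(2 * x))

h, w = image.shape
thumbnail = np.empty((h // 8, w // 8))
for i in range(0, h, 8):
    for j in range(0, w, 8):
        block = dctn(image[i:i+8, j:j+8], norm='ortho')
        thumbnail[i // 8, j // 8] = block[0, 0] / 8   # DC / 8 == block mean for an orthonormal 8x8 DCT

print(thumbnail.shape)   # (8, 8): a 1/8-resolution version of the 64x64 image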
Thanks All. I have a much better understanding now!
I haven’t parsed this article yet, but I put it on my reading list because I always appreciate a different angle on this material.
I’m just posting to promote The Scientist and Engineer’s Guide to Digital Signal Processing, by Steven W Smith (dspguide.com). You can download it for free as a PDF…I first downloaded it back in 1997 and I have worked my way very slowly through different parts of it since. It is very good, in my opinion, if you want a formal introduction to these concepts. The first time, I will say, I didn’t feel like I got a lot out of it, but each time I come back to it the concepts gel a little more thoroughly. Really an incredibly valuable resource.
Now take all you’ve learned about spatial compression and spread it across time… temporal compression.
I had a fascinating conversation (about 15 years ago) with the head geek at Snell & Wilcox (UK) about their clever video processing algorithms.
What I learned is that I should have paid more attention in math classes!
I wish there were a series like this that helped with understanding the inner workings of MP3 audio, explained in simple language (not just the theory behind how it does what it does, but the complex math involved).