A Super Speedy Lightweight Lossless Compression Algorithm

[Dominic Szablewski] was tinkering around with compressing RGB images, when he stumbled upon idea of how to make a simple lossless compression algorithm, resulting in the Quite OK Image Format, which seems to offer comparable file sizes to the PNG format but is so simple it runs up to 50 times faster for compression and up to four times faster for decompression. Implementation can be achieved with a miniscule 300 lines of C. Need a bit more detail on the real-world performance? Well [Dominic] has that covered too, with a complete set of benchmarks for your perusal.

Image formats are one of those things these days that are designed by consortium, with so much complexity wedged in making it hard to implement with limited resources, so we find it very refreshing to see someone going back to basics and producing something super lightweight, and plenty good enough in practical terms.

Other uses for the algorithm could be for super simple video compression, for applications where resource is tight and some low-effort bandwidth reduction would be beneficial. Implementation in a small FPGA would also be quite straightforward, since the memory requirement is quite low also.

The project is very new and already there are some tweaks happening to the format, so the GitHub project code may change without warning to reflect any corrections [Dominic] feels necessary.

Thanks [David] for the tip!

38 thoughts on “A Super Speedy Lightweight Lossless Compression Algorithm

    1. No, it’s RLE + predictive pixel estimation based on previous pixels (à la png). No entropy coding, no Huffman, no Lempel–Ziv–Welch here. Not sure the performance are as good as claimed on many different images, though. Might be fast, and probably ok for a kind of artist generated picture, but for real life images, not so well.

        1. Tested with several rendered and real scenes. Usually saw about 15-50% space savings. With easier-to-compress-images it actually did fair fairly well. Usually about 1.5x bigger than PNGs, but hey, if you have easy-to-compress-images you can get 25:1 compressions. Though with random noise, the file size increased by 25%!?

      1. Technically the variable size encoding of differences is a crude entropy coding (similar to Huffman with fixed tables). Smooth images (low local variance) should encode well, but grainy images (high local variance) are not going to compress as well with this scheme.
        On, the plus side, it could improve compression with better predictors (PNG/JPEG-LS like, not just horizontal difference).

  1. Well, while working on CD-i titles in the 90’s, I came up with a quite similar compression algorithm. I am thinking that many developers did the same in the ’90’s.

    But that algorithm was later abandoned in favor of JPEG, because it could make even smaller files, and decompression was fast as well. JPEG was lossy, but the resolution of the images was 384×280 pixels (although there was a QHY mode of 768×560 interlaced, but we didn’t use that) and displayed on a (CRT) TV. So you would hardly see the artifacts, apart from that people were simply wowed anyway and didn’t even notice them.

    So, we used JPEG for a while, even after CD-i was dead in the water. Then started making similar titles for PC with CD-ROM. But the resolution of the screens were higher, and computer monitors have more sharpness to them as well. So we had to compress our images with a lower compression rate to make it look good on 1024×768 and the likes. The size of the files grew, and we had to find something better again.

    By that time, the way to go was to use both .JPG and .BMP, whichever was the best for its purpose. And both were free on Microsoft Windows platforms. Later there was one more addition: .PNG.

    By that time computers were so powerful, and storage was so huge, that we never thought about it anymore.

    So, basically, the cause of the complexity of current graphics file formats is that simply nobody cares anymore about less complexity. Over the years, features of the image formats have become much more important, and indeed copy protection is an important thing as well. Someone has the copyrights of the image, and that person has the right to earn money with his images and has the right to protection against stealing of his images.

    So, I would say: great work, your algorithm looks fine, and your code looks nice enough. Although I can’t understand why people would still use abbreviations for variable names, like ‘px’, ‘p, ‘desc’, ‘b1’, ‘b2’, etc, instead of descriptive variables. We live in 2021, not in 1985 where variable names were limited to 8 characters or so.

    If you want to speed it up even more, use compiler hints like ‘register’. If you really want to speed it up, write the code in assembly. Although I think that hand-optimizing on CPUs with long pipelines and instruction/data caches (ARM cpus) is probably going to be a waste of time. The compiler can do that better. CD-i used a 68000 CPU, and writing and optimizing the decoder in assembly was easy. ;)

    1. The new way to optimize in ‘assembly’ is using SIMD intrinsics. It’s not too far off from assembly even though it is inline in your high level code. And yes it is often faster than a compiler with it’s long pipelines, branch prediction etc. On the ARM this would amount to writing it using NEON.

      1. Never dabbled in that. As usual, simply because I never had the need. The field of computer science has grown so wide now that it’s impossible to learn everything. All you can do is try to keep track, so that you know that something exists and know where to find it when you need it.

        But I would guess that on an ARM (seems that NEON is supported from Cortex-A5 and up), it’s already quite trivial to use JPEG/PNG or QOI. Apart from that JPEG needs to be licensed.

        I would think that QOI is more suitable for processors with fewer resources: AVR, PIC, etc.

    2. I agree that countless simple and effective image compression schemes have been made, often with almost the same obvious approach in this article but these have never been popular because they are in some proprietary format, in some proprietary game, or demo or project and either never released, or just sitting in some dusty corner of github. I don’t think this guy searched too too hard.

      1. My point, which I admit that I didn’t make it clear too much, was that those algorithms were never made popular because they basically became extinct before they could prosper.

        And in halfway the ’90’s, internet was not so common yet, and there was no github.

        Basically I’m sure that I still must have one or two floppies with some of my algorithms on them. But there are a few problems: I have to find those floppies, and then I have to find a floppy drive, and then I have to find a computer that can support a floppy drive. :+)

        My point was that increasing computer power rendered those algorithms obsolete for a while. Long enough for them to get lost in the sands of time. Now we are seeing another revival of small and not so powerful cpu’s, and the loss of those old algorithms means that people have to reinvent the wheels.

        I am not saying there is no merit! The algorithms are fine, and it’s great that Dominic is trying to standardise it and make it popular under the name ‘QOI’. Although I see that he already had to change the file-format and broke backwards-compatibility. Not the thing to do if you have a ‘standard’, and for sure one of the reasons why JPEG and PNG bloated: they have to stay backwards compatible with every old version. Photo’s from my 22-year old Sony DSC-F55 can still be decoded by the latest JPRG library.

        Maybe Dominic’s greatest merit with this algorithm will be that he will embolden other developers to consider writing code themselves over preferring some standard bloated library. But if he makes it a standard: beware of the bloat yourself! :)

    3. A follow-up by me, because I noticed my spidey sense tingling, but needed some time to realise why. :)

      The function qoi_decode has one big drawback. Inside, it allocates a memory buffer, fills it, and returns it. These are bad practices. Returning the buffer means a transfer of ownership (i.e. somebody else must delete it), and you should avoid those as much as possible.

      And it’s easy to avoid it: just let the caller pass the buffer as a pointer to a pointer, and then fill it.

      Another thing is that the function allocates the whole buffer for the image in one go. Quite big memory allocation. If you need to save on memory, why not have qoi_decode decode single lines of image, instead of the whole image at once?

    4. >Someone has the copyrights of the image


      >and that person has the right to earn money with his images


      >and has the right to protection against stealing of his images.


      That’s not what copyrights are or how they work, or what “theft” is in legal terms. Most importantly, copyrights are NOT a right to profit – they’re copyrights: exclusive rights over distribution and continued ownership after a sale. This has the side effect of making profit, since you never actually sell anyone anything – you just take the money.

    5. The size of variable names should be chosen wisely, and long descriptive names are not always appropriate. A bunch of nested loop counters used 40 times in 12 lines should not be 20 characters each because it makes the code unreadable.

  2. Speed is nice, but what really matters on small devices is the amount of extra RAM you need to decompress it. For example. GIF is almost unusable with the 4k of dictionary needed.

    1. This format has a function calling back to prior pixels, but it is only 64 pixels deep. So for RGBA that is only about 2KB. Still a bit big for a small microcontroller. But amply small enough for the somewhat bigger micros.

      Though, most of the times one starts fiddling with graphics beyond a text display then one tends to step up a notch in microcontroller specs. So a few KB of RAM on it wouldn’t be unreasonable to expect, though obviously some of that space will be used for other stuff.

  3. The part in the documentation saying “uint8_t channels; // must be 3 (RGB) or 4 (RGBA)” is a bit sad…

    What if I want to compress a bitmap with only 1 or 2 channel? Usually a monochromatic image with or without alpha. Or if one wants to compress an image with additional channels for other functions. (Monochrome channels are quite common in the 3d rendering scene and a quick to en/decode format like this could keep more textures in VRAM since only the ones actively used could be uncompressed. This quicker decoding would even be useful for the idea where the GPU directly loading in files from storage without bothering the CPU/OS about it as much.)

    The “must be 3 or 4” is however not directly an arbitrary limit of the format. Since it handles whole pixels and not the channels individually. So the RGB and RGBA encodings are actually different.

    I can likewise also see that the format is catering only to 8 bit per channel. 8 bit color has some banding issues in a lot of applications, supporting 12-16 bit would greatly reduce such issues when used. (Some people might though say that one can’t discern the 17 million colors, but it is fairly trivial to see the boarder between two colors that are just one away from each other.)

    But it is still a somewhat nice format and seems fairly simple.

    1. this is related to its big weakness. apart from RLE, each pixel must take at least one byte. so if your incoming pixels are 24bpp then it’s capable of at best compressing to 33% of the original size. any smaller than 24bpp, it won’t seem much like compression at all unless your source image is extremely susceptible to RLE.

      looking at some JPEGs I have sitting around, it looks like it’s not uncommon for them to compress to 2 or 3 *bits* per pixel. of course, it’s competing with PNG…and PNG can compress 8bpp or 1bpp images to less than 1 byte per pixel, so that’s one little win for PNG.

      1. Actually, it can do better since it can repeat pixels X times.
        So if you have a field with a plain color in it, it will explain the whole line of same colored pixels in 2 bytes. (unless it is thousands of pixels wide)

        For a lot of simpler graphics, this can reduce the file size by a lot.

    1. webp has terrible generation loss, since it basically encodes the color information at half the resolution of the image regardless of your compression setting. It’s designed for destroying images for any further purposes.

      1. So it is. Ugh. I hate that C coding style (In my C world; definitions.h implementation.c) but hate more that I didn’t bother to read the whole header file. What an amateur. Will try harder.

        I shall correct the article when I get a sec.

        Hi Matt! BTW

        1. Hello :-)
          Yes its not pretty but presumably solved some problem the author had.
          If you think that’s bad try the google testing & mocking frameworks! 1000s of lines of template meta-programming magic which has to be header only.

  4. I quite like the logical approach to the compression, although it could be argued, the compression could be tighter, all be it at the expense of some speed. Perhaps applying the algorithm in two dimensions, [ or three for video, a bit like a gif ], I wonder how this compares with just applying 7zip to say a bitmap.

  5. to improve the algorithm do an rgb->lab conversion first, and encode that, sure you need to adjust the encoding algorithm a bit, because the lab’s a and b channel is behaving much more differently

  6. Congratulations for getting this far! The basics can be quite simple…

    But then there is feature creep:
    – More options for color channels (and maybe add a color channel decorrelation?)
    – support different bit depths (Messing up the hand-crafted encodings)
    – are the images / encoding decisions optimum also for other datasets?
    – utilize multi-core CPUs, (like for 8k 120fps video) by splitting the pixels with a wavelet decomposition
    – can you do visually lossless? Fixed Bitrate encoding?

    And then you end up with a committee designing JPEG XS or such

Leave a Reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.