Text Compression

It used to be that memory and storage space were so precious and so limited of a resource that handling nontrivial amounts of text was a serious problem. Text compression was a highly practical application of computing power.

Today it might be a solved problem, but that doesn’t mean it doesn’t attract new or unusual solutions. [Fabrice Bellard] released ts_zip which uses Large Language Models (LLM) to attain text compression ratios higher than any other tool can offer.

LLMs are the technology behind natural language AIs, and applying them in this way seems effective. The tradeoff? Unlike typical compression tools, the lossless decompression part isn’t exactly guaranteed when an LLM is involved. Lossy compression methods are in fact quite useful. JPEG compression, for example, is a good example of discarding data that isn’t readily perceived by humans to make a smaller file, but that isn’t usually applied to text. If you absolutely require lossless compression, [Fabrice] has that covered with NNCP, a neural-network powered lossless data compressor.

Do neural networks and LLMs sound far too serious and complicated for your text compression needs? As long as you don’t mind a mild amount of definitely noticeable data loss, check out [Greg Kennedy]’s Lossy Text Compression which simply, brilliantly, and amusingly uses a thesaurus instead of some fancy algorithms. Yep, it just swaps longer words for shorter ones. Perhaps not the best solution for every need, but between that and [Fabrice]’s brilliant work we’re confident there’s something for everyone who craves some novelty with their text compression.

[Photo by Matthew Henry from Burst]

Hackaday

1 Articles

Text Compression Gets Weirdly Efficient With LLMs

Search

Never miss a hack

If you missed it

Mining And Refining: Uranium And Plutonium

Programming Ada: First Steps On The Desktop

The Hunt For MH370 Goes On With Barnacles As A Lead

MXM: Powerful, Misused, Hackable

VCF East 2024 Was Bigger And Better Than Ever

Our Columns

Welcome Back, Voyager

Hackaday Podcast Episode 268: RF Burns, Wireless Charging Sucks, And Barnacles Grow On Flaperons

This Week In Security: Cisco, Mitel, And AI False Flags

Keebin’ With Kristina: The One With The Transmitting Typewriter

Supercon 2023: Alex Lynd Explores MCUs In Infosec

Search

Never miss a hack

Subscribe

If you missed it

Our Columns