SHOUT For Smaller QR Codes

QR codes have been with us for a long time now, and after passing through their Gartner-esque hype cycle of inappropriate usage, have now settled down to be an important and ubiquitous part of life. If you have ever made a QR code you’ll know all about trying to generate the most compact and easily-scannable one you can, and for that [Terence Eden] is here with an interesting quirk: upper-case text produces smaller codes than lower-case.

His post takes us on a journey into the encoding of QR codes, not in terms of their optical pattern generation, but instead the bit stream they contain. There are different modes to denote different types of payload, and in his two examples of the same URL in upper and lower case, the modes are different. Upper-case is encoded as alphanumeric, while lower-case, though it looks every bit as alphanumeric to the human eye, is encoded as bytes.
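
The mode selection can be sketched in a few lines of Python. This is an illustrative simplification, assuming the standard 45-character alphanumeric set from the QR specification (digits, upper-case A–Z, space, and `$%*+-./:`); a real encoder also handles Kanji mode and mixed-mode segments.

```python
# The QR alphanumeric mode covers only these 45 characters; note the
# complete absence of lower-case letters.
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

def qr_mode(payload: str) -> str:
    """Pick the most compact single mode that can hold every character."""
    if payload.isascii() and payload.isdigit():
        return "numeric"        # 10 bits per group of 3 digits
    if all(c in ALPHANUMERIC for c in payload):
        return "alphanumeric"   # 11 bits per pair of characters
    return "byte"               # 8 bits per character

print(qr_mode("HTTPS://EXAMPLE.COM/"))  # alphanumeric
print(qr_mode("https://example.com/"))  # byte
```

One lower-case letter anywhere in the payload is enough to push the whole segment into byte mode.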

To understand why, it’s necessary to consider the QR code designers’ need for efficiency, which led them to reduce the character set as far as possible and define only upper-case letters in the alphanumeric set. The upper-case payload is thus encoded using fewer bits per character than the lower-case one, which must fall back to 8-bit bytes. A satisfying explanation for a puzzle hidden in plain sight.
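
The saving is easy to put numbers on. Below is a rough sketch of the data-segment size for small (version 1–9) symbols per the QR specification: a 4-bit mode indicator plus a character count field (9 bits for alphanumeric mode, 8 for byte mode), followed by the payload itself.

```python
def alphanumeric_bits(n_chars: int) -> int:
    # Pairs of characters share 11 bits; a leftover odd character takes 6.
    return 4 + 9 + (n_chars // 2) * 11 + (n_chars % 2) * 6

def byte_bits(n_chars: int) -> int:
    # Byte mode spends a full 8 bits on every character.
    return 4 + 8 + n_chars * 8

url = "HTTPS://EXAMPLE.COM/"          # 20 characters
print(alphanumeric_bits(len(url)))   # 123 bits
print(byte_bits(len(url)))           # 172 bits
```

Roughly 5.5 bits per character against 8, before error correction is even applied, which is enough of a gap to drop the symbol down a version and shrink the printed code.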

Hungry for more QR hackery? This one contains more than one payload!

Unicode: On Building The One Character Set To Rule Them All

Most readers will have at least some passing familiarity with the terms ‘Unicode’ and ‘UTF-8’, but what is really behind them? At their core they refer to character encoding schemes, also known as character sets. This is a concept which dates back far beyond the era of electronic computers, to the dawn of the optical telegraph and its predecessors. As far back as the 18th century there was a need to transmit information rapidly across large distances, which was accomplished using so-called telegraph codes. These encoded information using optical, electrical and other means.

During the hundreds of years since the invention of the first telegraph code, there was no real effort to establish international standardization of such encoding schemes, with even the first decades of the era of teleprinters and home computers bringing little change there. Even as EBCDIC (IBM’s 8-bit character encoding demonstrated in the punch card above) and finally ASCII made some headway, the need to encode a growing collection of different characters without having to spend ridiculous amounts of storage still lacked an elegant solution.

Development of Unicode began during the late 1980s, when the increasing exchange of digital information across the world made the need for a singular encoding system more urgent than before. These days Unicode allows us to not only use a single encoding scheme for everything from basic English text to Traditional Chinese, Vietnamese, and even Mayan, but also small pictographs called ‘emoji‘, from Japanese ‘e’ (絵, picture) and ‘moji’ (文字, character).
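
The variable-width trick at the heart of UTF-8 can be seen directly from Python, which uses Unicode strings natively: ASCII characters keep their single-byte encoding, while CJK characters and emoji expand to three or four bytes each. A small sketch:

```python
# Map a few code points to their UTF-8 byte sequences: plain ASCII
# stays one byte, accented Latin takes two, a CJK character three,
# and an emoji four.
for ch in ("A", "é", "絵", "😀"):
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):06X} {ch!r} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

This backward compatibility with ASCII is a large part of why UTF-8, rather than a fixed-width scheme, became the dominant encoding on the web.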

Continue reading “Unicode: On Building The One Character Set To Rule Them All”