Building Up Unicode Characters One Bit At A Time

The range of characters that can be represented by Unicode is truly bewildering. If there’s a symbol that was ever used to represent a sound or a concept anywhere in the world, chances are pretty good that you can find it somewhere in Unicode. But can many of us recall the proper keyboard calisthenics needed to call forth a particular character at will? Probably not, which is where this Unicode binary input terminal may offer some relief.

“Surely they can’t be suggesting that entering Unicode characters as a sequence of bytes using toggle switches is somehow easier than looking up the numpad shortcut?” we hear you cry. No, but we suspect that’s hardly [Stephen Holdaway]’s intention with this build. Rather, it seems geared specifically at making the process of keying in Unicode harder, but cooler; after all, it was originally his intention to enter this in last year’s Odd Inputs and Peculiar Peripherals contest. [Stephen] didn’t feel it was quite ready at the time, but now we’ve got a chance to give this project a once-over.

The idea is simple: a bank of eight toggle switches (with LEDs, of course) is used to compose the desired UTF-8 character, which is made up of one to four bytes. Each byte is added to a buffer with a separate “shift/clear” momentary toggle, and eventually sent out over USB with a flick of the “send” toggle. [Stephen] thoughtfully included a tiny LCD screen to keep track of the character being composed, so you know what you’re sending down the line. Behind the handsome brushed aluminum panel, a Pi Pico runs the show, drawing glyphs from an SD card containing 200 MB of TrueType font files.
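For a feel of what that workflow looks like in software, here is a minimal Python sketch of the compose-then-send idea. It is only an illustration, not [Stephen]’s firmware: the `compose` helper, the string representation of the switches, and the most-significant-bit-first ordering are all assumptions made for the example.

```python
def compose(switch_patterns):
    """Turn a list of 8-character switch patterns into one character.

    Each pattern is a string such as "11100010", one character per
    toggle switch, most significant bit first (an assumed convention).
    """
    buffer = bytearray()
    for pattern in switch_patterns:
        if len(pattern) != 8 or set(pattern) - {"0", "1"}:
            raise ValueError(f"expected 8 bits, got {pattern!r}")
        # "shift": append the byte currently set on the switches
        buffer.append(int(pattern, 2))
    # "send": interpret the buffered bytes as UTF-8.
    # Python raises UnicodeDecodeError if the sequence is not valid.
    return bytes(buffer).decode("utf-8")


# The three bytes E2 82 AC spell the euro sign, U+20AC.
print(compose(["11100010", "10000010", "10101100"]))
```

A single byte such as `"01000001"` yields a plain ASCII `A`, while a lone continuation byte like `"10000010"` is rejected, which is roughly the kind of feedback a display on the device would need to give.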

At the end of the day, it’s tempting to look at this as an attractive but essentially useless project. We beg to differ, though — there’s a lot to learn about Unicode, and [Stephen] certainly knocked that off his bucket list with this build. There’s also something wonderfully tactile about this interface, and we’d imagine that composing each codepoint is pretty illustrative of how UTF-8 is organized. Sounds like an all-around win to us.
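If you’d like to see that organization without wiring up any toggles, the sketch below encodes a code point by hand and prints the bit pattern you would set for each byte. It follows the standard UTF-8 layout (a lead byte whose prefix signals the length, followed by continuation bytes that each start with `10` and carry six bits); the function name `utf8_bits` is made up for this example.

```python
def utf8_bits(codepoint):
    """Encode a code point by hand; return one bit string per byte."""
    if 0xD800 <= codepoint <= 0xDFFF or not 0 <= codepoint <= 0x10FFFF:
        raise ValueError("not an encodable Unicode scalar value")
    if codepoint < 0x80:
        # ASCII range: a single byte with the top bit clear
        return [format(codepoint, "08b")]
    # (exclusive upper limit, lead-byte prefix, continuation byte count)
    layouts = ((0x800, 0xC0, 1), (0x10000, 0xE0, 2), (0x110000, 0xF0, 3))
    for limit, prefix, n_cont in layouts:
        if codepoint < limit:
            break
    # Lead byte: length prefix plus the highest bits of the code point
    out = [prefix | (codepoint >> (6 * n_cont))]
    # Continuation bytes: "10" followed by six payload bits each
    for i in range(n_cont - 1, -1, -1):
        out.append(0x80 | ((codepoint >> (6 * i)) & 0x3F))
    return [format(b, "08b") for b in out]


# U+12399, the cuneiform code point asked about in the comments
print(" ".join(utf8_bits(0x12399)))
# prints: 11110000 10010010 10001110 10011001
```

Those four patterns correspond to the bytes F0 92 8E 99, so a code point written as U+12399 and its UTF-8 byte sequence are two different things.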

33 thoughts on “Building Up Unicode Characters One Bit At A Time”

  1. “If there’s a symbol that was ever used to represent a sound or a concept anywhere in the world, chances are pretty good that you can find it somewhere in Unicode.”

    Does anyone know about the utf-8 code for cuneiform 20? I found U+12399. That doesn’t want to display on my work PC (Win 10).

      1. Yah. It’s weird though. 1-9, 10, 30, 40 and 50 are good. It’s base 60 so those plus 20 are all you need. At least it is assuming 2 characters per place value. But it’s only 20 that apparently isn’t in the default font. But there are a lot of variants of each number, I was just picking the ones that looked most like the cuneiform tutorials I found online. But I only found the one non-working 20. I thought maybe there is another 20 that is more commonly included in fonts but wasn’t on the lists I found.

    1. You might want to install one of the Noto fonts (search for “noto fonts” on Wikipedia).
      Be aware that any font or font set that covers a major portion of the whole Unicode space is going to be HUGE, and can noticeably slow down Photoshop and CAD software simply by being installed (they often try to scan all characters of all fonts to build up an idea of character geometry, etc., and that’s a LOT of glyphs). Better software processes and caches that metadata once, but since fonts weren’t traditionally that large, a lot of software just brute-forces it every time it needs the info. On a machine with fonts for comprehensive Unicode coverage, that can mean a significant delay every time the application does anything related to fonts.

      As of April 2021, the Google-sponsored noto font kit covers 95% of all non-CJK glyphs, and 32% of all CJK glyphs, for a combined coverage of 54% of the glyphs defined in the Unicode 13 standard.

      Since nobody normally needs ALL of the Unicode glyph space, there are tools that let you rebuild a Noto font to cover just the glyphs you think you will need. This can often dramatically reduce the size and slowdowns, while giving you every glyph you would ever actually use.

    1. Hey, a few people have pointed out similarities to other symbols. This Yi character I picked at random is from a writing system used by 2 million people to communicate, but we’re very good at pattern recognition. To keep comments focused on the build, I re-shot the main image on the project page, which was updated shortly after your comment

  2. Huh, really nice. Then again, I’ve got a salvaged panel with a set of thumbwheel switches (two blocks of eight wheels each, in hexadecimal), and I’ve been wondering for several years what exactly I should use them for. Selecting Unicode characters or sequences might be a nice application. One block would be enough to select a character using UTF-32, though it might be more fun to use both blocks and treat them as eight bytes of a UTF-8 stream, and just make room on the display for up to eight characters.

  3. b̴̡̧̻̝̺̗̤͓̺͍̞̝̼̟̊̓̉̏͋̿͆̄̈́̚͜ͅǔ̷̡̱̖̻̠̯̎͒͛̀̐t̷̨̢̩̟̊͛͛̓̿͊̽̈̏͋̏͘͝͝͝ ̶̗̟͎̼̹͎̝̱̺͍͚̠̜̓̈́͋͐̀̄̎̀ç̷̮̺̮͚̪̺̙͈͎̟͈̻̖̔̃͆̔́̌̒̑̈́̓̋̐̀̚͘͜͜͝ȧ̷̡̨̙̞̖̹̖̥̯̱̹̥̖̞̃̕̕͘͜n̵̡̢̢̻̰͙̗̋̎̐́̓͋̀̈́̓͛̿̿͋͜͠ ̶̧͈̺̱͎͛̈́͝į̵̢̛̺̖̼̭̦̇̐̀͐̈́̒̄ͅt̸̹̲͈̿͊̂̄̆͊̈́̌̀̚͝ ̴̡̲̱͔̩͖͚̟̓̃́̾́̇̈̓̽̆͛̍͜r̵̨̢̛͕̠̯͚͖̺̫̭̦̬̙̰͒̀̈͆̓͗̑̆͒̾̒́̅̐͘e̷̜͊̽͆͐̑́ņ̵̰̝̮͓̖͍̊̒̅d̸̢̛̝̣͓̹̐̍̇ę̶̱̳͇͙̯͕͆̇̊́̃̓̌̐̊̊̌͛͜ŗ̴̨͈̫̦͓͚̞̜̩̤̙̣̪͙̀͂̂̌̉̈̍̋̊ͅ ̵̼̹̠̬̮̿̑͛͗̄̂͆̑̆̐̆̕͝͝͠͝Z̴̡̞̪̦̦̤̬̗̮͍͆̊̍͒̏̊̒̆͐̽̚̕͜͠ͅä̷̧̠͚́́̉͗͝ļ̴́͋̏̈́̃̔̃̋̾̈́̔͆͑͌̍̊g̵̛̛̲͐̈́̔̉̉́̑͜͝͠o̴̝̱͇̙̜̒͑͒͋̀̀̓̀̌̌͋̊͝͠͠ ̶̡̺͔̩̣̄̐̈́̀͛̌͘̕͠͝T̸̗͚̍̿e̶̡̫̦̯̲͇̙͑̈́̓͑̄̑͋̓͐͠x̷̨̛̺̰̬̰̱̱̦̌̓̌̈́̑̄̉͛̈́̕͝ţ̵̧̛͙̣͖͖̖̬͖̲̗̻̲̩̽̉͗̔͐͝͠

  4. You can create a Unicode input with little more than an LED array. The first step is to turn it into an input device that can detect your fingertip so that you can draw the character you want. The next step is to use a K210 chip to run a neural network for recognising the character you want.

    1. I like your thinking. Going one step further, I nerd-sniped myself wondering how big a keyboard with standard spacing would need to be to cover every codepoint and still be physically possible to use. If you arranged it as a circular panel someone sits in the middle of, and assumed they can reach 1m across and 1.5m vertically, the whole console would be ~12 metres in diameter with 80 rows of keys at a 40° slant. Seems practical
