Over on YouTube [Nic Barker] gives us: UTF-8, Explained Simply.
If you’re gonna be a hacker, eventually you’re gonna have to write software to process and generate text data. And when you deal with text data in this day and age, there are really only two things you need to know: 7-bit ASCII and UTF-8. In this video [Nic] covers 7-bit ASCII and Unicode, then explains UTF-8 and how it relates to both. [Nic] goes into detail about some of the clever features of Unicode and UTF-8, such as self-synchronization, single-byte ASCII, multi-byte codepoints, leading bytes, continuation bytes, and grapheme clusters.
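If you want to see those leading and continuation bytes for yourself, here’s a quick Python sketch (ours, not [Nic]’s) that prints the bit patterns of a few codepoints of different lengths:

```python
# A quick sketch (not from the video) of how UTF-8 lays out its bytes,
# using Python's built-in encoder.
text = "aé€😀"  # 1-, 2-, 3-, and 4-byte codepoints

for ch in text:
    encoded = ch.encode("utf-8")
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {bits}")

# The leading byte tells you the sequence length (0xxxxxxx, 110xxxxx,
# 1110xxxx, 11110xxx) and every continuation byte starts with 10xxxxxx,
# so a decoder dropped into the middle of a stream can always find the
# start of the next codepoint -- that's the self-synchronization property.
```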
[Nic] mentions UTF-16, but UTF-16 turned out to be a really bad idea. UTF-16 combines all of the disadvantages of UTF-8 with all of the disadvantages of UTF-32. In UTF-16 there are things known as “surrogate pairs”, which means a single Unicode codepoint might require two 16-bit UTF-16 code units to describe it. The Byte Order Mark (BOM) introduced with UTF-16 also proved problematic: if you cat files together, you can end up with stray BOM markers embedded in the middle of your new file. They say that null was the billion-dollar mistake; well, UTF-16 was the other billion-dollar mistake.
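To see the surrogate pair and BOM business in action, here’s another quick Python sketch (again ours, not from the video):

```python
# A quick sketch of why UTF-16 is awkward: anything outside the Basic
# Multilingual Plane needs a surrogate pair, and the encoding is
# byte-order dependent, hence the BOM. (Illustrative, not from the video.)
emoji = "😀"  # U+1F600

print(emoji.encode("utf-16-be").hex(" "))  # d8 3d de 00 -- surrogate pair D83D DE00
print(emoji.encode("utf-16").hex(" "))     # ff fe 3d d8 00 de -- BOM prepended (on a little-endian machine)
print(emoji.encode("utf-8").hex(" "))      # f0 9f 98 80 -- no BOM, byte order never matters
```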
tl;dr: don’t use UTF-16, but do use 7-bit ASCII and UTF-8.
Oh, and as we’re here, and talking about Unicode, did you know that you can support The Unicode Consortium with Unicode Adopt-a-Character? You send money to sponsor a character and they put your name up in lights! Win, win! (We noticed while doing the research for this post that Jeroen Frijters of IKVM fame has sponsored #, a nod to C#.)
If you’re interested in learning more about Unicode, check out Understanding And Using Unicode and Building Up Unicode Characters One Bit At A Time.