UTF

Computer engineer [Marco Cilloni] realized a lot of developers today still have trouble dealing with Unicode in their programs, especially in the C/C++ world. He wrote an excellent guide that summarizes many of the issues surrounding Unicode and its encoding called “Unicode is harder than you think“. He first presents a brief history of Unicode and how it came about, so you can understand the reasons for the frustrating edge cases you’re bound to encounter.

There have been a variety of Unicode encoding methods over the years, but modern programs dealing with strings will probably be using UTF-8 encoding — and you should too. This multibyte encoding scheme has the convenient property of not changing the original character values when dealing with 7-bit ASCII text. We were surprised to read that there is actually an EBCDIC version of UTF still officially on the books today:

UTF-EBCDIC, a variable-width encoding that uses 1-byte characters designed for IBM’s EBCDIC systems (note: I think it’s safe to argue that using EBCDIC in 2023 edges very close to being a felony)

Continue reading “Understanding And Using Unicode” →

Hackaday

1 Articles

Understanding And Using Unicode

Search

Never miss a hack

If you missed it

Ask Hackaday: How Much Compute Is Enough?

WheatForce: Learning From CPU Architecture Mistakes

Improving FDM Filament Drying With A Spot Of Vacuum

Spy Tech: Conflicts Bring A New Number Station

The Most Secure, Modern Computer Might Be A Mac

Our Columns

Hackaday Podcast Episode 364: Clocks, Cameras, And Free Will

This Week In Security: The Supply Chain Has Problems

Sega Meganet: Online Gaming In 1990

Ask Hackaday: Using CoPilot? Are You Entertained?

Solar Balconies Take Europe By Storm

Search

Never miss a hack

Subscribe

If you missed it

Our Columns