L10n

Computer engineer [Marco Cilloni] realized that many developers still have trouble handling Unicode in their programs, especially in the C/C++ world. He wrote an excellent guide, “Unicode is harder than you think”, that summarizes many of the issues surrounding Unicode and its encodings. He opens with a brief history of Unicode and how it came about, so you can understand the reasons for the frustrating edge cases you’re bound to encounter.

A variety of Unicode encodings have been used over the years, but modern programs dealing with strings will probably be using UTF-8, and you should too. This variable-length encoding has the convenient property of leaving 7-bit ASCII text unchanged: every ASCII character is encoded as the same single byte it always was. We were surprised to read that there is actually an EBCDIC version of UTF still officially on the books today:

UTF-EBCDIC, a variable-width encoding that uses 1-byte characters designed for IBM’s EBCDIC systems (note: I think it’s safe to argue that using EBCDIC in 2023 edges very close to being a felony)
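To make the ASCII-compatibility point about UTF-8 concrete, here is a minimal Python sketch. It is not taken from the guide. The helper name `utf8_lengths` and the sample strings are our own, chosen only for illustration. It checks that pure ASCII text encodes to identical bytes under ASCII and UTF-8, and shows how the byte count per character grows outside the ASCII range.

```python
def utf8_lengths(text: str) -> list[tuple[str, int]]:
    """Pair each character with the number of bytes UTF-8 uses to encode it.

    Hypothetical helper for illustration only.
    """
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# Pure 7-bit ASCII text: the UTF-8 bytes are identical to the ASCII bytes.
ascii_text = "Hackaday"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# One character from each UTF-8 length class: U+0061, U+00E9, U+20AC, U+1F600.
sample = "a\u00e9\u20ac\U0001F600"
lengths = utf8_lengths(sample)

if __name__ == "__main__":
    print("ASCII bytes unchanged under UTF-8:", same_bytes)
    for ch, n in lengths:
        print(f"U+{ord(ch):04X} takes {n} byte(s)")
    # Code points and bytes are different counts once you leave ASCII.
    print(len(sample), "code points,", len(sample.encode("utf-8")), "bytes")
```

The practical consequence for C/C++ code is that a byte count (what `strlen` reports) stops matching the number of characters as soon as the text contains anything beyond ASCII.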

Continue reading “Understanding And Using Unicode” →

Hackaday
