Why Names Break Systems

Web systems are designed to be simple and reliable. Designing for the everyday person is the goal, but if you don’t consider the outliers, those users may run into problems. This is everyday life for people whose names have features that often go unconsidered, such as apostrophes or spaces. It is the life of [Luke O’Sullivan], who even had to fly under a different name than his legal one.

[O’Sullivan] is far from a rare surname, but it presents an interesting challenge for many computer systems. Systems from the era of penny-pinching every bit relied on ASCII, which defines only 128 characters, including a very small set of special characters. Some systems didn’t even support all of those characters, in order to reduce loading times. Throw in the security measures put in place to prevent injection attacks, and you have a very unfriendly field for many names.
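To make the ASCII limit concrete, here is a minimal Python sketch that checks which names fit in the 128-character set. The helper name `is_ascii_representable` and the sample names are made up for illustration; they don't come from any particular system.

```python
def is_ascii_representable(name: str) -> bool:
    """Return True if every character in `name` is one of ASCII's 128 code points."""
    # str.isascii() is available in Python 3.7 and later.
    return name.isascii()


# Hypothetical sample names, chosen only to illustrate the point.
names = ["O'Sullivan", "O’Sullivan", "Müller", "Nguyễn"]
results = {name: is_ascii_representable(name) for name in names}

for name, ok in results.items():
    print(f"{name!r}: {'fits in ASCII' if ok else 'needs more than ASCII'}")
```

Note that the plain typewriter apostrophe (') is itself an ASCII character, so an ASCII-only system that rejects "O'Sullivan" is most likely doing so because of its validation rules rather than the character set. The typographic apostrophe (’) and accented letters, on the other hand, really do fall outside ASCII.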

Unicode is a newer standard with over 150,000 characters, allowing for nearly any character. However, many older systems are far from easy or cheap to convert to the new standard. This forces many people to adapt to the software rather than the software adapting to them. While this is simply poor design in general, [O’Sullivan] makes sure to point out how demeaning it can be for many people. Imagine being told that your name isn’t important enough to be included, or that it’s “invalid”.

One excuse that gets thrown about is the aforementioned injection attacks that can be used against these systems. Such attacks can crash a system or even change its settings; however, it’s not just older systems that are affected. For a modern-day form of injection, check out how AI models can be affected by prompt injection!
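For what it's worth, accepting apostrophes and staying safe from SQL injection are not at odds. A rough sketch using Python's built-in `sqlite3` module and parameterized queries is below; the `passengers` table and both helper functions are hypothetical, made up purely for illustration.

```python
import sqlite3


def insert_passenger(conn: sqlite3.Connection, name: str) -> None:
    """Store a name using a placeholder, so it is passed as data, not as SQL text."""
    conn.execute("INSERT INTO passengers (name) VALUES (?)", (name,))


def find_passenger(conn: sqlite3.Connection, name: str) -> list:
    """Look a name up, again through a placeholder rather than string concatenation."""
    cursor = conn.execute("SELECT name FROM passengers WHERE name = ?", (name,))
    return [row[0] for row in cursor.fetchall()]


# Hypothetical in-memory table, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (name TEXT NOT NULL)")

# An ordinary name with an apostrophe, plus a classic injection attempt.
insert_passenger(conn, "Luke O'Sullivan")
insert_passenger(conn, "Robert'); DROP TABLE passengers;--")

print(find_passenger(conn, "Luke O'Sullivan"))
```

Because the name never becomes part of the SQL string, the apostrophe needs no special escaping, and the injection attempt is simply stored as an odd-looking name instead of being executed.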

Thanks to Ken Fallon for the tip!

10 thoughts on “Why Names Break Systems”

  1. If you reject names because you are afraid of injection attacks, your software sucks, because you are concatenating strings to form SQL statements that are later parsed. And if you are forced to do it that way, there are a multitude of functions/libraries available that sanitize and quote strings properly for SQL, which means that if you are not using them, your software sucks even more.

    This isn’t a hard problem, and it has been solved for decades. Anyone who writes code that doesn’t sanitize inputs should go back to school until they can be trusted to work in a professional setting. What’s worse, just imagine what other problems a piece of software will have if it can’t even process someone’s name properly.

  2. I once had a coworker with “30 years of SQL DB experience” ask me, “Say, just how do you handle apostrophes in names anyway?”

    I must admit I kinda stared at him in shock: “Um, like, dude, that’s like page 3 in ‘SQL for Dummies’.” And I couldn’t remember off the top of my head anyway, as I shoved all that into a function years ago. (It’s a bit tricky as it varies between systems and you need to watch out for things like # as well.)

    As the old joke goes, often “30 years of experience” is just one year repeated 30 times.

  3. Similar to living in Japan with a middle name…

    According to the banks, my first and middle names are one word and, along with my last name, are in all uppercase, but that doesn’t match my Japanese government ID or driver’s license. So I have to get a human at the bank involved whenever any verification is needed, which is a pain as the system is increasingly set up for online verification methods, BUT they also apparently continue to refuse to allow middle names, or even upper and lower case characters…

    And don’t get me started on the arbitrary and mandatory random use of half-width and full-width characters, or even character limits on names, so a good portion of western names are simply too long to be entered into the system…

  4. I think this Hackaday is on to something here.
    YouTube is giving me a “Sign in to confirm you’re not a bot” for this video. LOL.

    And that has not really been possible since YouTube rejected my user name because it was “unpronounceable”. And that was some 10 to 15 years ago.

    Is the TSA using the rubber gloves on this guy?

  5. It’s annoying, but also understandable at the same time. Simply put, computers have an English heritage.
    As a workaround, there are things like umlauts or HTML entities.
    Using international ID numbers for a world of global citizens would be another one.
    Because as pointed out, there are various combinations of names in the world.
    Saving a database name and storing a bitmap of the real name for comparison purposes would be another one.
    Especially for those humans living in some rain forest tribes who use their feet for signing documents or something. ;)

  6. Remember the old PaypaI scam? A capital I instead of a lowercase L at the end tricked many people into logging into a fake website and losing money. Depending on the default font used, the two can be indistinguishable to some people.
