Error Codes And The Law Of Least Astonishment

December 17, 2021

Do you know the law of least astonishment? I am not sure of its origin, but I first learned it from the excellent “Tao of Programming.” Simply put, it is the principle that software should always respond to the users in a way that least astonishes them. In other words, printing a document shouldn’t erase it from your file system.

Following the law of least astonishment, what should a program do when it hits a hard error? You might say that it should let the user know. Unfortunately, many systems just brush it under the rug these days.

I think it started with Windows. Or maybe the Mac. The thinking goes that end users are too stupid for, or too afraid of, error codes and detailed messages, so we just leave them out. Case in point: My wife’s iPhone wouldn’t upload pictures. I’m no expert since I carry an Android device, but I agreed to look at it. No matter what I tried, I got the same useless message: “Can’t upload photos right now. Please try again later.” Not only is this not very informative, but it also implies the problem is in something that might fix itself later like the network.

The real culprit? The iCloud terms of service had changed and she had not accepted the new contract. I have a feeling it might have popped up asking her to do that at some point, but for whatever reason she missed it. Until you dug into the settings and checked the box to agree to those terms, “later” was never going to happen.

But it isn’t just iPhones. Windows is full of things like that and you only hope there will be a log in the event viewer with more details. I also see more of it now on Linux, although there is usually a log file somewhere if you know how to find it. While I get it that programs having errors run the risk of astonishing the user, it is even more astonishing if there’s no explanation of what’s wrong. Imagine if your bank sent you a note: there is a problem with your account. So you respond: “Did I overdraw?” They reply, “No.” Now what? That’s the state of many software errors today.

There’s really no excuse on desktop systems or websites. However, you might want to forgive tiny embedded systems. Don’t! I recently ported the 3D printer firmware Marlin to an ANET A8 board — an 8-bit processor with little memory — that had been on Repetier firmware for many years. The first time I tried to do an autolevel probe I got the message: Probing failed. That’s it.

I’ll grant that you can turn on autolevel debugging to get more information, but I’m already at 98% flash utilization, so that would require temporarily removing a bunch of features and rebuilding the code. But why not do what we did in the old days:

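unsigned int global_error = 0;   /* 0 = no error; otherwise the number of the step that failed */
void do_something(void) {
   global_error = 1;
   if (process1() == FAIL) return;   /* step 1 failed */
   global_error++;
   if (process2() == FAIL) return;   /* step 2 failed */
. . .

   global_error = 0;   /* every step succeeded */
   return;
}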

This doesn’t take much space. Now you can report something like Probing failed (8) and I can at least go to the code and determine what the 8th step was that failed. I’m sure someone would even post a list of codes and what they meant in a case like that.
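
To make the idea concrete, here is a minimal sketch of the step counter together with the message it enables. It is not Marlin code: the step() stub, the fail_at_step knob, and probe_message() are made-up names so the example can be compiled on a PC.

```c
#include <stdio.h>

/* Hypothetical result codes and a stubbed step, for illustration only. */
enum { OK = 0, FAIL = 1 };

static int fail_at_step = 0;            /* test knob: which step fails (0 = none) */
static unsigned int global_error = 0;   /* 0 = success, N = step N failed */

static int step(int n) { return (n == fail_at_step) ? FAIL : OK; }

/* Runs three steps, leaving the number of the failing step in global_error. */
static void do_probe(void) {
    global_error = 1;
    if (step(1) == FAIL) return;
    global_error++;
    if (step(2) == FAIL) return;
    global_error++;
    if (step(3) == FAIL) return;
    global_error = 0;                   /* all steps passed */
}

/* Formats the user-visible message; returns the number of characters needed. */
static int probe_message(char *buf, size_t len) {
    if (global_error == 0)
        return snprintf(buf, len, "Probing done");
    return snprintf(buf, len, "Probing failed (%u)", global_error);
}
```

With fail_at_step set to 2, do_probe() stops at the second step and probe_message() produces Probing failed (2). The cost is one integer of RAM and an increment per step.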

Too much overhead? Tell me the program counter where the error happened. That used to be a pretty common practice. Granted, it requires you to have a memory map file and know how to read it but it is still better than nothing.

We spend a lot of time thinking about how projects and software should work. But we need to spend time thinking, too, about what happens when they don’t work. It is fine that we can do in-circuit debugging or hook up a logic analyzer, but that won’t help our users. Even if it is just for you, why not make it a little easier on yourself?

As we have said before, “There’s no such thing as too much information.” In addition to guarding against system errors, you can also help users not to astonish themselves.

Image Credit: [Elisa Ventur] via Unsplash.com

73 thoughts on “Error Codes And The Law Of Least Astonishment”

KD9KCK says:

December 17, 2021 at 10:06 am

Who hasn’t at some point done the print(“1”) some steps print(“2”) some more steps print(“3”) style of debugging to figure out what was really causing the strange error?

1. Rog Fanther says:
  
  December 17, 2021 at 10:29 am
  
  print (“I am here”);
  …
  print(“Now I am here”);
  …
  print(“I should not be here”);
  
2. Dan says:
  
  December 17, 2021 at 10:35 am
  
  People who have a C# style integrated dev environment / IDE / debugger that pauses and shows you the code line where the exception occurs. For years and years I didn’t debug that way.
  
  Though I have done it very occasionally for a JavaScript issue where – despite the browser dev tools – callbacks can make debugging big loops hard.
  
  But for php etc, yes, daily.
  
  Don’t forget that precise debug codes can leak to malicious actors information which helps them form an exploit. If you can’t distinguish an overdraft from a buffer overrun, then fuzzing is a lot harder.
  
  1. Jan Praegert says:
    
    December 17, 2021 at 10:47 am
    
    catch (Exception e) { Console.WriteLine(e.ToString()); Environment.Exit(42); }
    
  2. Anton says:
    
    December 17, 2021 at 10:52 am
    
    That’s security through obscurity logic. If we’re working on OSS especially, attackers could put that debugging code in themselves. Might as well do it first, so the good guys can catch the bugs faster.
    
    1. Dan says:
      
      December 17, 2021 at 2:02 pm
      
      Yes, definitely true. But not for banks, gmail, and many systems like that we interact with. There’s lots of systems where the code is only ever run by trusted actors.
      
      1. Sec 101 says:
        
        December 17, 2021 at 4:55 pm
        
        If you think any javascript executed client side can be trusted, or is secure, you need to abandon web development immediately.
        
  3. KD9KCK says:
    
    December 17, 2021 at 11:00 am
    
    When your working with python or embedded systems (Arduino) that’s not as easy as it seems. And yes while python may show you where it occurred you might have no idea how you ended up there. I have had times where I used print(“1”) … print(“2a”) …. print(“2b”) to figure out why it was taking paths it shouldn’t have been able to.
    
    1. smellsofbikes says:
      
      December 17, 2021 at 11:11 am
      
      function_name(params)
      {
          if (debug_level == 2)
              printf("starting function_name!\n");
      }
      
      has saved me so much debug time on trying to find squinchy nested problems that now I write it in as a reflexive move.
      
      1. ian 42 says:
        
        December 17, 2021 at 5:50 pm
        
        do it with a #define etc (on embedded systems), then it uses no memory when disabled..
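
A minimal sketch of that suggestion, assuming a hypothetical DEBUG_LEVEL build flag and TRACE macro (neither name comes from the thread). When the level is below 2 the macro expands to nothing, so neither the call nor the message string ends up in the binary.

```c
#include <stdio.h>

/* Build with -DDEBUG_LEVEL=2 to enable tracing; name and levels are assumptions. */
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 0
#endif

#if DEBUG_LEVEL >= 2
#define TRACE(msg) fprintf(stderr, "%s: %s\n", __func__, (msg))
#else
#define TRACE(msg) ((void)0)   /* compiles away: no code, no string stored */
#endif

/* Example function: traced when enabled, unchanged behavior either way. */
static int add_offset(int value) {
    TRACE("starting");
    return value + 1;
}
```

The trade-off against the runtime debug_level check above is that changing the level means rebuilding, which is usually acceptable when flash is the scarce resource.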
        
    2. Dan says:
      
      December 17, 2021 at 2:04 pm
      
      No stack trace in python? I regularly print them out in php (well, error_log them to be precise). It’s very useful for “how did we get here?” But error_log(“a”) etc is great for working out when we left a function.
      
      1. Average Python Enjoyer says:
        
        December 17, 2021 at 2:15 pm
        
        Python has, afaik, full stack trace support that appears by default in all debugging environments I have used. KD9KCK seems to not use the debug mode of python, instead just running it and seeing where it errors.
        
      2. Jack Danson says:
        
        December 20, 2021 at 5:00 pm
        
        Python definitely has a full stack trace. The problem is that many people wrap a large part of their code in try/except sections, which then obscure the actual culprit that caused the error. However, there is a quick solution for debugging a problematic piece of code without removing any wrapped try/excepts. Simply add this code as the first line in the “except” section: import logging; logging.exception(“Error info:”)
        
        Suddenly, you’ll get your stack trace again.
        
    3. Jack Sprat says:
      
      December 18, 2021 at 1:11 am
      
      you + are = you’re
      
3. Alvaro Marenco says:
  
  December 17, 2021 at 10:42 am
  
  print(” ==== 1st loop ==== “)
  …
  print(” **** 2nd loop **** “)
  …
  print(” $$$$ nth loop $$$$ “)
  
4. X says:
  
  December 17, 2021 at 10:57 am
  
  And who hasn’t encountered the situation where the debug messages are printed out of order due to buffering and thread storage? Or how about when the messages go into the bit bucket because someone changed stdout? And then there are all those interrupt routines and other async calls where you are not allowed to use stdio? And then there is all the memory allocation that goes on inside printf, which totally messes with your program’s heap usage. Yes indeed, making random calls to printf is a great way to make a bad program worse.
  
  1. Bryantherobotman says:
    
    December 17, 2021 at 5:34 pm
    
    > making random calls to printf
    I have had a few times where that random call makes the program better, and it is almost always a variable I forgot to declare as volatile.
    
    You’re right though, printf can make a bad bug worse.
    
  2. RubyPanther says:
    
    December 24, 2021 at 12:42 pm
    
    Those all sound like situations where the print statement would have helped me find either bugs, or architecture problems! Fixing the problems you listed would make a bad program better.
    
    You do fix those things when you encounter them, don’t you?
    
5. Andy Pugh says:
  
  December 17, 2021 at 2:53 pm
  
  All the time. If you are working in realtime kernel modules that can’t halt that’s all you have (if your test PC is running on that same kernel)
  
  Which is, perversely, why if someone asks me how to learn to programme (especially in OO) I tend to point to Excel VBA. Where you are unavoidably in the middle of a _huge_ object model and have no choice but to interact with it, and you also have really rather a friendly IDE and can break at any point and view all variables/objects and their values while single-stepping. It really is very instructional and friendly.
  With just enough completely broken stupidity to make you feel like a coder :-)
  Like, in current versions, the Watch window blanks out until you click things.
  
6. C says:
  
  December 20, 2021 at 2:21 am
  
  Or turning on LEDs at certain parts of the code since there often isn’t a way to print.
  
zeiche says:

December 17, 2021 at 10:08 am

oops, something went wrong.

Glenn Taylor says:

December 17, 2021 at 11:10 am

I have used my logic analyzer and set unused pins high/low to trace Arduino style microcontroller activity. Doesn’t take a long time like print does, and doesn’t require a console. However it does require an analyzer. LEDs can be used for similar “got here” output. In that code, set the pin high.

1. John Bump says:
  
  December 17, 2021 at 11:13 am
  
  Yeah, I was thinking about the similar stuff I do in hardware. Every serial line has a header so I can see what exactly that SPI transaction is doing, all the logical blocks of the board are isolated from each other with 0 ohm resistors so I can work on just one section if needed, interesting signals have a slightly oversize via that just fits a scope probe tip, or maybe a test point with a loop.
  
  1. Nick says:
    
    December 27, 2021 at 8:16 pm
    
    Yep, every power and signal rail should have a test point. If there’s room, I like to put a footprint for an LED and resistor on there too. I can always leave them unpopulated if I don’t need them. 0R resistors are good in digital signal lines too – you can increase the resistance or add ferrites etc if you run into problems during development.
    
2. Rud Merriam says:
  
  December 17, 2021 at 4:26 pm
  
  In the late 80s I developed a system based on the STD bus. The STD board supplier had a 24 output board. I asked them to put 8 red and 16 green LEDs on the output so they could be seen. You could see everything going on with the system by watching the LEDs.
  
  They made it an actual product.
  
3. a Jaded Hobo says:
  
  December 18, 2021 at 1:13 pm
  
  I used an output pin and a benchtop counter before I could afford anything else. It worked a charm when bringing up new boards before JTAG was a thing.
  
Greg A says:

December 17, 2021 at 12:26 pm

at my work place, POLA (Principle Of Least Astonishment) is a very useful tool. a lot of behavior isn’t defined by any standard, so we have to wing it…and POLA really cuts down on surprised / irritated emails from customers.

as for error messages…at work we of course have a formal approach to it that is somewhat robust. but one thing i’ve enjoyed in my time off is how it works in android. you can just throw an exception (or a component you rely on can throw an exception), and if it goes uncaught…i don’t even know how it reports to the user. i think maybe your app simply disappears. but — i think as part of the play store — most phones will report a stack trace back to google. and as the guy who uploaded the app, i can go to the ‘google play developer’s console’ and i can see all the stack traces people have experienced. sometimes it’s still a mystery, but at least 2/3rds of the time, right away it’s obvious what went wrong.

given that there’s so much diversity in the android environment, it’s guaranteed that if my app is used enough, it’s gonna wind up on a phone where the vendor made an astonishing API-influencing decision that i couldn’t realistically anticipate. and when it does, stack traces on uncaught exceptions are really handy!

macsimski says:

December 17, 2021 at 1:53 pm

A completely expected error has occurred.

Comedicles says:

December 17, 2021 at 2:02 pm

This sounds like a variation of Jef Raskin’s human interface rules at Apple and Information Appliance. His three fundamental rules are Monotony, Modelessness, and the third one escapes me at the moment. Reliable? Predictable? Does no damage nor loses work? I’m getting redundant. (Apple has broken “loses work” in a basic part of the interface in the most recent OS releases and Modelessness and Monotony are broken many times in iOS.)

I think this Principle of Least Astonishment falls under Monotony – the idea that actions and responses should be familiar and unsurprising. But it certainly may have originated elsewhere.

1. justsayin says:
  
  December 17, 2021 at 6:10 pm
  
  First heard this called “the principle of least surprise”, somewhen around 87 or 88. Mac-84 pretty much had it nailed. M$ was nothing but an application launcher until W-95, when they finally appeared to get it. XP still had it, though 7 started to lose it, but 10 has just thrown it away. Gawd I loved it when the OS was boring, and would just stay out of the way and let me work.
  
Dan says:

December 17, 2021 at 2:10 pm

In some systems (eg webservers) you can log back to a log which devs can read, and return something boring to users.
Other times you can hide a more precise error message by crude steganography – change the wordings slightly depending on the precise error. Nothing “scary” sent to users, but if they read you out the message you know which one it is.

Of course, if a Java app logs an error, you can load an object from a remote server to do the debugging… :P

And don’t get me started on Bungie’s unhelpful error codes in Destiny…

Bunsen says:

December 17, 2021 at 2:57 pm

Simplistic error code printouts are all well and good, until you realize that you’ve evolved to mostly reimplementing errno.h and your project is now obligated by tradition to tell some user “Not a typewriter” under entirely inappropriate circumstances.

1. One Way Time Traveler says:
  
  December 23, 2021 at 3:03 pm
  
  Well, unless there is actually a typewriter involved, your project is at least not lying :)
  
  It is interesting how often the well-thought-out parts of a system or standard fall by the wayside in a few years, while the last minute quick hacks and humorous messages written in a haze of sleep deprivation often seem immortal.
  
  But, while “Not a typewriter” -> “Not a TTY” and “Printer is on fire” -> “Device unresponsive”, as phrases, may not be particularly prone to generating insights in a modern reader-of-errors, the underlying conditions are still very relevant and should be handled.
  
  Tying back to the original article’s topic: It’s good to remember that not all experiences are universal, and that some assumedly universal experiences age badly. “The least astonishment” is more complicated when dealing with multiple cultures… and *anyone* from the future is from another culture :)
  
Chris says:

December 17, 2021 at 2:59 pm

Shenanigans, or laziness, is only a thing when the main purchasing powers of the time, whichever generations that encompasses, allow it to happen with the power they wield. What manufacturers might be driven by here is the data science: if that tells them they can get away with subpar programming/software without any monetary pushback, then that is what we get.

1. Stappers says:
  
  December 18, 2021 at 7:21 am
  
  Sad but true
  
Andy Pugh says:

December 17, 2021 at 3:16 pm

When I write an error message I write it for me, as I am pretty sure that I will be the one debugging it.
I don’t see how any programmer would code-in an error message that would not help _them_ debug it.

It’s going to come back to you, no matter who you work for.

This applies to stuff I write only for me, stuff I write as utilities for work colleagues, and stuff that I write for an open-source project with tens of thousands of users.

There seems to be a bit of a split. Apple, Microsoft and Autodesk will definitely call home with an error log / memory dump for certain classes of error. I see that approximately weekly. But also all three sometimes just fail to work with an unhelpful message, or none.

So, frustratingly, it’s almost random whether an error is glossed-over or properly reported.

1. RubyPanther says:
  
  December 24, 2021 at 12:48 pm
  
  “I don’t see how any programmer would code-in an error message that would not help _them_ debug it. ”
  
  For many developers that’s more common than an error message that does help debug it; when they see the error message, they fire up a debugger.
  
  Personally, my goal with an error message is to never need a “debugger,” to let me go straight to the code and understand what happened from there. But a lot of people only debug in a debugger, they just go to see what happened, they don’t pause to think about why until they have the what. For me, I don’t really care that much what happened, I care about why what I expected to happen, didn’t.
  
joelfinkle says:

December 17, 2021 at 4:40 pm

Then there’s systems like VBA, which is rife with what I call Heisenbugs: the act of using the debugger to observe the bug can change the bug. It may just be because the screen updates when there’s a breakpoint, but there’s been at least 8 times in my life where debugging can’t find the problem.

1. Robert of Texas says:
  
  December 17, 2021 at 5:58 pm
  
  The most difficult bug I ever encountered was a bug in a computer’s microcode. No amount of the usual searching for it would ever reveal it, it just seemed to happen randomly. Once understood, the bug only occurred when multiple threads tried to update the same piece of memory – the memory interlock instruction could be missed in the microcode on a specific interlock-with-add instruction if a conditional path was followed.
  
  1. justsayin says:
    
    December 17, 2021 at 6:13 pm
    
    Branch prediction is a bear, ain’t it?
    
  2. 8bitwiz says:
    
    December 20, 2021 at 7:34 am
    
    I had a bug once that was tough to find. The system was written in 6809 assembly language, and our unattended systems were locking up in the field, even with a hardware watchdog circuit. (This is when I learned that Roswell NM was a two-hour drive from the nearest commercial airport, but at least I didn’t have to go there!)
    
    A bug caused it to go off in the weeds and hit one of the two undocumented TEST opcodes (14H/15H) that put the CPU into an infinite loop. This would read every address sequentially, once per clock cycle, and could only be stopped by the reset signal. I was using an ICE that shared the same CPU for both host and target, so it would kill my ICE too. I had to figure out where to put a breakpoint just to begin single-stepping.
    
    It turned out that a PAL for the watchdog decode logic did NOT check the WR signal. The test mode would read the watchdog address often enough to keep it happy. Even better was that by this time, the latest version of the hardware removed the WR input to this PAL because nothing was using it. I never found out what they did about that. They were just starting to try RS-485 cards in a PC and migrating tasks over, but it still needed the old 6809 system for things they hadn’t migrated yet.
    
kisstek says:

December 17, 2021 at 5:18 pm

Then there’s ‘ed’:

?

ian 42 says:

December 17, 2021 at 5:52 pm

there are two different things being discussed above – finding a bug/inspecting an error when you know there is one, and what the user sees.

This article is more about what the user sees – and I agree with the author – every (and I mean every) error that gets sent to the user should have a unique number that is displayed. Period.

1. Somun says:
  
  December 17, 2021 at 6:58 pm
  
  That is a maintenance nightmare. Not gonna happen. Comma.
  
  1. ian 42 says:
    
    December 18, 2021 at 2:23 am
    
    no it isn’t. In embedded firmware you have one .h with #defines and you add a new one for every error when you write the code to handle the error.
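
As a rough sketch of what such a header could look like: the error names, the numbering scheme, and the format_error() helper below are invented for illustration and are not taken from any real firmware.

```c
#include <stdio.h>

/* Hypothetical errors.h: one unique number per user-visible error. */
#define ERR_PROBE_NO_TRIGGER   101   /* probe never triggered */
#define ERR_PROBE_EARLY        102   /* probe triggered before the move started */
#define ERR_HEATER_TIMEOUT     201   /* heater did not reach its target */

/* Formats a short message that always carries the unique code.
   Returns the number of characters needed, as snprintf does. */
static int format_error(char *buf, size_t len, const char *what, int code) {
    return snprintf(buf, len, "%s (E%d)", what, code);
}
```

Calling format_error(buf, sizeof buf, "Probing failed", ERR_PROBE_NO_TRIGGER) yields Probing failed (E101). Adding an error then means adding one line to the header at the point where the handling code is written.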
    
Robert of Texas says:

December 17, 2021 at 5:52 pm

This was a while back…around 1990, but I was floored to discover that most programmers in a supposedly advanced programming shop were not even checking for errors. When I tried to find out why, the common theme was “we don’t know what to do with an error so we ignore them”. As programming has become more abstract, I have found that programmers have become more lazy. You can separate the good coders from the bad on how well they understand and handle errors or unexpected events.

1. greenbit says:
  
  December 17, 2021 at 6:15 pm
  
  +2 to that. How can you even know what’s happening if you don’t check return codes? Kids these days …
  
2. N. Christopher Perry says:
  
  December 18, 2021 at 5:58 am
  
  That shit does not fly in medical software development. Every error has to be trapped and debugged, as people’s lives are on the line (not to mention that U.S. FDA reviewers take a very dim view of ‘ignoring’ errors).
  
3. Tom Buskey says:
  
  December 18, 2021 at 10:55 am
  
  Programmers are optimists.
  
4. PPJ says:
  
  December 25, 2021 at 10:46 am
  
  It’s not spectacular to fix a bug in your work, whereas progressing with the project is seen as success no matter how badly it is done. If it works – don’t touch.
  
GLyndon says:

December 17, 2021 at 6:29 pm

I believe that the “Principle of Least Surprise” was originally put forth by Steve Jobs as he helped Apple understand how to do UI things.
Now, I’m the farthest thing there is from an Apple fanboi, but I respect the wisdom that Jobs had in such things, and will always remember and respect those things he brought to life that improved online for everyone.

1. RubyPanther says:
  
  December 24, 2021 at 1:05 pm
  
  Unlikely. IBM published a version in 1984, for example.
  Computing Center Newsletter at the University of Michigan published a version in 1978. In both cases they were restating a well known cliche/truism.
  (see: wikipedia)
  
targetdrone says:

December 17, 2021 at 10:22 pm

Long ago we established guidelines for displaying user errors. Here were some of the more important ones.

0. Know your audience. The audience for our cash registers were cashiers and store managers, running on locked down appliances; people with no access to the systems or networks. They’re standing in front of a customer, and are probably embarrassed because they look like they don’t know what they’re doing, or think they did something wrong. They are also not paid to be our surrogate tech support people.

1. Preventative measures are better than error messages. (Limit the user’s ability to make a mistake, i.e. if you’re defining a price field, only allow numeric keypresses.)

2. Don’t display an error if the cashier can’t do anything about the problem. (There’s no sense telling the cashier the network is down when they can’t fix it. Instead, define “network down” behavior and silently follow it. Meanwhile, tech support should be discovering the problems and should be fixing them without store help.)

3. An error message must never blame the user. (Any error is frustrating, putting the operator in a bad mood. Do not antagonize them further.)

4. An error message should be helpful and tell the operator exactly what to do next to recover.

5. An error message should be short. Nobody wants to read a sentence, let alone a paragraph.

6. An error message must not include error codes. Instead, logs with sufficiently detailed error data must be provided for remote support personnel.

7. Our business process owner had to review and approve all the messages — not the engineers.

We had other rules I don’t recall off the top of my head, but they were all aligned with “know your users” and “least astonishment.”

1. KeithFromCanada says:
  
  December 18, 2021 at 9:25 pm
  
  Regarding #1, back in the late ’80s, while coding in PDS, I came up with ‘masked input’, where keyboard scancodes were used to pull individual keypresses and, if the resulting character wasn’t part of the input mask (i.e. a string of legal characters), it was just ignored. That eliminated *many* ID10T errors. (I added formatting and control options to the masks for things like the slashes in dates, picking a particular input mask for a particular range within a string, etc.) It was pretty robust and, while it took a *LONG* time to figure out from scratch and code rigorously, it saved a hundred times the time and frustration of just using INPUT and dealing with foolish users.
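
The core of that masked-input idea can be sketched in a few lines of C. This is an illustration only, not the original PDS code: it filters an already-typed string rather than reading scancodes, and masked_input() is a made-up name.

```c
#include <stddef.h>
#include <string.h>

/* Copies only the characters of `typed` that appear in `mask` into `out`
   (at most out_len - 1 of them) and returns how many were accepted.
   Anything else is silently dropped, as if the key had never been pressed. */
static size_t masked_input(const char *typed, const char *mask,
                           char *out, size_t out_len) {
    size_t n = 0;
    if (out_len == 0) return 0;
    for (; *typed != '\0' && n + 1 < out_len; typed++) {
        if (strchr(mask, *typed) != NULL)
            out[n++] = *typed;
    }
    out[n] = '\0';
    return n;
}
```

For a price field with the mask "0123456789.", the keystrokes 1a2.5x0 come out as 12.50. Note that this alone does not stop a second decimal point; the formatting and range options the comment mentions would have to handle that.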
  
2. Matt Cramer says:
  
  December 28, 2021 at 7:38 am
  
  Regarding point #2, wouldn’t it make sense to display an error (or at least a status indicator) if the error condition prevented certain features from operating? For example, if the network was required for the credit card reader, there might be a “Credit card reader offline” warning, or attempting to use the card reader would give a note “Card reader offline, cannot process cards until IT support fixes this” sort of message.
  
  1. JMR says:
    
    January 3, 2022 at 11:10 am
    
    I’m with you. I think these are pretty bad rules, frankly. “Know your audience” can *easily* turn into “dumb down your errors,” which is *usually* the wrong thing to do. Give the user some bread crumbs. Trust that your users might have a creative way to work around a problem if you TELL THEM WHAT IS GOING ON. Obviously don’t confuse them with a bunch of BS, but if the problem is “the network is down,” you waste their time by not TELLING THEM SO. They’re going to keep bonking buttons and wondering why everything seems broken, but if you tell them “I can’t talk on the network right now. Please call 800-999-1234,” you can save them the embarrassment of wasting time in front of a customer.
    
    Basically there’s a trade off at play. You don’t usually harm your user if you provide too much info, but you can confuse them. There’s also the risk of exposing info to hackers that you didn’t intend to. Your engineers however can solve problems more quickly if they’re given a detailed description of what’s going on in the code. So you’re trading some confusion for recovery time, with a dash of security risk. Only a good product owner knows how to balance that equation, but erring on the side of “completely obfuscate the problem,” isn’t usually the right answer.
    
Stephen says:

December 18, 2021 at 7:59 am

As far as unhelpful error messages go, I am reminded of “Curious Marc” attempting to mend an “intelligent” sewing machine which refused to sew, stating that “The safety device has been activated!”
https://www.youtube.com/watch?v=H1zD5yt4fPo
After dismantling it, which took a good twenty minutes, he discovered that the drive belt had come off. He put it back on and then presumably spent another ten or twenty minutes putting the thing back together.

Sandro says:

December 18, 2021 at 9:21 pm

There’s at least one excellent point in this article: expose the distinction between permanent errors and transient errors to the user. That’s good UX.
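
One way to carry that distinction through the code is to classify each failure and pick the wording accordingly. The sketch below is loosely modelled on the photo-upload story in the article; the status names and message texts are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical upload outcomes. */
typedef enum {
    UPLOAD_OK,
    UPLOAD_NO_NETWORK,          /* transient: may clear up by itself */
    UPLOAD_TERMS_NOT_ACCEPTED   /* permanent until the user acts */
} upload_status;

/* True only for errors where "try again later" is honest advice. */
static bool is_transient(upload_status s) {
    return s == UPLOAD_NO_NETWORK;
}

/* Picks a message that tells the user whether waiting will help. */
static const char *upload_message(upload_status s) {
    switch (s) {
    case UPLOAD_OK:
        return "Photos uploaded.";
    case UPLOAD_NO_NETWORK:
        return "No network connection. Please try again later.";
    case UPLOAD_TERMS_NOT_ACCEPTED:
        return "Upload is blocked until you accept the updated terms in Settings.";
    }
    return "Unknown upload error.";
}
```

A caller could schedule an automatic retry only when is_transient() returns true, and otherwise show the message and wait for the user.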

There is, however, such a thing as too much information, that’s why every logger I know of has logging levels. Signal to noise ratio is important.

8bitwiz says:

December 20, 2021 at 7:04 am

Beware of lazily leaving logging turned on with nothing to see it! I was working on an actual product that had a serial port which was not used in the field. So I just left it on all the time logging messages, hooked up to a port on my PC while developing code.

Unfortunately the port was set to 9600 baud, and the system used cooperative “multitasking”. So when something caused the logging load to be more than 9600 baud and a few dozen bytes of buffer could handle, the whole (soft real-time) system came to a halt waiting for the logging-to-nowhere to catch up.

And as for printf debugging, that’s a skill which is always going to be necessary when there are times you can’t just halt the CPU and single step. It’s quite an art to know when and how much to print. You also need to consider the effect of serial port or stdout buffering, to know when to put in a delay or wait or flush to ensure that the messages aren’t languishing in a buffer, or that they even get out of the system before it crashes.

When things get really sensitive (like in an interrupt context), creating a global array and dropping bytes into it can be enough to save the day.
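
A minimal sketch of such a trace array, written as a small ring buffer. The names and the size of 16 are arbitrary choices for illustration, and the code assumes only one context writes to the buffer at a time.

```c
#include <stdint.h>

#define TRACE_SIZE 16u   /* power of two, so the index can be masked */

static volatile uint8_t trace_buf[TRACE_SIZE];
static volatile uint8_t trace_head = 0;   /* counts markers written so far (wraps at 256) */

/* Cheap enough for an interrupt handler: one store and one increment.
   Once the buffer is full, the oldest markers are overwritten. */
static void trace_byte(uint8_t marker) {
    trace_buf[trace_head & (TRACE_SIZE - 1u)] = marker;
    trace_head++;
}
```

After a crash or at a breakpoint, the array can be read out with a debugger, or dumped over the serial port once the time-critical section is over.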

8bitwiz says:

December 20, 2021 at 7:37 am

When “THE RACe condition” happens, it can kill people.

1. 8bitwiz says:
  
  December 20, 2021 at 7:37 am
  
  yay, thanks commenting system, that was supposed to be a reply to the FDA wanting all errors logged
  
Darren says:

December 20, 2021 at 3:14 pm

Except I once spent about 2 hours on an original Mac trying to figure out why I couldn’t rename a file. The floppy was locked, but instead of giving me a dialog telling me that or something, it just didn’t let me select the file name. I could rename files on the hard drive, but the file on the floppy couldn’t be renamed, because Apple thought it was a better idea to not let you get into an error state by disallowing you from even *starting* the actions you want to do.

LordNothing says:

December 20, 2021 at 5:53 pm

all my error messages are samuel l. jackson quotes.

get these motherf*cking bugs out of my motherf*cking code!

1. greenbit says:
  
  December 22, 2021 at 5:31 am
  
  WHAT?
  HOW?
  ?SN ERROR
  
astonished says:

December 21, 2021 at 2:16 am

My wife had to install software from a CD for homework assignments for her students. She thought it was smart to set her desktop as an install location. The installer allowed this without a warning and without creating a folder. This created a mess of her desktop by adding many files. So she decided to uninstall. The uninstaller simply deleted every file on the desktop without a warning, so she found out much later. And the files didn’t end up in the recycling bin. With an undelete tool we were able to recover some files. Luckily nothing of value was lost, but it took her several hours to recreate what was lost.

Jim says:

December 21, 2021 at 9:53 pm

Not sure if the emphasis of this article is about user experience vs. debug experience. As a user, I might see an error something like “Video cannot be played. Error code 44588282.” Try as I might the internets give no clue as to the meaning of that error code, other than other users are wondering about the same thing. For the user, that is useless information. Think the software just makes up random numbers to impress someone.

Likewise the famous, “The system cannot find the file specified”.

Way back working on ESDI device drivers, I worked closely with a Maxtor engineer — he had a beautiful debug/diagnostic system. Very well thought out. You could tickle a few ports and get tons of ASCII info. At the time, they were trying to stuff more logic into the drive itself, simplifying the interface.

More recently I was on a project that had an embedded text-to-speech engine. On a whim one day I thought to ‘speak’ my status and errors instead of printf(). The results were predictable and too funny for words :)

JGM says:

December 22, 2021 at 6:08 pm

Wrote a VBA (excel) solution for extracting/calculating between local and networked spreadsheets, so users could print out some management data. I knew all the users and none were experts. This was the error message I ended up using…

Public Sub Error()
Msg = "Excel is telling me there is a problem with what I am trying to do" & (Chr(13)) & (Chr(13)) & "The error was generated by: " & Err.Source & (Chr(13)) & (Chr(13)) & "The error description is: '" & Err.Description & "'" & (Chr(13)) & (Chr(13)) & "I cannot continue. Please press OK to exit" & (Chr(13)) & (Chr(13)) & "*** Please check that both the Tracker and the '" & (Is_Claim) & "' Spreadsheet are open ***" & (Chr(13)) & (Chr(13)) & "You can do this!"
Style = vbCritical + vbDefaultButton2
Title = "Overtime Extract – ERROR"
Response = MsgBox(Msg, Style, Title)
End Sub

ksanger says:

December 23, 2021 at 5:46 am

My dad wrote the parts inventory and data base for Rochester Products. It was a division of GM that used to make mainly Carburetors. His philosophy was that “the user doesn’t know what he’s doing and will never learn.” I recall him telling me that his biggest issue was protecting the database from being corrupted. If the user entered a part incorrectly they would get a warning. After three tries the program would quit and inform the user to read the manual.

Today we have no manuals. But we do have obsolete YouTube videos showing us how earlier versions worked. Or we could purchase a book on the subject that was obsolete before it was published.

Chris Kiick says:

December 23, 2021 at 7:14 am

I’m a programmer, I understand how code works. But I’m not familiar with the internals of every program I use (even if the code is available). I HATE stupid error messages. An error message should a) tell you exactly what went wrong b) give you enough information to know what to do next and c) give tech support a fighting chance to solve the problem.
“ERROR: can’t open file.” Which file? Where? Why can’t it be opened? Is it opening for read or write? Which part of the code is opening the file? What’s supposed to be in the file?
Logging is a separate issue, but remember that most users don’t run their code with log level DEBUG or TRACE. And chances are re-running it with debug turned on is not going to reproduce the issue, if that’s even possible.
Even if the user is not technically sophisticated enough to know what to do, at least they can pass the error on to someone who does. Most of tech support are not the people who wrote the code either. They need informative error messages too.

Stupid error messages make the user feel helpless, the program look indifferent and tech support look incompetent. That’s just bad design.
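
A minimal sketch of what answering those questions could look like: wrap the open call so the message names the file, its purpose, the access mode, and the operating system's reason. The wrapper `open_for`, the exception `FileOpenError`, and the example path are hypothetical; `OSError.errno`, `OSError.strerror`, and `errno.errorcode` are standard Python.

```python
import errno
import os


class FileOpenError(Exception):
    """Hypothetical error whose message carries the full context."""


def open_for(purpose, path, mode="r"):
    """Open a file, or fail saying which file, for what, and why."""
    try:
        return open(path, mode)
    except OSError as exc:
        read_only = mode.startswith("r") and "+" not in mode
        access = "reading" if read_only else "writing"
        symbol = errno.errorcode.get(exc.errno, "unknown")
        raise FileOpenError(
            f"Cannot open {purpose} '{os.path.abspath(path)}' for {access}: "
            f"{exc.strerror} (errno {exc.errno}, {symbol})"
        ) from exc


try:
    open_for("settings file", os.path.join("no-such-directory", "app.cfg"))
except FileOpenError as err:
    message = str(err)
    print("ERROR:", message)
```

Chaining with `from exc` keeps the original exception and traceback available for whoever ends up debugging it.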

1. Bit Fiddler says:
  
  December 23, 2021 at 3:41 pm
  
  I’d agree that it’s bad user interface design. But often, the user interface is only a small fraction of the actual design. The rest might very well be extremely well designed, only to be let down by poor user design, or “user interface as afterthought”.
  
  Ergonomics extends to how the user mentally interacts with an item, device, or system. The guts of most systems depend on functional factors inherent in the task or device, but the human-machine interface is often treated as mere packaging rather than a first-class part of the whole system.
  
  “Can’t open file” is what happened. That alone doesn’t indicate poor design. In fact, except for the actual human-machine interface, unambiguity trumps understandability. It’s just that it’s not intended for raw human consumption, and many UI designers skimp and just vomit that low-level info at the user with little effort made to contextualize and communicate.
  
  Although, if the user truly is dependent on tech support, it’s much MUCH easier to pass an unambiguous opaque error code to tech support (where their support systems can interpret it and provide detail and nuance) rather than expecting the customer to easily communicate complicated nuances. And doing so has direct economic benefit for the seller of such a system.
  
  A properly designed error token (even if it’s opaque to the end user) is actually really good at surviving “the telephone game” intact. However, that’s useless if the customer support system isn’t actually designed to actually support actual customers.
  
  I was once peripherally involved in a project where it was necessary to support some rather complicated systems (with equivalently complicated errors, some of which were unavoidable due to the nature of the task) over the phone. During development, it became obvious that a customer would need to engage in question/answer troubleshooting with the customer service rep for an average of 20 minutes per fault. The troubleshooting tree was moderately large but relatively simple; most of the time cost was due to the verbal communication. All of the necessary information was present before the customer made a call, and only the nature of the task prevented including a troubleshooting app built into the product. (turns out that a built-in app isn’t really useful to an engineer dangling upside down in a hazardous environment with only one hand free…)
  
  The result was that errors generated a token that was as short as possible, easily communicated unambiguously over the phone, and easily handled by even the basic support staff. When entered into the support console, it expanded into a detailed and easily understandable breakdown of what happened, what caused it, what caused *that*, and a list of actions that would fix the problem or at least bisect the possibilities to narrow down its cause.
  
  The matching software for the customer support side was extremely simple (some versions could even run in a browser on a computer with no network access, the field-test-engineer version ran on an HP-48 just because the main engineer thought that was cool and it was easy to do). There was nothing at all proprietary in it, and everything it knew was also in the public documentation.
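
The real token format is not described here, so the following is only a guess at the general shape of such a scheme: pack a fault number and a small detail value into a few characters from an alphabet that avoids look-alike letters, add a check character, and expand it on the support side from a lookup table. The alphabet, the bit layout, the `FAULTS` table, and the function names are all assumptions made for illustration.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # 32 symbols, no I, L, O, U

FAULTS = {  # hypothetical fault table, mirrored in the public documentation
    17: ("Drive belt not detected", "Check that the belt is seated on both pulleys."),
    42: ("Sensor supply voltage low", "Measure the 5 V rail at the sensor connector."),
}


def make_token(fault, detail):
    """Device side: 10-bit fault + 5-bit detail -> 3 symbols + 1 check symbol."""
    if not (0 <= fault < 1024 and 0 <= detail < 32):
        raise ValueError("fault or detail out of range")
    value = (fault << 5) | detail
    symbols = [ALPHABET[(value >> shift) & 31] for shift in (10, 5, 0)]
    return "".join(symbols) + ALPHABET[value % 31]  # simple check symbol


def read_token(token):
    """Support side: validate the token and expand it into plain language."""
    token = token.strip().upper()
    if len(token) != 4 or any(ch not in ALPHABET for ch in token):
        raise ValueError("token must be 4 characters from the token alphabet")
    value = 0
    for ch in token[:3]:
        value = value * 32 + ALPHABET.index(ch)
    if ALPHABET[value % 31] != token[3]:
        raise ValueError("check character mismatch; please read the token again")
    summary, action = FAULTS.get(value >> 5, ("Unknown fault", "Escalate to engineering."))
    return {"fault": value >> 5, "detail": value & 31, "summary": summary, "action": action}


token = make_token(17, 3)
print(token, read_token(token))
```

The check symbol in this sketch catches most single mistyped characters but not swapped ones; a production scheme would want a stronger check.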
  
  A few years after I was no longer involved, someone grumbled about the quality of customer support on that product. Turns out that the new customer support contractors had not been given access to the tool (it wasn’t even mentioned to them), so they basically were stuck getting paid to be yelled at. Needless to say, the product rapidly lost its customer base despite being possibly the best actual solution in its niche…
  
  I
  
  1. Matt Cramer says:
    
    December 28, 2021 at 7:29 am
    
    “Can’t open file” is a reasonable error message if it pops up when the user has selected a file and tried to open it; the user is aware that the program was trying to open a file and what file was being opened. Sure, it would be nice to know why the file didn’t open, but at least you know what file was being opened, and can investigate what happened to the file.
    
    The same message would be downright inexcusable if it popped up when a program had been sitting idle for several minutes and did not have any files open that the user was aware of. In that case, troubleshooting at least requires knowing what file the program wanted to open; adding the file name and path to the error message would be a massive improvement.
    
DougM says:

December 23, 2021 at 2:23 pm

Let me try that again (I’m astonished that this UI won’t let me use GT or LT symbols in a message!):

I don’t think it’s working, I’m astonished every day. I think that every error message should have 2 buttons, OK and NOT OK, and if you hit NOT OK the developer should be required to come to your house, fix it, and explain to you why he or she was so stupid in the first place.

Admittedly that would be even more astonishing.

PPJ says:

December 26, 2021 at 4:07 pm

“Case in point: My wife’s iPhone wouldn’t upload pictures”

I couldn’t send email on my iPhone and it returned an error about the SMTP server connection. After half an hour of checking SMTP settings in the email client I realised it was the attachment that was too big.

