Fail Of The Week: Roboracer Meets Wall

November 8, 2020

There comes a moment when our project sees the light of day, publicly presented to people who are curious to see the results of all our hard work, only for it to fail in a spectacularly embarrassing way. This is the dreaded “Demo Curse” and it recently befell the SIT Acronis Autonomous team. Their Roborace car gained social media infamy as it was seen launching off the starting line and immediately into a wall. A team member explained what happened.

A few explanations had started circulating, but only in the vague terms of a “steering lock” without much technical detail until this emerged. Steering lock? You mean like The Club? Well, sort of. While there was no steering wheel immobilization steel bar on the car, a software equivalent did take hold within the car’s systems. During initialization, while a human driver was at the controls, one of the modules sent out NaN (Not a Number) instead of a valid numeric value. This was never seen in testing, and it wreaked havoc at the worst possible time.

A module whose job was to ensure numbers stay within expected bounds said “not a number, not my problem!” That NaN value propagated through to the vehicle’s CAN data bus, which didn’t define the handling of NaN so it was arbitrarily translated into a very large number causing further problems. This cascade of events resulted in a steering control system locked to full right before the algorithm was given permission to start driving. It desperately tried to steer the car back on course, without effect, for the few short seconds until it met the wall.

While embarrassing and not the kind of publicity the Schaffhausen Institute of Technology or their sponsor Acronis was hoping for, the team dug through logs to understand what happened and taught their car to handle NaN properly. Driving a backup car, round two went very well and the team took second place. So they had a happy ending after all. Congratulations! We’re very happy this problem was found and fixed on a closed track and not on public roads.

[via Engadget]

52 thoughts on “Fail Of The Week: Roboracer Meets Wall”

Anton Fosselius says:

November 8, 2020 at 1:08 pm

now this is where ADA shines ^_^ so strict/hard typed you cant do a thing…

Report comment

Reply
1. 8bitwiz says:
  
  November 8, 2020 at 1:40 pm
  
  And it worked so well for Ariaine 5 too!
  
  NaN NaN NaN NaN, hey hey hey, se-egfault!
  At least these guys had a scratch monkey ready!
  
  Report comment
  
  Reply
Andrew says:

November 8, 2020 at 1:25 pm

They just have to build 649 more cars, then keep the ones that travelled farthest:

https://hackaday.com/2020/11/07/training-a-neural-network-to-play-a-driving-game/

Report comment

Reply
Fred says:

November 8, 2020 at 2:10 pm

And this is precisely the reason that self driving cars should never be allowed on public roads.

A coding error buried so deep that it is not until a crash into a wall (or a dead pedestrian) that it is even realised there is a problem. Law of averages says you won’t be that dead pedestrian. So sorry you were.

Report comment

Reply
1. Road Rachel says:
  
  November 8, 2020 at 2:44 pm
  
  A self driving car only needs to outperform an average human to be the better option.
  
  Report comment
  
  Reply
  1. John Smith says:
    
    November 8, 2020 at 11:56 pm
    
    If I outperform average human more than self driving car, then its not better option for me for sure.
    
    Report comment
    
    Reply
    1. Segdop says:
      
      November 9, 2020 at 5:27 am
      
      Glad you understand how averages work.
      
      Report comment
      
      Reply
      1. Ren says:
        
        November 9, 2020 at 9:49 am
        
        Because 90% of drivers believe they are better than average!
        B^)
        
        Report comment
      2. Dude says:
        
        November 9, 2020 at 11:09 am
        
        70-90% of drivers ARE better than the average.
        
        The trick is that most drivers never have accidents, while few have many.
        
        Report comment
  2. Dude says:
    
    November 9, 2020 at 11:14 am
    
    > only needs to outperform an average human to be the better option
    
    That would make it worse for at least half the drivers in any case.
    
    Report comment
    
    Reply
2. Bernie M says:
  
  November 8, 2020 at 2:57 pm
  
  And that brings up the standard ethics question of whether that “one” dead pedestrian is “worth it” if finding that coding error enables self-driving cars to improve. It’s not hard to imagine that with such constant improvements, almost all such deadly incidents will soon be eliminated. Following that, the self-driving cars have many fewer such incidents than with human drivers. (Obviously, those 10,000 *annual* deaths due to drunk driving are all but completely eliminated.)
  
  But it IS an ethics question of whether it’s appropriate to pursue this end. :-)
  
  Report comment
  
  Reply
  1. Miroslav says:
    
    November 8, 2020 at 5:24 pm
    
    Our software has been “constantly improving” for 70 years now, and it’s still full of errors.
    
    I say every creator of a self driving car must sit in his creation and be driven around by it for 1 year.
    
    Those who survived will be few.
    
    Report comment
    
    Reply
    1. Bernie M says:
      
      November 8, 2020 at 5:54 pm
      
      I have a couple of responses to this:
      
      1) “Our software has been “constantly improving” for 70 years now, and it’s still full of errors.”
      Sorry – I don’t think there’s actually any software from 1950 that’s still in use and being actively debugged. If you’re implying that new software IN GENERAL is always being developed and debugged, this is true, but this is NEW software that might have bugs.
      
      Also, consider that the “self-driving code” doesn’t have to be 100% bug-free to be essentially a flawless driver. If the software makes a decision that causes the car to begin to accelerate, but then, 10 milliseconds later realizes it made a mistake (for whatever reason) and then begins to apply breaks then there very likely has been no real-world harm. Meanwhile, the software will flag that “exceptional incident” to go back to the development group for review and appropriate remediation. And thus, a true bug, causes no harm, and gets fixed anyway.
      
      2) “I say every creator of a self driving car must sit in his creation and be driven around by it for 1 year.
      Those who survived will be few.”
      
      Interesting. I wonder if the same logic should compel every single driving instructor to be driven around for a year by his/her students.
      
      As to the “Those who survived will be few” comment, I suspect you don’t have intimate knowledge of the state of affairs of self-driving. There are a number of companies (at least 4 that I’m familiar with) that have self-driving code that works exceptionally well.
      
      And, to address the hyperbole, the *vast* majority of car accidents do not result in death or even severe injury. :-)
      
      Report comment
      
      Reply
      1. victroniko says:
        
        November 8, 2020 at 7:48 pm
        
        By “our software” he was probably referring to our squishy wetware up there…
        
        (BTW, “I don’t think there’s actually any software from 1950 that’s still in use and being actively debugged” — tell that to the COBOL folks from the banking industry :-P)
        
        Report comment
      2. Truth says:
        
        November 8, 2020 at 9:16 pm
        
        @victroniko
        Some interesting COBOL was found during the search for Y2K bugs in an unnamed English bank, that was using pounds, shillings and pence at the fundamental units in their system and that code was patched in 1971, when the UK decimalised to handle the money.
        (FYI: There is 12 pence in a shilling, 20 shillings or 240 pence in a pound).
        
        So oddball code can hang around for nearly 30 years.
        
        Report comment
      3. ian 42 says:
        
        October 26, 2021 at 3:52 am
        
        to answer the comments above – I’ve worked on software that was written in the 60’s (COBOL), only a couple of years ago. ie it was over 50 years old…
        
        But the bigger problem is 100% of drivers think they are above average.
        
        Report comment
  2. Arne says:
    
    November 8, 2020 at 9:27 pm
    
    Apply the same thinking to human driving and the only legal speed would be very slow!
    
    Report comment
    
    Reply
3. Ag says:
  
  November 8, 2020 at 2:58 pm
  
  But humans literally drop dead or lose conscious at the wheel and that’s just a comparable fault.
  
  Humans also, fall asleep, drink, do dangerous maneuvers for fun etc
  
  Report comment
  
  Reply
  1. pelrun says:
    
    November 8, 2020 at 9:48 pm
    
    At least in the software case you can patch out the bug across the entire fleet once it’s discovered. Humans just keep failing in the same damn ways and nobody does anything about it.
    
    Report comment
    
    Reply
4. Ameyring says:
  
  November 8, 2020 at 3:35 pm
  
  Self-driving cars would make more sense on dedicated roads with excellent pavement markings. I’m not comfortable in the city. I’m sure time will lead to improvements and if it beats avoiding deaths due to drunk driving, I’m for it.
  
  Report comment
  
  Reply
5. Ben says:
  
  November 8, 2020 at 4:35 pm
  
  You sound like the most boring person on the internet.
  
  Report comment
  
  Reply
6. Fred says:
  
  November 8, 2020 at 5:13 pm
  
  I agree that a self-driving vehicle will (in the future sometime) outperform a human, but who takes liability?
  Is it the driver, who had nothing to do with writing the code?
  Is it the manufacturer, who isn’t able to control the fact mud splashed up onto the sensor?
  Is the the software engineer who signed off on the code, but has since moved on (or been run over by his own ‘approved’ code!)?
  
  I think it is unrealistic to expect the the driver “must always be *fully* ready to take over” (as all autopilot car manufacturers are now adopting) when the autopilot throws a fit.
  What is the point of having a self-driving car?
  
  Well, the reason is legal liability: to shift the blame from the manufacturer/designer to someone else.
  
  Report comment
  
  Reply
  1. pelrun says:
    
    November 8, 2020 at 9:50 pm
    
    Perhaps finding someone to blame stops being such a concern when accidents become rare and far between.
    
    Report comment
    
    Reply
    1. fizzymagic says:
      
      November 9, 2020 at 12:05 am
      
      How has that worked out for safer forms of travel, like air and train? Nope. Liability is important and the software industry, which is protected from it, makes lousy products as a result.
      
      Report comment
      
      Reply
  2. CampGareth says:
    
    November 9, 2020 at 12:38 am
    
    Whenever you own and drive a car you are responsible for the risk that poses, that’s the default position. This changes when the manufacturer takes control away from you without your consent, E.g. A software bug occurs and the brake pedal is now an accelerator. Self driving and cruise control however require your consent to activate so you are still the one in control and therefore the responsibility lies with you. This will be the case until either one of those features doesn’t deactivate when you try to regain control or human-facing controls disappear altogether which I could only see being a thing on taxi cabs.
    
    Report comment
    
    Reply
    1. Dude says:
      
      November 9, 2020 at 11:19 am
      
      > so you are still the one in control and therefore the responsibility lies with you
      
      Automatic driving may become mandatory on the point that it’s “safer than average”, which is forcing the average risk onto everybody. Then the responsibility cannot be on the driver anymore, because they had no choice.
      
      Report comment
      
      Reply
    2. mac012345 says:
      
      November 10, 2020 at 1:08 am
      
      > therefore the responsibility lies with you.
      
      Nope, already settled in court (volvo).
      In this case Volvo cannot prove that a mandatory recall due to brake failure was done on a car involved in a fatality (driver lost control of car and killed one children).
      Note that driver was also responsible as she didn’t take measure to avoid the group of children (which was not on the street btw).
      
      Report comment
      
      Reply
7. Matt says:
  
  November 8, 2020 at 7:33 pm
  
  I’d recommend conversing with some engineers in the automotive industry about what they need to do to prove safety for vehicles that will be on the public roads.
  
  Report comment
  
  Reply
8. Stanson says:
  
  November 8, 2020 at 9:34 pm
  
  > A coding error buried so deep that it is not until a crash into a wall (or a dead pedestrian) that it is even realised there is a problem.
  
  That’s why automotive code must be as simple as possible. It also must be completely provable. There must be no place where coding error could be “buried”. It must be fully readeable, understandable and analyzable by single person in a day. So, no any neural networks or other unpredictable and unstable “modern” crap. If your code, including used libraries, takes more that few thousand lines or need some “new, innovative language”, it should not be run in a crucial module of a car. Never. There is entertainment system for different bloatware, that should not be connected to vehicle network.
  
  Our ancestors from both continents send spaceships to the moon and back by means of 40kloc or less, don’t tell me that shitty cars need more. If they are – you definitely doing something completely wrong.
  
  Report comment
  
  Reply
  1. Miroslav says:
    
    November 9, 2020 at 7:41 am
    
    100% true. In my view, neural network is just a fancy name for inscrutable and non-checkable statistical processor.
    
    Report comment
    
    Reply
  2. Matt says:
    
    November 10, 2020 at 10:27 am
    
    I’m actually employed by a company working on the autonomous vehicle problem in a role where we are developing and proving the safety of our software and hardware. My work is all at the foundational layers of the software so I can’t say much about the safety process for the DNNs.
    
    We are implementing an ISO26262 process which covers a lot of ground for how to write software. If you just want to talk about code complexity, that is covered. But a simple metric like lines of code is a terrible one for ascertaining complexity. ISO26262 recommends adopting a metric like cyclomatic code complexity (CCM). Combined with code coverage requirements (really only realistically done via a combination of unit, integration, and system testing) and security requirements (that generally involve techniques such as fuzzing) the result is something that is very punishing unless the code is low complexity and fairly easy to understand.
    
    The DNN problem is being looked at very seriously as well. We understand that testing alone is the the best way (even years of it done with both simulators and on the road). Because I’m no expert, and because of how much of this is still proprietary research I can’t comment. Every company in this space is highly aware of the failures of Tesla and Uber and no one is eager to add their name to that list.
    But the public should absolutely demand proof of safety and to examine the methodology behind those claims of safety when it comes to the DNNs. The other stuff in these cars is all know quantities at this point and there are decades or practices to pull from.
    
    Report comment
    
    Reply
9. john says:
  
  November 8, 2020 at 10:59 pm
  
  That exact bug will only ever happen once, while humanity has been combating drink-driving for a very long time without ever stamping it out.
  
  Report comment
  
  Reply
Hirudinea says:

November 8, 2020 at 2:17 pm

This isn’t such an advance, humans have been doing the same thing for years.

Report comment

Reply
1. Adam says:
  
  November 10, 2020 at 9:01 am
  
  We shouldn’t allow self-driving cars onto the roads until they can steer into walls at least as precisely as the average human.
  
  Report comment
  
  Reply
Mike says:

November 8, 2020 at 3:31 pm

If you read the article. This is exactly why you DONT put low end coders behind something of this magnitude. So many coders these days only focus on the getting from point A to point B. Where where they should put equal if not more effort is in the failure points. Uh, if i want to steel left and i dont get the feed back that we are then stop. How simple is that?
Thus the reason why you can see these people on the software teem suck.

Report comment

Reply
1. jpa says:
  
  November 8, 2020 at 9:32 pm
  
  They learned their lesson in a safe situation and also got a lot of publicity for this kind of a problem. What’s the problem?
  
  Report comment
  
  Reply
2. pelrun says:
  
  November 8, 2020 at 9:55 pm
  
  Oh piss off. Nobody writes defect free code. Not me, not these guys, and certainly not you. Crashes are rarely this spectacular but either way it doesn’t deserve some armchair “engineer” making bullshit generalisations about how everyone else but them is stupid.
  
  The fact that they could track down the full failure path from defect to cause shows they know what they are doing.
  
  Report comment
  
  Reply
  1. Dude says:
    
    November 9, 2020 at 11:21 am
    
    At least now they do.
    
    Report comment
    
    Reply
  2. Adam says:
    
    November 10, 2020 at 9:03 am
    
    +1!
    
    https://qntm.org/core
    
    Report comment
    
    Reply
RandyKC says:

November 8, 2020 at 3:47 pm

You had me at CAN bus.
It’s not a failure when you learn.

Report comment

Reply
Matt says:

November 8, 2020 at 7:39 pm

I wonder if the team used Coverity or similar tool to check if the code complied with the applicable MISRA, AutoSAR, and CERT specifications. I would also wonder if they applied an ASIL process to their work. At ASIL-B they should at least have had range checks on their interfaces that would have caught this.
Yes, all these things come with a lot of overhead and being a race car, strictly adhering to best practices for the automotive industry probably seemed too onerous. But these things exist for reasons like this.

Report comment

Reply
1. 8bitwiz says:
  
  November 9, 2020 at 4:27 am
  
  Such code-checking tools neither verify algorithms, nor problems using floating point. And if you read the summary, the problem was in range-checking code!
  
  I think I’m more concerned that they didn’t have an E-stop button in their design. Yes, I know it would have to be wireless, with all the extra problems that brings. And the human would have to have been fast enough to push it in time! But this is why I, as a human driver, just ease off the brakes, or put on the gas only a little bit, and verify that the car is going in the right direction before stepping on it. Sometimes it doesn’t go the way you expect, which is why you need to confirm the feedback.
  
  Report comment
  
  Reply
  1. mac012345 says:
    
    November 10, 2020 at 1:14 am
    
    It’s not really in range checking code, in fact it is really close from Ariane5 failure: one system goes out-of-limits (NaN or overflow), then other systems tied to it do not handle well the issue because it wasn’t explicitly specified.
    
    This is basically a system architect failure, and a bad one.
    
    Report comment
    
    Reply
    1. Matt says:
      
      November 10, 2020 at 10:40 am
      
      “This is basically a system architect failure, and a bad one.”
      One that if missed in my job would get the team doing the FMEAs in a lot of hot water. Anyone doing an FTA or DFA would also catch a lot of flack.
      Because this is a race situation, I doubt the team is following automotive safety processes though.
      
      And as I state in my reply to 8bitwiz, this failure sounds like a blatant violation of CERT FLP04-C. If they were using Coverity with checkers for the CERT C rule set, this should have been flagged and the code should have at least been reviewed.
      
      Report comment
      
      Reply
  2. Matt says:
    
    November 10, 2020 at 10:35 am
    
    https://wiki.sei.cmu.edu/confluence/display/c/FLP32-C.+Prevent+or+detect+domain+and+range+errors+in+math+functions
    
    https://wiki.sei.cmu.edu/confluence/display/c/FLP04-C.+Check+floating-point+inputs+for+exceptional+values
    
    https://community.synopsys.com/s/article/Coverity-2017-07-FP-MISRA-C-2012-Rule-17-3
    
    There are indeed rules and checkers around this sort of issue.
    From the post “So during this intialization lap something happened which apparently caused the steering control signal to go to NaN and subsequently the steering locked to the maximum value to the right.”
    
    They allowed the NaN to propagate though multiple interfaces in the system.
    At the very least CERT FLP04-C was violated somewhere.
    
    Report comment
    
    Reply
  3. Matt says:
    
    November 10, 2020 at 1:25 pm
    
    Looks like my reply didn’t make it through the spam filter because I had a number of URLs in it. There are a number of MISRA and CERT rules around floating point numbers an NaN. Coverity advertises support for them. We use Coverity at my workplace and what we have observed is that Coverity doesn’t miss things, but it can make a lot of false positives.
    
    One specific CERT C rule FLP04-C specifically says to check for NaN. The post says the NaN was propagated through software interfaces in the car and is sounds like none of them checked of this. If they were targeting even ASIL A they should have been checking their data at their interfaces.
    What I find extra ironic is that this is a team from Acronis which is a cybersecurity company and they should be aware of the CERT coding guidelines.
    
    You can see some of the measure required for the different ASIL levels if you search for “Analysis of ISO26262 standard application in development of steer-by-wire systems”. The first link should be a PDF with that tile. In that PDF, look for table 6.
    
    Had the team followed ISO 26262 and aimed for even the lowest ASIL level of A then the error would almost certainly have been caught.
    
    However, being a race car without humans around on a closed course with safety features to keep the cars separated from the people, perhaps there are no mandated safety certifications teams must meet.
    
    Report comment
    
    Reply
Truth says:

November 8, 2020 at 9:28 pm

The simple solution is machine only roads, if a machine destroys a machine, the people inside the cars/lorries/busses knew the risks and were willing to hand off their safety to what are in effect advanced data harvesting company. That way the company is innocent if people die during the testing phase. And every employee of all these companies must legally be on these machine only roads in their companies cars for at least 7 hours a week.

Problems will be rapidly fixed.

Report comment

Reply
Floatsarebadmmkay says:

November 8, 2020 at 11:25 pm

You don’t just use floats (and doubles) in critical apps without proper caution.

Report comment

Reply
1. Adam says:
  
  November 10, 2020 at 9:05 am
  
  “…now I have 2.000000000000003 problems.”
  
  Report comment
  
  Reply
Brian says:

November 9, 2020 at 6:18 am

I completely understand the software bug BUT… was there no collision avoidance algorithm that could actuate the brakes??

Like, ok, the steering algorithm failed. But I would think at some point a lidar sensor would be like “Hey, a wall” and the control algorithm would be like “Hmmm, brakes I guess”.

Perhaps the algorithm was assuming the car should easily be able to steer away from the obstacle but you can’t always rely on steering. Cars can understeer and oversteer. The other systems should be prepared to compensate.

Report comment

Reply
DainBramage says:

November 9, 2020 at 7:21 am

Why was this ever allowed to happen? What kind of programmer doesn’t design his or her input to be tolerant of a simple NAN error?
Sheesh, I figured out how to do that in TI-99/4A BASIC when I was just a young kid. If a stupid kid like me could figure it out on his own, surely a group of adults with degrees should be able to manage it!

Report comment

Reply
Bob says:

November 10, 2020 at 2:52 am

The steering got stuck, that’s fine, it could happen mechanically as well. It didn’t look like that was the only problem though.
Does it not have any means to detect an object in front of the car and perhaps take multiple actions? If this had been a human driver and let’s say the steering wheel somehow detached from the wheels, there’s a pretty good chance that they would have thought, ooh there’s a thing coming towards me unexpectedly, but I’m steering away from it, can I perhaps press quite hard on the brake and see what happens instead of trying to accelerate and just steer around it?

Report comment

Reply

Hackaday

Fail Of The Week: Roboracer Meets Wall

52 thoughts on “Fail Of The Week: Roboracer Meets Wall”

Leave a Reply to StansonCancel reply

Search

Never miss a hack

If you missed it

Falling Down The Land Camera Rabbit Hole

RTEMS Statement Deepens Libogc License Controversy

Version Control To The Max

Trackside Observations Of A Rail Power Enthusiast

Radio Apocalypse: Meteor Burst Communications

Our Columns

ChatGPT & Me. ChatGPT Is Me!

Hackaday Podcast Episode 321: Learn You Some 3DP, Let The Wookie Win, Or Design A Thinkpad Motherboard Anew

This Week In Security: Lingering Spectre, Deep Fakes, And CoreAudio

Remembering More Memory: XMS And A Real Hack

Remembering Memory: EMS, And TSRs

52 thoughts on “Fail Of The Week: Roboracer Meets Wall”

Leave a Reply to StansonCancel reply

Search

Never miss a hack

Subscribe

If you missed it

Our Columns