Self-Driving Cars And The Fight Over The Necessity Of Lidar

If you haven’t been living under a rock for the past decade or so, you will have seen a lot of arguing in the media by prominent figures and their respective fanbases about what the right sensor package is for autonomous vehicles, or ‘self-driving cars’ in popular parlance. Since the task here is to effectively replicate what is achieved by the human Mark 1 eyeball and the associated processing hardware in the evolutionary layers of patched-together wetware (‘human brain’), it is tempting to think that a bunch of modern RGB cameras and a zippy computer system could handle the same vision task quite easily.

This is where reality throws a couple of curveballs. Although RGB cameras lack evolutionary glitches like an inverted image sensor and a big dead spot where the optic nerve punches through said sensor layer, it turns out that the preprocessing performed in the retina, the processing in the visual cortex, and the analysis in the rest of the brain are really quite good at detecting objects. This is no doubt helped by millions of years in which only those who managed to not get eaten by predators procreated in significant numbers.

Hence the solution of sticking something like a Lidar scanner on a car makes a lot of sense. Not only does this provide detailed range information about one’s surroundings, it also isn’t bothered by rain and fog to the degree that an RGB camera is. Having more and better quality information makes the subsequent processing easier and more effective, or so it would seem.

Computer Vision Things

A Waymo Jaguar I-Pace car in San Francisco. (Credit: Dllu, Wikimedia)

Giving machines the ability to see and recognize objects has been a dream for many decades, and the subject of a nearly infinite number of science-fiction works. For us humans this ability develops over the course of growing up, from a newborn with a still-developing visual cortex to a young adult who has hopefully learned how to identify objects in their environment, including details like which objects are edible and which are not.

As it turns out, just the first part of that challenge is pretty hard. Interpreting a scene captured by a camera is the subject of many possible algorithms that seek to extract edges, infer which features belong together based on various hints, and estimate the distance to each object and whether it is moving or not. All of that just to answer the basic question of which objects exist in a scene, and what they are currently doing.
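One classic building block for this kind of scene interpretation is edge extraction. The sketch below runs OpenCV’s Canny detector on a single camera frame; the file name and thresholds are arbitrary illustrations for this article, not part of any production pipeline, which would add contour filtering, tracking and depth estimation on top.

```python
import cv2

frame = cv2.imread("dashcam_frame.jpg")          # hypothetical stand-in for a camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # edge detection only needs intensity
blurred = cv2.GaussianBlur(gray, (5, 5), 0)      # suppress sensor noise before edge detection
edges = cv2.Canny(blurred, 50, 150)              # thresholds picked for illustration only

# Group connected edge pixels into candidate object outlines.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"Found {len(contours)} candidate outlines in this frame")
```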

Approaches to object detection can be subdivided into conventional and neural network approaches, with methods employing convolutional neural networks (CNNs) being the most prevalent these days. These CNNs are typically trained with a dataset that is relevant to the objects that will be encountered, such as while navigating in traffic. This is what companies like Waymo and Tesla use for autonomous cars today, and it is why they need both access to a large dataset of traffic videos to train with, and a large collection of employees who watch said videos in order to tag as many objects as possible. Once tagged and bundled, these videos become CNN training data sets.
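As an illustration of the CNN approach, the sketch below runs a pretrained, general-purpose detector from torchvision on a single frame. This is emphatically not Waymo’s or Tesla’s stack: the model is trained on the generic COCO dataset rather than tagged traffic footage, and the file name and confidence threshold are assumptions made up for the example.

```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

# Pretrained on the generic COCO classes, i.e. not a traffic-specific dataset.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = convert_image_dtype(read_image("dashcam_frame.jpg"), torch.float)  # CHW tensor in [0, 1]
with torch.no_grad():
    detections = model([frame])[0]  # dict with 'boxes', 'labels' and 'scores'

for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.5:  # only keep reasonably confident detections
        print(f"class {label.item()} at {box.tolist()} (score {score:.2f})")
```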

This raises the question of how accurate this approach is. With purely RGB camera images as input, the answer appears to be ‘sorta’. Although Autopilot is only considered a Level 2 system according to the SAE’s 0-5 scale of driving automation, Tesla vehicles with it installed have failed to recognize hazards on multiple occasions, including the side of a white truck in 2016, a concrete barrier between a highway and an offramp in 2018, and a red light as well as the rear of a fire truck in 2019.

This pattern continues year after year, with the Autopilot system failing to recognize hazards and engage the brakes, including in the so-called ‘Full Self-Driving’ (FSD) mode. In April of 2024 a motorcyclist was run over by a Tesla in FSD mode when the system failed to stop and instead accelerated. This made it the second fatality involving FSD mode, which has since been renamed ‘FSD Supervised’.

Compared to the considerably less crash-prone Level 4 Waymo cars with their hard-to-miss sensor packages strapped to the vehicle, one could conceivably make the case that just a couple of RGB cameras is not enough for reliable object detection, and that blending multiple sensor types is the more dependable approach.

Which is not to say that Waymo cars are perfect, of course. In 2024 one Waymo car managed to hit a utility pole at low speed during a pullover maneuver, after the car’s firmware incorrectly assessed how to respond to a ‘pole-like object’ that lacked a hard edge between it and the road.

This gets us to the second issue with self-driving cars: making the right decision when confronted with a new situation.

Acting On Perception

The Tesla Hardware 4 mainboard with its redundant custom SoCs. (Credit: Autopilotreview.com)

Once you know what objects are in a scene and merge this with the known state of the vehicle, the next step for an autonomous vehicle is to decide what to do with this information. Although the tempting answer might be to also use ‘something with neural networks’ here, this has turned out to be a non-viable method on its own. Back in 2018 Waymo created a recurrent neural network (RNN) called ChauffeurNet, which was trained on both real-life and synthetic driving data to have it effectively imitate human drivers.
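The general idea behind this kind of imitation learning can be sketched in a few lines: a network is trained to reproduce the commands a human driver gave for each recorded state. The toy example below uses random stand-in data and a tiny feed-forward network; it only illustrates the principle, not ChauffeurNet’s actual recurrent architecture or training setup.

```python
import torch
import torch.nn as nn

# Toy stand-in data: 1,000 recorded frames with 16 perception features each,
# and the human driver's (steering, acceleration) command as the target.
states = torch.randn(1000, 16)
human_commands = torch.randn(1000, 2)

policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(policy(states), human_commands)  # how far from the human's action?
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: imitation loss {loss.item():.4f}")
```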

The conclusion of this experiment was that while deep learning has a place here, the system needs to lean mostly on a solid body of rules that provides explicit reasoning, as this copes better with the so-called ‘long tail’ of possible situations; you simply cannot put every conceivable situation into a data set.

This again turns out to be a place where human input and intelligence are required. While an RNN or similar network can be trained on an impressive data set, it will never learn the reasons why a decision was made in a training video, nor can it provide its own reasoning and make reasonable adaptations when faced with a new situation. This is where human experts have to define explicit rules, taking into account the known facts about the current surroundings and the state of the vehicle.

Here is where details like the explicit distance to an obstacle, its relative speed and dimensions, and the room available to swerve around it are not just nice to have. Sensors like radar and Lidar provide this solid data directly, whereas an RGB camera plus CNN may provide it if you’re lucky, or maybe not quite. When you’re talking about highway speeds and potentially multiple lives at risk, certainty always wins out.
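As a toy example of the kind of explicit, human-written rule that such measured data enables, the sketch below brakes whenever the reported obstacle range drops below the distance needed to stop. The class names, reaction time and deceleration figures are illustrative assumptions, not any manufacturer’s actual parameters.

```python
from dataclasses import dataclass

@dataclass
class Obstacle:
    distance_m: float          # measured range to the obstacle
    closing_speed_mps: float   # positive means we are approaching it

def required_stopping_distance(speed_mps: float,
                               reaction_time_s: float = 0.5,
                               deceleration_mps2: float = 6.0) -> float:
    """Distance covered during the reaction delay plus braking to a stop."""
    return speed_mps * reaction_time_s + speed_mps ** 2 / (2 * deceleration_mps2)

def should_emergency_brake(obstacle: Obstacle) -> bool:
    # Hard rule: brake if the obstacle is closer than the distance needed to stop.
    if obstacle.closing_speed_mps <= 0:
        return False  # not closing in on it
    return obstacle.distance_m < required_stopping_distance(obstacle.closing_speed_mps)

# A stationary object 60 m ahead while travelling at 33 m/s (~120 km/h): brake.
print(should_emergency_brake(Obstacle(distance_m=60.0, closing_speed_mps=33.0)))
```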

Tesla Hardware And Sneaky Radars

Arbe Phoenix radar module installed in a Tesla car as part of the Hardware 4 Autopilot hardware. (Credit: @greentheonly, Twitter)
Arbe Phoenix radar module installed in a Tesla car as part of the Hardware 4 Autopilot hardware. (Credit: @greentheonly, Twitter)

One of the poorly kept secrets about Tesla’s Autopilot system is that it has had a front-facing radar sensor for most of its existence. The original Hardware 1 (HW1) suite featured a single front-facing camera behind the top of the windshield and a radar unit behind the lower grille, in addition to twelve ultrasonic sensors around the vehicle.

Notably, Tesla did not initially use the radar in a primary object-detection role, meaning that object detection and emergency braking relied on the RGB camera. This changed after the camera system failed to notice a white trailer against a bright sky, resulting in a fatal crash. The subsequent firmware update promoted the radar to the same primary role as the camera system, which likely would have prevented that particular crash.

HW1 used Mobileye’s EyeQ3, but after Mobileye cut ties with Tesla, NVIDIA’s Drive PX 2 was used instead for HW2. This upped the number of cameras to eight, providing a surround view of the car’s environment, along with a similar forward-facing radar. After an intermediate HW2.5 revision, HW3 was the first to use a custom processor, featuring twelve Arm Cortex-A72 cores clocked at 2.2 GHz.

HW3 initially also had a radar sensor, but in 2021 this was eliminated in favor of the camera-only ‘Tesla Vision’ approach, which resulted in a significant uptick in crashes. In 2022 it was announced that the ultrasonic sensors used for short-range object detection would be removed as well.

Then in January of 2023 HW4 started shipping, with even more impressive computing specs and 5 MP cameras instead of the previous 1.2 MP units. This revision also reintroduced the forward-facing radar, apparently the Arbe Phoenix radar with a 300 meter range, though not in the Model Y. Even so, RGB camera-only perception remains the primary mode for Tesla cars.

Answering The Question

At this point we can say with a high degree of certainty that using just RGB cameras makes it exceedingly hard to reliably stop a vehicle from smashing into objects, for the simple reason that you are reducing the amount of reliable data that goes into your decision-making software. While the object-detecting CNN may report a 29% probability of an object being right up ahead, the radar or Lidar will have told you that a big, rather solid-looking object is lying on the road. Your own eyes would have told you that it’s a large piece of concrete that fell off a truck in front of you.
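A simplified way to picture that blending is a confirmation gate: a confident camera detection triggers a response on its own, while a weak camera score can still be confirmed by a hard radar or Lidar range measurement. Everything in the sketch below, structure and thresholds alike, is an illustrative assumption rather than any vendor’s actual logic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CameraDetection:
    score: float               # CNN confidence that something is ahead, 0..1

@dataclass
class RadarReturn:
    range_m: float             # measured distance to the reflector
    closing_speed_mps: float   # positive means it is getting closer

def obstacle_confirmed(cam: Optional[CameraDetection],
                       radar: Optional[RadarReturn]) -> bool:
    # A confident camera detection is enough on its own.
    if cam is not None and cam.score > 0.8:
        return True
    # A weak camera hit backed by a hard range measurement also counts.
    if cam is not None and cam.score > 0.2 and radar is not None and radar.range_m < 100.0:
        return True
    # A close, fast-closing radar return is treated as an obstacle regardless.
    if radar is not None and radar.range_m < 40.0 and radar.closing_speed_mps > 10.0:
        return True
    return False

# The 29% camera score alone is ignored, but with a radar return at 60 m it is not.
print(obstacle_confirmed(CameraDetection(score=0.29), None))                      # False
print(obstacle_confirmed(CameraDetection(score=0.29),
                         RadarReturn(range_m=60.0, closing_speed_mps=25.0)))      # True
```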

This then mostly leaves the question of whether the front-facing radar that’s present in at least some Tesla cars is about as good as the Lidar units used by other car manufacturers like Volvo, or the roof-mounted sensor suite used by Waymo. After all, radar and Lidar work according to roughly the same basic principles.

That said, Lidar is superior when it comes to aspects like accuracy, as radar’s longer wavelengths limit the detail it can resolve. At the same time a radar system isn’t bothered as much by weather conditions, while generally being cheaper. For Waymo the choice of Lidar over radar comes down to this improved detail: their Lidar units can create a detailed 3D image of the surroundings, down to the direction a pedestrian is facing and the hand signals given by cyclists.
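To put a rough number on that wavelength argument: the diffraction-limited beamwidth of a sensor scales with wavelength divided by aperture, so a 77 GHz radar beam ends up orders of magnitude wider than a 905 nm Lidar beam. The aperture sizes in the back-of-the-envelope sketch below are illustrative assumptions, not any product’s specifications.

```python
C = 299_792_458.0  # speed of light, m/s

def beamwidth_rad(wavelength_m: float, aperture_m: float) -> float:
    # Ideal diffraction-limited beamwidth; real sensors do somewhat worse.
    return wavelength_m / aperture_m

radar_wavelength = C / 77e9      # 77 GHz automotive radar -> roughly 3.9 mm
lidar_wavelength = 905e-9        # common 905 nm Lidar laser diode

radar_beam = beamwidth_rad(radar_wavelength, aperture_m=0.10)    # assumed ~10 cm antenna
lidar_beam = beamwidth_rad(lidar_wavelength, aperture_m=0.025)   # assumed ~2.5 cm optics

for name, beam in (("radar", radar_beam), ("Lidar", lidar_beam)):
    # How wide the beam footprint is on a target 50 m away.
    print(f"{name}: beamwidth {beam:.2e} rad, about {beam * 50:.3f} m wide at 50 m")
```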

Thus the shortest possible answer is that yes, Lidar is absolutely the best option, while radar is a pretty good option to at least not drive into that semitrailer and/or pedestrian. Assuming your firmware is properly configured to act on said object detection, natch.

15 thoughts on “Self-Driving Cars And The Fight Over The Necessity Of Lidar”

    1. I am reminded of that self-driving car in San Francisco that ran someone over and got them wedged in the wheel well. It didn’t have sound or haptic sensors, and the wheel well was in a blind spot, so it had no way of knowing it had picked up a passenger.

      If I recall correctly, it went something like a quarter mile before pulling over because it thought it had a flat tire.

    2. We drive by the seat of our pants, for sure. And there’s always “does that tire sound funny to you” which is most often just a rock or something that got jammed between the ridges, but could be an early warning.

      That one time I smelled the cabin filling with smoke…well a self-driving system probably wouldn’t have made it much worse anyway.

      But that’s a super interesting point about self-drivers in general, that they are designed for the “normal” situations, and may not react well when things get far enough out of the box — on the edge of traction over a sketchy gravel road, or with actual problems.

      1. And there’s always “does that tire sound funny to you” which is most often just a rock or something that got jammed between the ridges, but could be an early warning.

        I once heard a very slight metal tinging/clinking from the back while driving a relative’s car.
        Had to bring it in for a regular checkup I think (relatives were on holiday) and mentioned that noise to the shop.
        Turns out the tip of one of the rear suspension springs/dampers(?) had broken off and pretty much made the car unsafe for the road (not in the USA, I assume).

        Of course that’s assuming the car shop wasn’t grifting my relatives (I’m not 100% sure they weren’t).

        1. When you get a pilot’s certificate they check your vision and hearing. A friend of mine has a VERY nice airplane, a Cessna 210, and is significantly older. As we were taking off from his runway, I heard just the faintest brrip brrip brrip kinda sound as we accelerated and it stopped almost the moment we came off the ground, so I thought, huh, something’s going on with one of the tires. He didn’t hear a thing. When we landed at the next airport, it was a little louder. I mentioned it and we looked. Smaller aircraft have bolt-together wheels, two halves with (in this case) eight bolts holding them together clamped around the tire and tube. Four of the bolts had failed and the halves were slightly apart on one side, so the tire was wider and slightly touching the axle yoke, and the sound I heard was it brushing as the tire accelerated.
          Sounds like that matter. Or like when you hear your tire pick up a nail and it’s going tink tink tink and you think ah I’m about to get a flat.

  1. What I want to know is how does this work in, for example, a Canadian winter? The road isn’t visible, there are no lines because they are under a foot of snow. Lidar won’t work in the snow I think, and the road surface changes from hard pack to powder to black ice without warning. How will the car know if it can get up a hill or if it’s going to slide down backwards?

    No matter how good AI gets, I don’t think it will ever be ready for real winters.

    1. The road conditions you describe (heavy snowfall, sufficient accumulation to hide even the presence of a road, frequent black ice) are too dangerous for any vehicle less substantial than a snowplow to navigate, regardless of what’s behind the wheel.

      Yes, I’m sure many of the rugged northern tough guys in the audience do it all the time. That doesn’t make it safe.

      1. So, you are suggesting we stay home for four months of the year? I’d gladly do that, but that’s not an option. And, yes. You are a wuss. For real. Snow and ice isn’t any more dangerous than you make it.

      2. In this part of the world at 2500 meters altitude school bus drivers do that 5 days a week from November to April. They even have self-deploying chains for the tires. The rest of us plebes make do with studded winter tires.

      3. And yet thousands of drivers can manage that safely. Including the snowfall, which if I’m not mistaken OP didn’t include in their description.
        I don’t know how you assume a snow plow would manage that more safely? They sure could be better if the snow on top of the road was over a foot deep, but again that’s not mentioned in the OP.
        Here the sides of the road are marked with roadside poles for normal cars and snowplows alike, I assume Canadian roads are the same?

    2. The human mind is a really, really good signal processor. Think about the times that you have driven in a rain that overwhelmed your (streaky) windshield wipers, fog (both outside and fogged up windows), and driving snow – particularly at night. The fact that you are here shows that the mind is really good at inferring missing visual information from the little good info that it may have available.

  2. Proponents of the cameras-only approach like to point out that we humans have been driving cars with just two front-mounted cameras for decades.

    Brother, if I could have innate panoramic knowledge of the 3D environment around my car, do you think I wouldn’t want that?

  3. It takes years to master effective driving and many drivers never really achieve top level skills. Expecting such from a modern system is a bit ambitious. I feel there is still a long way to go before we can trust these systems.

    I’m reminded of training my grandson when he first started driving. Sure he could see the road, the signs, and the traffic. I was often pointing out various things like a car approaching a stop sign on a side street, a stale green light, a car slowing down while approaching a Starbucks up ahead. Many such details feed into expected situations and possible driving adjustments to accommodate what is happening around us. The ever evolving situation around driving requires a high degree of pattern recognition and situational awareness. Edge detection and range determination are the barest levels of understanding, and a higher resolution of a blocky perception is not really an improvement. A much higher level of comprehension around the driving experience is required for truly autonomous driving.

    Take the example of modern adaptive cruise control systems. They detect a slower vehicle in front of you and slow your speed so you don’t run into them. But an alert driver would recognize the slower vehicle and change lanes to avoid the slower car. But what about the car approaching from the rear, is it going faster? Can you wait for that car to pass then change lanes? Do you speed up and move around the slower car to minimize slowing the faster traffic?
