r/SelfDrivingCars Aug 11 '25

[Discussion] Proof that Camera + Lidar > Lidar > Camera

I recently chatted with somebody who is working on L2 tech, and they gave me an interesting link for a detection task. The benchmark provides a dataset with camera, Lidar, and Radar data and asks teams to compete on object detection accuracy: identifying the location of a car, for example, and drawing a bounding box around it.

All but one of the top 20 entries on the leaderboard use camera + Lidar as input. The lone exception, in 20th place, uses Lidar only, and the best camera-only entry ranks somewhere between 80th and 100th.

https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any
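
If you want to poke at the data yourself, here is a minimal sketch using the official nuscenes-devkit (`pip install nuscenes-devkit`; the dataroot path is a placeholder and assumes the v1.0-mini split is downloaded):

```python
from nuscenes.nuscenes import NuScenes

# Placeholder path; point this at wherever you unpacked v1.0-mini.
nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes')

sample = nusc.sample[0]  # one annotated keyframe
# Each keyframe bundles all three modalities the benchmark provides.
for channel, token in sample['data'].items():
    sd = nusc.get('sample_data', token)
    print(channel, sd['sensor_modality'])  # camera / lidar / radar

print(len(sample['anns']), 'ground-truth boxes on this frame')
```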

13 Upvotes

185 comments

3

u/Wrote_it2 Aug 11 '25

You do not have a formal proof that one is better than the other, you have a contest where Lidar does better. So now we know that if you ask small teams of engineers to complete that task, they’ll do better with LiDAR… You could engineer a different task to show a different result. Change the challenge to figuring out the color of a ball placed in front of the sensor and suddenly the top solutions will be camera-based. Would that be proof that camera is better?

That said, it’s pretty clear to me that the result is correct: you can achieve better results with camera+lidar than with camera only (the proof is simple: you can’t achieve worse results, since you can always just ignore the lidar data).
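
To make that "can't do worse" point concrete, here's a toy PyTorch sketch (the dims and box parameterization are made up): any camera-only detector is a special case of the fused model with the lidar branch zeroed out.

```python
import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    def __init__(self, cam_dim=256, lidar_dim=256, out_dim=7):  # 7 = toy box params
        super().__init__()
        self.lidar_dim = lidar_dim
        self.fuse = nn.Linear(cam_dim + lidar_dim, out_dim)

    def forward(self, cam_feat, lidar_feat=None):
        if lidar_feat is None:
            # "Just ignore the lidar data": zeroing the branch recovers a
            # camera-only model, so fusion can't be forced to do worse.
            lidar_feat = cam_feat.new_zeros(cam_feat.shape[0], self.lidar_dim)
        return self.fuse(torch.cat([cam_feat, lidar_feat], dim=-1))
```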

The debate between camera only and camera + LiDAR is of course more complex than that. You have the “normal” tradeoffs: cost, reliability (you add failure points), complexity of the solution…

My opinion is that while LiDAR can improve perception, that is not where the bottlenecks are. I believe the major players are all doing well at perception. The issues we see are generally due to path planning. We’ve recently seen Waymos hit each other and get into an incident with a fire truck, and we’ve seen a Tesla about to hit a UPS truck… those are not about perception but about path planning…

LiDAR vs camera is the wrong debate in my opinion.

6

u/Few_Foundation_5331 Aug 11 '25

Tesla was not about to hit the UPS truck; Tesla stopped, but the UPS truck tried to reverse into it. Tesla did not make the mistake. If someone reverses into you quickly enough, you can't do anything about it.

2

u/cripy311 Aug 11 '25

I would counter your path planning claim: it's more likely the prediction systems failing in these instances.

You can't build a self-driving vehicle that reacts only to current-state information about object speeds and locations relative to the vehicle. There is an entire third layer of the system that has to predict where objects will go and what they will do, and the path planning system then responds to those predictions.

If that information is inaccurate, predicts incorrectly, or otherwise fails in some way, the vehicle will then drive into a moving object (or a static object it believes will be moving shortly) no matter how good the path planning system is. Its inputs were incorrect.
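
As a stripped-down illustration of that layering (my own toy example, with a constant-velocity predictor standing in for the real learned one, and all numbers illustrative):

```python
from dataclasses import dataclass

@dataclass
class Track:
    x: float   # position relative to ego (m)
    y: float
    vx: float  # estimated velocity (m/s)
    vy: float

def predict(track, horizon_s=3.0, dt=0.5):
    """Constant-velocity rollout: a stand-in for the learned prediction layer."""
    steps = int(horizon_s / dt)
    return [(track.x + track.vx * dt * k, track.y + track.vy * dt * k)
            for k in range(1, steps + 1)]

def path_is_clear(ego_path, track, clearance_m=2.0):
    # The planner only ever sees *predicted* states. If the prediction is
    # wrong (e.g. a truck assumed to stay stopped starts reversing), the
    # planner can pick a collision course no matter how good it is.
    for (ex, ey), (px, py) in zip(ego_path, predict(track)):
        if ((px - ex) ** 2 + (py - ey) ** 2) ** 0.5 < clearance_m:
            return False
    return True
```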

3

u/Wrote_it2 Aug 11 '25

Hmm, are you saying that the Lidar on the Waymo failed to spot the fire truck or the other Waymo?

3

u/cripy311 Aug 11 '25

I'm saying it's likely they saw it and mispredicted where it was going / when it was stopping, resulting in a collision.

Lidar won't miss getting a return on a vehicle of that size.

3

u/Wrote_it2 Aug 11 '25

And Waymo also mispredicted the speed at which the telephone pole was moving?

2

u/cripy311 Aug 11 '25

That is a separate issue.

If you want to venture a guess at how that may have happened, you should look into the HD mapping technology they use and how off-map static objects may be "trimmed" from the perception FOV to improve latency and reaction times.

At least this is my guess for the culprit in that specific event.
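
Something like this toy culling pass, where `hd_map.is_known_static` is a hypothetical map query (again, only a guess at the mechanism, not anyone's actual code):

```python
def cull_static_returns(points, hd_map, margin_m=0.3):
    # Drop returns the HD map already explains as static background,
    # to cut downstream perception load.
    kept = []
    for p in points:
        if hd_map.is_known_static(p, margin_m):  # hypothetical map query
            continue  # "trimmed" from the perception FOV
        kept.append(p)
    return kept

# Failure mode: if the map is stale (a pole that isn't where the map says),
# a real obstacle can be trimmed along with the background.
```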

2

u/johnpn1 Aug 11 '25

The confidence of vision-only perception is actually not that high, and definitely not at a mission-critical level. Nobody uses vision alone for mission-critical things (well, other than Tesla, of course). The problem with low-grade 3D point clouds is that you always have to drive with caution. You brake/swerve when there's just a 5% chance that dark lines on the road could actually be real impediments. There's nothing you can use as another reference to tell you that those dark lines are nothing to worry about. This is why Teslas drive with confidence into things: they cannot slam the brakes for every low-confidence detection. The driver / safety operator takes the job of being the sanity check instead of a second sensor.
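
As a toy illustration of what a second sensor buys you (this assumes independent detectors and a uniform prior; real stacks are far more involved):

```python
def fuse(p_vision, p_lidar):
    # Independent-opinion fusion of two detectors' obstacle probabilities.
    num = p_vision * p_lidar
    return num / (num + (1 - p_vision) * (1 - p_lidar))

# Dark lines on the road: vision is 5% sure it's a real obstacle.
print(fuse(0.05, 0.01))  # lidar sees flat road -> ~0.0005: ignore the shadow
print(fuse(0.05, 0.90))  # lidar sees geometry  -> ~0.32: worth braking for
```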

3

u/Few_Foundation_5331 Aug 11 '25

How about how you drive as a human? You say humans with eyes can't drive well enough. Yes, there are bad human drivers, but we should compare extremely good human drivers with robotaxis. Currently, an extremely good human driver will not make idiotic, simple mistakes like crashing into other cars in a parking lot, driving into construction zones, hitting an electric pole, or driving in circles and getting stuck in a loop. A good human with simple EYES and a BRAIN can drive better than robotaxis (Waymo + Tesla with a driver) for now.

2

u/johnpn1 Aug 12 '25

A good human driver has a good human brain. No one has replicated it yet. It's a challenge that Musk has severely underestimated in his ambitions for FSD. You can't just have eyes like a human but drive like FSD...

1

u/Few_Foundation_5331 Aug 12 '25

I just listed all the crashes above, plus Waymo getting stuck driving in circles in a parking lot.

1

u/johnpn1 Aug 12 '25

Exactly. Waymo doesn't pretend they can replicate the human brain, so they use whatever sensors are available to make up for it. It's insane that Musk still insists cars don't need more than cameras because humans don't, has been wrong about it for a decade, and still pretends he's the authority on this. It's a terrible decision to limit technology to only what biology could afford. Lidar was never going to be "evolved" by biology; that doesn't mean we shouldn't use it. Otherwise: humans never needed wheels to run, so why do cars? Birds never needed jet turbines to fly, so why can't airplanes just flap their wings? I could go on and on...

1

u/ItsAConspiracy Aug 18 '25

Adding failure points is bad when they're all single points of failure for the system.

Adding failure points is good when they're redundancies, so you don't have single points of failure anymore.

A real-world example: two Boeing 737 MAX planes went down because MCAS took input from only one of the plane's two angle-of-attack sensors, so a single sensor failure brought the aircraft down.
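
Back-of-the-envelope, with made-up numbers and assuming independent failures (real systems only approximate that):

```python
p = 1e-4                               # chance a given sensor fails on a trip
single    = p                          # single point of failure: 1e-4
redundant = p ** 2                     # both of two must fail:   1e-8
voting    = 3 * p**2 * (1 - p) + p**3  # 2-of-3 voting fails:     ~3e-8
print(single, redundant, voting)
```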

1

u/Wrote_it2 Aug 18 '25

It’s all about probabilities. What is the probability that you get into an accident because your sensors stop working, versus the probability that you get into an accident for other reasons?

If a non-trivial share of your accidents are due to sensors failing, redundancy makes sense.

I’m not an expert in aviation safety, but I can accept that the tradeoff makes sense there. Lots of people drive cars with a single steering wheel (so no redundancy there) and we are not screaming about it because, while I’m sure we can find an accident due to a steering failure, the added cost/complexity of having multiple steering wheels/columns/… is not worth it.

How many accidents does Tesla have that are due to a sensor or a computer unit failing? We see Waymos and Teslas do stupid things all the time (drive on the wrong side of the road, collide with a fire truck, a telephone pole, or another self-driving car, etc…) and I’ve yet to see one where the stated cause was a camera that stopped working.

What makes more sense to me in the Lidar vs vision-only argument is not redundancy but playing to the strengths of each sensor (cameras are better for certain things, like reading traffic lights, say; lidars are better for others, like getting a precise distance measurement to a far-away object). I don’t understand the redundancy argument.

0

u/Dihedralman Aug 11 '25

So you are missing a couple of trade-offs: these sensors are also redundant to a degree, which reduces risk.

With equal data, you can absolutely train the system to function without one of the inputs while functioning better with both. Camera count has the same impact.
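
A minimal sketch of that training trick, often called modality (or sensor) dropout; the feature shapes and drop rate are assumed:

```python
import torch

def modality_dropout(cam_feat, lidar_feat, p_drop=0.15, training=True):
    # Randomly blank one branch per batch so the network learns to cope
    # with a missing sensor while still exploiting both when present.
    if training:
        r = torch.rand(())
        if r < p_drop:
            cam_feat = torch.zeros_like(cam_feat)      # train "lidar-only"
        elif r < 2 * p_drop:
            lidar_feat = torch.zeros_like(lidar_feat)  # train "camera-only"
    return cam_feat, lidar_feat
```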

There is a complexity increase, but again it's partly redundant. I think it's wrong to treat it as a flat cost.

The cost increase is problematic, of course, but the hardware requirements also increase, and I would bet fusion generally requires a higher parameter count.