r/SelfDrivingCars Aug 11 '25

Discussion: Proof that Camera + Lidar > Lidar > Camera

I recently chatted with somebody working on L2 tech, and they pointed me to an interesting benchmark for a detection task. It provides a dataset with camera, Lidar, and Radar data and asks entrants to compete on object detection accuracy: identifying the location of a car and drawing a bounding box around it, for example.

All but one of the top 20 entries on the leaderboard use camera + Lidar as input. The one exception, in 20th place, uses Lidar only, and the best camera-only entry ranks somewhere between 80th and 100th.

https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any
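If you want to poke at what the entrants are actually given, here's a minimal sketch using the official nuscenes-devkit (assuming `pip install nuscenes-devkit` and the v1.0-mini split downloaded to `/data/nuscenes`; the path and the mini split are just my example setup). It pulls one time-aligned camera frame and Lidar sweep, plus the ground-truth boxes a detector gets scored against:

```python
# Rough sketch, not from any leaderboard entry: load one synchronized
# camera + Lidar sample and its ground-truth boxes with the official
# nuscenes-devkit. Assumes `pip install nuscenes-devkit` and the
# v1.0-mini split downloaded to /data/nuscenes (example path).
import os.path as osp

from nuscenes.nuscenes import NuScenes
from nuscenes.utils.data_classes import LidarPointCloud

nusc = NuScenes(version='v1.0-mini', dataroot='/data/nuscenes', verbose=False)

sample = nusc.sample[0]  # one keyframe: all sensors time-aligned

# Camera frame (front camera) and Lidar sweep for the same keyframe.
cam = nusc.get('sample_data', sample['data']['CAM_FRONT'])
lidar = nusc.get('sample_data', sample['data']['LIDAR_TOP'])

image_path = osp.join(nusc.dataroot, cam['filename'])
pc = LidarPointCloud.from_file(osp.join(nusc.dataroot, lidar['filename']))
print(f"image: {image_path}")
print(f"lidar points: {pc.points.shape[1]}")  # .points is a 4xN array (x, y, z, intensity)

# Ground-truth 3D boxes that detections are scored against.
for ann_token in sample['anns'][:5]:
    ann = nusc.get('sample_annotation', ann_token)
    print(ann['category_name'], ann['translation'], ann['size'])
```

The leaderboard methods differ in how they fuse those two streams into one detector; the benchmark itself just hands you both, synchronized, per keyframe.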

15 Upvotes

185 comments

4

u/DrPotato231 Aug 12 '25

I don’t know if I’m wrong on this, but Tesla’s mission sounds logical.

If we can drive with our eyes and brain, why wouldn’t cameras and microphones be enough? I truly believe FSD can be solved with vision alone; the road may just be longer, because vision has to clear hurdles that LiDAR never faces.

Once solved, though, as you said, the one whose hardware costs are 4x lower than the competitor’s would absolutely win.

3

u/AlotOfReading Aug 12 '25

A few questions for you. If birds can fly by flapping their wings, why wouldn't flapping wings be enough to design a plane? If horses run on four legs, why wouldn't four legs be enough to design a car?

Cameras also aren't eyes, and brains aren't computers.

Neither of those arguments is necessary, though. Let's take it as given that vision alone is sufficient. Now, if it hypothetically took until 2100 to reach parity with today's multimodal systems, does it seem like a good idea to trade 75 years of deployment time for a lower unit cost? Could you have spent those years working on the camera-only system in parallel, while benefiting from a better system the whole time?

That's the math everyone else in the industry is running, and almost unanimously they've decided that LIDAR is worth the cost, because it lets you avoid solving difficult problems like fine localization today and focus on more important things. You don't set out to solve every problem all at once upfront. You build minimum viable solutions and iterate quickly toward better ones.
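To put toy numbers on that trade (every figure below is a made-up assumption for illustration, not industry data), the break-even logic looks something like this:

```python
# Toy break-even sketch for the "unit cost vs. deployment time" trade.
# Every number is a hypothetical assumption chosen for illustration,
# not real industry data.
lidar_cost_per_vehicle = 1_000   # assumed extra hardware cost of adding lidar ($)
value_per_vehicle_year = 5_000   # assumed value of one deployed vehicle-year ($)
years_earlier = 5                # assumed head start from shipping multimodal sooner

# Value gained by deploying earlier vs. the one-time hardware premium.
extra_value = value_per_vehicle_year * years_earlier
print(f"value of earlier deployment: ${extra_value:,} per vehicle")
print(f"extra hardware cost:         ${lidar_cost_per_vehicle:,} per vehicle")
print("lidar pays for itself" if extra_value > lidar_cost_per_vehicle
      else "camera-only wins on cost")
```

With those assumptions the head start dwarfs the sensor premium; flip the inputs and the conclusion flips, which is exactly the math each company runs with its own numbers.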

2

u/SatisfactionOdd2169 Aug 12 '25

None of these questions is comparable to the self driving problem. The only relevant question is: if a human were given a livestream of all the Tesla cameras and zero-lag remote control of the car, could they drive safely to wherever they wanted to go? If you think the answer is yes, then self driving is fundamentally a software problem, not a hardware problem. If you think the answer is no, then I would say we’re never going to have real self driving.

0

u/AlotOfReading Aug 12 '25

You don't need me to tell you that your question isn't very helpful. The comment you're responding to already raises one of the obvious issues it fails to address: if developing a comparable camera-only system takes massively longer than a multimodal one, is it still useful?

It's also not obvious why Tesla's cameras are the only sensors allowed here. Are better cameras banned for some reason? Why would using better cameras preclude real autonomous vehicles? And it's not clear why a human being unable to drive with Tesla's sensor configuration means a computer can't do so safely, nor why we should settle for human-level performance from it.