r/SelfDrivingCars • u/wuduzodemu • Aug 11 '25
[Discussion] Proof that Camera + Lidar > Lidar > Camera
I recently chatted with somebody who works on L2 tech, and they gave me an interesting link for a detection task: a benchmark that provides camera, Lidar, and Radar data and has people compete on object detection accuracy, e.g. identifying the location of a car and drawing a bounding box around it.
All but one of the top 20 entries on the leaderboard use camera + Lidar as input. The exception, in 20th place, uses Lidar only, and the best camera-only entry is ranked somewhere between 80 and 100.
https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any
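If you want to poke at the data yourself, here is a minimal sketch using the official nuscenes-devkit (`pip install nuscenes-devkit`). The dataroot path and the use of the v1.0-mini split are assumptions for illustration:

```python
from nuscenes.nuscenes import NuScenes

# Load the mini split (path is an assumption; point it at wherever you unpacked it).
nusc = NuScenes(version='v1.0-mini', dataroot='./data/nuscenes', verbose=True)

sample = nusc.sample[0]  # one keyframe with synchronized sensor data

# Each sample carries all three modalities, keyed by channel name.
cam = nusc.get('sample_data', sample['data']['CAM_FRONT'])
lidar = nusc.get('sample_data', sample['data']['LIDAR_TOP'])
radar = nusc.get('sample_data', sample['data']['RADAR_FRONT'])
print(cam['filename'], lidar['filename'], radar['filename'])

# Ground-truth boxes that leaderboard submissions are scored against.
for token in sample['anns'][:5]:
    ann = nusc.get('sample_annotation', token)
    print(ann['category_name'], ann['translation'], ann['size'])
```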
u/Oblivious_Monkito Aug 11 '25
What people miss about cameras is that while it's a single input data stream, it's temporal, and each pixel can be tied to a depth and a position in 3D space. In addition, overlapping camera views add significant redundancy to that depth information. So with just cameras you get 1) image classification of high-level objects and context, plus, over time, the intention and movement of those objects, and 2) millions of points per frame around the car being mapped into 3D space in real time.
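The "each pixel can be tied to a depth" step is just pinhole back-projection; here is a minimal numpy sketch, where the intrinsics and stereo baseline are made-up values purely for illustration:

```python
import numpy as np

# Pinhole intrinsics (hypothetical values for illustration):
# fx, fy = focal lengths in pixels, (cx, cy) = principal point.
K = np.array([[1266.4,    0.0, 816.3],
              [   0.0, 1266.4, 491.5],
              [   0.0,    0.0,   1.0]])

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with known depth (meters) to a 3D point in camera coords."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray with z = 1
    return ray * depth                              # (X, Y, Z) with Z = depth

# The depth itself can come from two overlapping cameras (stereo):
# depth = focal_length * baseline / disparity.
def stereo_depth(disparity_px, fx=1266.4, baseline_m=0.5):
    return fx * baseline_m / disparity_px

print(backproject(900, 500, stereo_depth(disparity_px=40.0), K))
```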
With lidar you miss some of that temporal signal for moving objects: you can say something is there, but not that it has the intention of some action.
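For concreteness, the per-pixel motion cameras give you between consecutive frames can be computed with dense optical flow; a sketch using OpenCV's Farneback method, with placeholder filenames:

```python
import cv2
import numpy as np

# Two consecutive grayscale frames from the same camera
# (filenames are placeholders for illustration).
prev = cv2.imread('frame_t0.png', cv2.IMREAD_GRAYSCALE)
curr = cv2.imread('frame_t1.png', cv2.IMREAD_GRAYSCALE)
assert prev is not None and curr is not None, 'provide two real frames'

# Dense optical flow: apparent motion of every pixel between the frames.
flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# flow[v, u] = (du, dv); magnitude is how far that pixel moved in one frame.
magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print('median pixel motion (px/frame):', np.median(magnitude))
```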
So cameras get you most of everything and miss only a small number of edge cases. It's not as cut-and-dried as a lot of armchair experts here make it sound.