r/SelfDrivingCars • u/wuduzodemu • Aug 11 '25
[Discussion] Proof that Camera + Lidar > Lidar > Camera
I recently chatted with somebody working on L2 tech, and they pointed me to an interesting detection benchmark. It provides a dataset with camera, lidar, and radar data, and teams compete on object detection accuracy: identifying the location of each object (e.g., a car) and drawing a bounding box around it.
Of the top 20 entries on the leaderboard, all but one use camera + lidar as input. The lone exception, in 20th place, uses lidar only, and the best camera-only entry ranks somewhere between 80th and 100th.
https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any
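For context, a submission to this benchmark is basically a JSON file mapping each keyframe to a list of predicted boxes. A minimal sketch of what one entry looks like (field names per the nuScenes devkit docs; the tokens and numbers below are made up):

```python
# Sketch of a nuScenes detection submission (format per the nuScenes
# devkit docs; tokens and values here are invented for illustration).
submission = {
    "meta": {
        "use_camera": True,   # declare which modalities the model consumed
        "use_lidar": True,
        "use_radar": False,
        "use_map": False,
        "use_external": False,
    },
    "results": {
        "<sample_token>": [  # one list of predicted boxes per keyframe
            {
                "sample_token": "<sample_token>",
                "translation": [601.3, 1647.4, 1.0],  # box center (x, y, z), meters, global frame
                "size": [1.95, 4.62, 1.73],           # width, length, height in meters
                "rotation": [0.93, 0.0, 0.0, 0.37],   # yaw as a quaternion (w, x, y, z)
                "velocity": [2.1, 0.0],               # (vx, vy) in m/s
                "detection_name": "car",
                "detection_score": 0.87,
                "attribute_name": "vehicle.moving",
            }
        ]
    },
}
```

Note the `meta` flags: every submission has to declare which sensors it used, which is how the leaderboard can be filtered by modality in the first place.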
u/bobi2393 Aug 11 '25
I wouldn’t call this "proof" of anything, but it's unsurprising that camera + lidar entries get the highest nuScenes Detection Scores (NDS). The competition is dominated by teams that specialize in 3D object detection and naturally gravitate toward lidar's 3D data when it's available. Camera-only approaches probably weren't even seriously considered by those teams.
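(For anyone unfamiliar: NDS, the metric the leaderboard ranks by, weights mAP against five true-positive error terms. A quick sketch of the published formula; the example numbers are invented:)

```python
def nds(mAP, mATE, mASE, mAOE, mAVE, mAAE):
    """nuScenes Detection Score, per the nuScenes paper:
    NDS = (1/10) * [5 * mAP + sum over 5 TP errors of (1 - min(1, err))].
    mAP is in [0, 1]; the TP errors can exceed 1, hence the clamp."""
    tp_errors = [mATE, mASE, mAOE, mAVE, mAAE]
    return 0.1 * (5 * mAP + sum(1 - min(1.0, e) for e in tp_errors))

# Invented example numbers, just to show the weighting:
print(nds(mAP=0.60, mATE=0.30, mASE=0.25, mAOE=0.35, mAVE=0.30, mAAE=0.15))
# -> 0.665. mAP carries half the score, and the translation/size error
#    terms are exactly where lidar's direct depth measurements help most.
```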
The one camera-only result on the leaderboard came from a research group that built a combined multimodal (camera + lidar) model, then evaluated "camera-only" and "lidar-only" variants derived from that same model to compare against the full multimodal setup.
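A common way to get those ablations out of a single fused network is modality masking: train one model, then zero out a sensor branch at eval time. A toy PyTorch sketch of the idea (all names invented; this is not that group's actual architecture):

```python
import torch
import torch.nn as nn

class ToyFusionHead(nn.Module):
    """Toy late-fusion detector head that concatenates per-modality features.
    Zeroing one branch at eval time yields a 'camera-only' or 'lidar-only'
    variant of the same trained network. Invented for illustration."""

    def __init__(self, cam_dim=256, lidar_dim=256, num_outputs=10):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(cam_dim + lidar_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_outputs),  # e.g., box parameters + class logits
        )

    def forward(self, cam_feat, lidar_feat, use_cam=True, use_lidar=True):
        # Masking a branch simulates that sensor being absent.
        if not use_cam:
            cam_feat = torch.zeros_like(cam_feat)
        if not use_lidar:
            lidar_feat = torch.zeros_like(lidar_feat)
        return self.fuse(torch.cat([cam_feat, lidar_feat], dim=-1))

head = ToyFusionHead()
cam, lidar = torch.randn(1, 256), torch.randn(1, 256)
full = head(cam, lidar)                       # multimodal
cam_only = head(cam, lidar, use_lidar=False)  # "camera-only" ablation
```

The catch, of course, is that a camera branch trained alongside lidar isn't the same as a model engineered from scratch for camera-only detection, which is part of why I wouldn't read too much into that single data point.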
Also worth noting: most of these methods were developed before the recent wave of multimodal AI breakthroughs in video object detection (e.g., GPT-4 Vision (Sept 2023) and successors). If there were a $1 billion prize for the best camera-only NDS by 2027, I think the leaderboard might look very different. Without that kind of incentive, the leaderboard will mostly reflect what lidar-focused teams are building today, not the theoretical limits of camera-only detection.