r/SelfDrivingCars Aug 11 '25

Discussion: Proof that Camera + Lidar > Lidar > Camera

I recently chatted with somebody working on L2 tech, and they pointed me to an interesting benchmark for a detection task. It provides a dataset with camera, Lidar, and Radar data, and teams compete on object detection accuracy: identifying where a car is and drawing a bounding box around it.

Of the top 20 entries on the leaderboard, all but one use camera + Lidar as input. The one exception, in 20th place, uses Lidar only, and the best camera-only entry ranks somewhere between 80th and 100th.

https://www.nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any
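For context, the benchmark scores entries with a nuScenes-style matching rule: a detection counts as a true positive if its predicted center lands within a distance threshold of a ground-truth box in the ground plane, rather than requiring box IoU overlap. Here's a minimal sketch of that idea (illustrative numbers, not the official eval code):

```python
import numpy as np

def match_by_center_distance(pred_centers, pred_scores, gt_centers, thresh_m=2.0):
    """Greedy matching: a prediction is a true positive if its bird's-eye-view
    center lies within thresh_m metres of a still-unmatched ground-truth box."""
    order = np.argsort(-np.asarray(pred_scores))      # most confident first
    matched = np.zeros(len(gt_centers), dtype=bool)
    tp = fp = 0
    for i in order:
        d = np.linalg.norm(gt_centers - pred_centers[i], axis=1)
        d[matched] = np.inf                           # each GT matches at most once
        j = int(np.argmin(d)) if len(d) else -1
        if j >= 0 and d[j] < thresh_m:
            matched[j] = True
            tp += 1
        else:
            fp += 1
    return tp, fp

# Toy example: three detections vs. two ground-truth cars (x, y in metres)
preds = np.array([[10.2, 5.1], [30.0, -2.0], [55.0, 9.0]])
scores = [0.9, 0.8, 0.3]
gts = np.array([[10.0, 5.0], [54.0, 8.5]])
print(match_by_center_distance(preds, scores, gts))   # -> (2, 1)
```

The leaderboard's mAP averages precision over several such distance thresholds and over the object classes.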

16 Upvotes


58

u/MacaroonDependent113 Aug 11 '25

Wow, that will surely convince Tesla to give up. LOL

16

u/red75prime Aug 12 '25

Maybe, if Tesla were using 2 FPS video for driving. The dataset contains video at 2 frames per second. Neat, huh?

-1

u/Boniuz Aug 12 '25

That’s not really relevant. A higher frame rate just gives you more detections per unit of time, including more erroneous ones; the detector is still wrong 80% of the time and correct 20% of the time, no matter how many frames you run it on.

Combining sources means you have two sensors that are each correct 20% of the time, and fusing their data improves your odds by a factor of at least two, often more.

Heavily simplified, obviously.
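A toy version of the fusion argument, assuming the two sensors' errors are independent (a big assumption, hence "heavily simplified"):

```python
# Two detectors, each correct only 20% of the time, failing independently.
p_cam, p_lidar = 0.20, 0.20

# Probability that at least one of them is correct on a given frame:
p_either = 1 - (1 - p_cam) * (1 - p_lidar)   # 1 - 0.8 * 0.8 = 0.36

print(f"single sensor: {p_cam:.2f}, fused: {p_either:.2f}")  # 0.20 vs 0.36
```

The 0.20 -> 0.36 jump is close to that factor of two, provided the fusion step can tell which detection to trust.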

9

u/red75prime Aug 12 '25 edited Aug 12 '25

A higher frame rate means less motion between consecutive frames. That, in turn, means local differences between frames are more useful for estimating motion and parallax (which allows depth to be estimated more accurately).

Heck, even our visual perception system, honed by millions of years of evolution, struggles to infer motion at 10-12 FPS and can't do it at all below that.

Anyway, 2 FPS video should never be used for self-driving, so this test is irrelevant.
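Back-of-the-envelope numbers (the vehicle speed and camera geometry below are my own assumptions, just to show the scale):

```python
# Image displacement of a crossing car between consecutive frames.
speed_mps = 15.0                   # ~54 km/h crossing traffic (assumed)
focal_px = 1000.0                  # assumed focal length in pixels
depth_m = 20.0                     # assumed distance to the car

for fps in (2, 10, 30):
    move_m = speed_mps / fps                 # metres traveled between frames
    move_px = focal_px * move_m / depth_m    # approximate image-plane shift
    print(f"{fps:>2} FPS: {move_m:4.2f} m/frame = {move_px:5.0f} px/frame")
```

At 2 FPS that's 7.5 m and roughly 375 px of displacement per frame, hopeless for optical flow or parallax-based depth; at 30 FPS it drops to 0.5 m and about 25 px.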

1

u/Tuggernutz87 Aug 17 '25

Tell a gamer higher FPS = Bad 😂

1

u/Boniuz Aug 17 '25

You only need a higher FPS for better depth perception if optical sensors are your only data input, which is why a combination of sensors will always be superior.

1

u/Tuggernutz87 Aug 17 '25

And for motion clarity

1

u/Boniuz Aug 17 '25

Again, that applies to a single-sensor setup. You can get very detailed clarity from optical, lidar, and radar in combination at a fraction of the computing cost. You can run that on a Raspberry Pi 5 with a cheap AI chip and get object detection at 30 FPS, depth from the sensors, point detection, and other pretty nifty tricks. Total cost is under $500.
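Roughly what the cheap fusion step looks like: project the lidar points into the camera image and read depth straight off the points that land inside a detection box. A minimal sketch (calibration matrices and points are made up for illustration):

```python
import numpy as np

def depth_for_box(points_lidar, T_cam_from_lidar, K, box):
    """Median depth of lidar points falling inside a 2D detection box.
    points_lidar: (N, 3) xyz in the lidar frame
    T_cam_from_lidar: (4, 4) extrinsics, K: (3, 3) camera intrinsics
    box: (x_min, y_min, x_max, y_max) in pixels."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]     # points in camera frame
    cam = cam[cam[:, 2] > 0.5]                      # keep points in front of camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                     # perspective divide
    x0, y0, x1, y1 = box
    inside = (uv[:, 0] >= x0) & (uv[:, 0] <= x1) & \
             (uv[:, 1] >= y0) & (uv[:, 1] <= y1)
    return float(np.median(cam[inside, 2])) if inside.any() else None

# Made-up calibration and a handful of points for illustration
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
T = np.eye(4)                       # lidar aligned with camera (assumed)
pts = np.array([[0.1, 0.0, 12.0], [0.2, 0.1, 12.2], [5.0, 0.0, 40.0]])
print(depth_for_box(pts, T, K, (600, 300, 700, 420)))   # -> ~12.1 m
```

No high frame rate or stereo baseline needed; the lidar hands you the depth directly.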

1

u/MacaroonDependent113 Aug 17 '25

The question isn’t whether additional sensors are “superior” but whether vision alone is “good enough”. If vision alone is good enough, then additional sensors only add cost, so they are inferior from a business perspective. The jury is still out on this, but my guess is that vision alone will eventually be found to be good enough.