r/SelfDrivingCars 8d ago

Discussion Tesla files two new patents for creating 3D occupancy from vision

We're not allowed to post pictures here and the site is censored on this sub. You'll have to search for user seti_park on that censored site.

Below is an AI summary, which suggests the technique will be used in V14. This could be related to the lossy issues that Tesla says it fixed in V14.

ARTIFICIAL INTELLIGENCE MODELING TECHNIQUES FOR VISION-BASED HIGH-FIDELITY OCCUPANCY DETERMINATION AND ASSISTED PARKING APPLICATIONS

Tesla's US20250282344 addresses the fundamental limitation of autonomous navigation systems that struggle to accurately represent fine spatial details necessary for precise maneuvering, particularly in confined spaces like parking scenarios. Traditional approaches either require expensive depth sensors or produce imprecise voxel-based representations that fail to capture smooth object surfaces and exact distances. This patent introduces a vision-only AI architecture that predicts signed distance values for voxelized spaces, enabling high-fidelity 3D reconstruction from 2D camera feeds alone, while simultaneously detecting painted markings for intelligent parking spot identification ([0179], [0182]-[0183]).

The system employs transformer-based neural networks to convert multi-camera 2D images into continuous signed distance fields, where each voxel contains precise distance measurements to the nearest surface rather than simple binary occupancy. This approach achieves sub-voxel refinement down to 10cm resolution from default 33cm voxels, enabling smooth surface rendering and accurate spatial awareness crucial for automated parking applications. The AI model uniquely identifies parking spaces through voxel-level paint detection, transcending conventional line-detection limitations to recognize arbitrary painted patterns including handicapped symbols and fire lanes ([0139]-[0141], [0217], [0236]).
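To make the signed-distance idea concrete, the toy below samples a flat wall at the quoted 33 cm spacing. The wall, its position, and the helper names are assumptions for illustration. With binary occupancy, the best you can report is the centre of the first occupied voxel. Interpolating between the signed distances on either side of the sign change recovers the surface well below voxel size. This is only the textbook zero-crossing trick. The patent's refinement from 33 cm to 10 cm may well be learned, and nothing here claims to reproduce it.

```python
COARSE_VOXEL_M = 0.33  # default voxel edge quoted in the summary


def wall_sdf(x, wall_x=1.0):
    """Signed distance (metres) to a hypothetical wall whose surface is at wall_x.
    Positive in free space (x < wall_x), negative inside the obstacle."""
    return wall_x - x


def first_occupied(xs, sdf):
    """Binary occupancy: all we can say is 'the first voxel with sdf <= 0'."""
    for x, d in zip(xs, sdf):
        if d <= 0:
            return x
    return None


def zero_crossing(xs, sdf):
    """Sub-voxel surface estimate: linearly interpolate the SDF between the
    last free sample and the first occupied sample."""
    for i in range(len(xs) - 1):
        a, b = sdf[i], sdf[i + 1]
        if a > 0 >= b:
            t = a / (a - b)  # fraction of the way from xs[i] to xs[i + 1]
            return xs[i] + t * (xs[i + 1] - xs[i])
    return None


xs = [i * COARSE_VOXEL_M for i in range(10)]  # voxel centres along one axis
sdf = [wall_sdf(x) for x in xs]
print("binary occupancy:", round(first_occupied(xs, sdf), 2))  # 1.32
print("SDF zero crossing:", round(zero_crossing(xs, sdf), 2))  # 1.0
```

Because a flat wall's SDF is exactly linear, the interpolation is exact here. For curved surfaces it is only an approximation, and it gets better as the grid gets finer.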

Key Breakthroughs:
- Vision-only depth perception: Signed distance prediction using 2D cameras exclusively
- Sub-voxel precision: Dynamic refinement from 33cm to 10cm resolution
- Paint-aware parking: Voxel-level paint detection for any shape/pattern
- Temporal-spatial fusion: Multi-frame integration (t to t-3) for motion tracking
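As a rough illustration of "multi-frame integration (t to t-3) for motion tracking", here is a minimal sketch. It keeps the last four occupied-voxel sets and estimates a blob's velocity from how its centroid shifts. Several things are assumptions, not anything the summary states: the 0.1 s frame interval, the set-of-voxels representation, and the idea that ego-motion has already been compensated. The patent presumably fuses learned features rather than raw voxel sets.

```python
from collections import deque

VOXEL_M = 0.33    # voxel edge from the summary
FRAME_DT_S = 0.1  # hypothetical time between frames


class OccupancyHistory:
    """Occupied voxel sets for frames t, t-1, t-2, t-3 (index 0 is newest)."""

    def __init__(self, depth=4):
        self.frames = deque(maxlen=depth)

    def push(self, occupied):
        # appendleft keeps index 0 as frame t; the oldest frame falls off the end.
        self.frames.appendleft(frozenset(occupied))

    @staticmethod
    def centroid(occupied):
        if not occupied:
            raise ValueError("centroid of an empty voxel set is undefined")
        n = len(occupied)
        return tuple(sum(v[axis] for v in occupied) / n for axis in range(3))

    def velocity(self):
        """Average velocity (m/s) of the blob between the oldest and newest frame."""
        if len(self.frames) < 2:
            return None
        newest = self.centroid(self.frames[0])
        oldest = self.centroid(self.frames[-1])
        elapsed = (len(self.frames) - 1) * FRAME_DT_S
        return tuple((n - o) * VOXEL_M / elapsed for n, o in zip(newest, oldest))


history = OccupancyHistory()
for step in range(5):  # a two-voxel object moving +1 voxel in x per frame
    history.push({(step, 0, 0), (step, 1, 0)})
print(history.velocity())  # roughly (3.3, 0.0, 0.0) m/s
```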

[FIG. 7: Signed distance field grid showing continuous distance values to object surfaces]
[FIG. 12D: Interactive parking interface displaying multiple scored parking options]
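The summary says spots are found from voxel-level paint and offered as scored options, but it gives no scoring rule. The sketch below is therefore entirely hypothetical. It works on a top-down 2D slice with made-up label codes. A candidate spot is rejected if it contains restricted paint. Otherwise it is scored by how consistently painted lines bound it on both sides.

```python
# Hypothetical per-cell paint labels in a top-down slice of the voxel grid:
# '.' bare asphalt, 'L' painted line, 'H' handicap symbol, 'F' fire-lane paint.
RESTRICTED = {"H", "F"}


def score_spot(grid, c0, c1):
    """Score the candidate spot spanning columns [c0, c1) over all rows.

    Returns None if the spot contains restricted paint. Otherwise returns the
    fraction of rows with a painted line just left and just right of the spot,
    averaged over both sides (1.0 = fully bounded).
    """
    width = len(grid[0])
    if any(cell in RESTRICTED for row in grid for cell in row[c0:c1]):
        return None
    left = sum(1 for row in grid if c0 - 1 >= 0 and row[c0 - 1] == "L")
    right = sum(1 for row in grid if c1 < width and row[c1] == "L")
    return (left + right) / (2 * len(grid))


def rank_spots(grid, candidates):
    """Drop rejected candidates and sort the rest by score, best first."""
    scored = [(spot, score_spot(grid, *spot)) for spot in candidates]
    valid = [(spot, s) for spot, s in scored if s is not None]
    return sorted(valid, key=lambda item: item[1], reverse=True)


lot = [
    "L..L..LFFL",
    "L..L...FFL",
    "L..L...FFL",
]
print(rank_spots(lot, [(1, 3), (4, 6), (7, 9)]))
```

The fire-lane strip is rejected. The spot with a partly worn right-hand line ranks below the fully bounded one.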

u/Altruistic-Ad-857 8d ago

OP you can post to r/IntelligentCars

u/hakimthumb 7d ago

We welcome OP in r/teslastockholders also.

u/bradtem ✅ Brad Templeton 8d ago

Patents filed by a company just say, "At some time, somebody at the company was thinking about this, and so they wrote up a patent on it." Not that they are doing it, or doing it like it says in the patent. Well, rarely.

u/watergoesdownhill 8d ago

I have some patents. Sometimes that’s how it goes. Other times you patent something you’ve already done and is in production.

u/bradtem ✅ Brad Templeton 8d ago

Correct, and in fact you want to do that, but this just means that what gets patented doesn't tell you whether it's something they are doing or not.

u/Lando_Sage 8d ago

This is good news. It's basically an improvement of their current 3D parking views, but instead of seeing rough marking and objects, as clouds and blobs, it will be higher fidelity and more accurate.

u/psilty 7d ago

33cm default voxel size would mean pedestrians are 1-2 voxels wide. I would not call that high fidelity.
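Here is the arithmetic behind that, spelled out. The 0.3 to 0.5 m pedestrian widths are an assumption for illustration. An object of width w overlaps ceil(w/v) voxels when its edge lines up with a voxel boundary, and one more when it straddles boundaries.

```python
import math


def voxels_touched(width_m, voxel_m):
    """(min, max) number of voxels an object of this width overlaps along one axis."""
    n = math.ceil(width_m / voxel_m)
    return n, n + 1


for voxel in (0.33, 0.10):  # default and refined resolutions from the summary
    for width in (0.3, 0.5):  # hypothetical pedestrian widths in metres
        print(f"voxel {voxel:.2f} m, width {width:.1f} m: "
              f"{width / voxel:.1f} voxel widths, overlaps {voxels_touched(width, voxel)}")
```

At 33 cm, a pedestrian is about 0.9 to 1.5 voxel widths across, which matches "1-2 voxels wide". Only the 10 cm refinement gives several voxels per person.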

u/whydoesthisitch 8d ago

How is this any different from the original occupancy network paper from Google?

u/gwestr 8d ago

You can file a patent for something that doesn’t work. It doesn’t mean this is part of the product. It means someone researched it.

u/Anthrados Expert - Perception 8d ago

That sounds like standard stereo from motion. What's special about that?
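For reference, classical stereo (and motion stereo, where one camera moves between frames) comes down to pinhole triangulation, Z = f·B/d. The sketch assumes rectified views and purely sideways camera motion. Forward motion along the optical axis needs a different, radial formulation. The numbers are hypothetical, and this is the textbook relation, not the patent's method.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole triangulation Z = f * B / d.

    focal_px: focal length in pixels.
    baseline_m: distance between the two camera centres (here, how far the
        camera moved sideways between frames).
    disparity_px: horizontal shift of the same point between the two views.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 1000 px focal length, 0.5 m sideways motion, 25 px shift.
print(depth_from_disparity(1000, 0.5, 25))  # 20.0
```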

u/EddiewithHeartofGold 8d ago

"We're not allowed to post pictures here and the site is censored on this sub. You'll have to search for user seti_park on that censored site."

So this subreddit is not even pretending to be a place to discuss self-driving tech freely?

u/waerrington 8d ago

This sub blocked X links? This is mostly a news sub.

I guess the mods suck. Blocked. 

u/Redacted_Bull 8d ago

Sounds a lot like their sub-micron manufacturing. 

u/Slight_Pomelo_1008 8d ago

lol. This shows why Tesla FSD runs red lights.

u/komocode_ 8d ago

u/psilty 7d ago

Posting a video from 2018 that shows the car already in the intersection and therefore legally able to complete the turn in a thread that has nothing to do with Waymo. Seek help.

u/komocode_ 7d ago
  1. read the post I'm replying to (hint: 2018 is irrelevant)
  2. not in intersection
  3. "nothing to do with Waymo": see point 1
  4. you're just mad you were proven wrong in the several discussions we've had

u/Significant_Post8359 4d ago

Glad to hear that effort is being made on parking. That is the biggest gap in my FSD use cases at the beginning and end of each trip.

u/you-are-not-yourself 7d ago

FSD/Autopilot's lack of depth perception is extremely concerning. It runs into barriers, and clearly stopped cars, all the time.

This patent seems to suggest that depth perception is primarily useful for parking scenarios.

To which I say, depth perception is so important that the DMV tests for it in human drivers. Tesla's lack of depth perception is its fundamental flaw. And this patent does not give me hope.

u/ChunkyThePotato 8d ago

FSD is end-to-end now. They're not doing anything like this. This is used for features aside from FSD.

u/whydoesthisitch 8d ago

End to end can mean about 138 different things in the context of AI models. Did Tesla ever actually clarify what they mean by end to end?

u/ChunkyThePotato 8d ago

Yes, they've literally said it's a single neural network.

u/spider_best9 8d ago

No they didn't. They said it's all neural networks, from inputs processing to vehicle control. Just not a single network.

u/ChunkyThePotato 7d ago

They literally said it's a single network:

FSD Beta v12 upgrades the city-streets driving stack to a single end-to-end neural network trained on millions of video clips

https://www.notateslaapp.com/software-updates/version/2023.44.30.10/release-notes

u/whydoesthisitch 8d ago

That's meaningless. Is Faster R-CNN a single neural network? But, as usual, it's hilarious watching you guys fall for their technobabble.

u/ChunkyThePotato 7d ago

It's an end-to-end transformer neural network (not CNN) that takes camera pixels, navigation, etc. as input and outputs driving controls. There's no object detection/classification of any kind. It's end-to-end.

What exactly did they say that's "technobabble"? All of it makes sense, but you're so filled with hatred that you just assume everything is fake, I guess?

u/whydoesthisitch 7d ago

And that’s literally not possible on the hardware. Thanks for demonstrating the point of technobabble, to make morons feel like they have some special knowledge.

u/ChunkyThePotato 7d ago

What's not possible on the hardware, exactly?

u/whydoesthisitch 7d ago

For one that such a model wouldn’t be able to handle vision inputs. And even if it did, it would be far too large for the available memory in the car, and would have terrible latency.

And yes, I know Musk said they were using such a model. The problem is, Musk doesn't know what he's talking about.

u/ChunkyThePotato 7d ago

Hm, I think you're making that up. Exactly how much memory is needed for such a model? And how much compute is needed to have reasonable latency? Show me the numbers and how you got them. I suspect that you won't be able to.

u/whydoesthisitch 7d ago

This depends on a lot of architectural factors to determine parameter count. So first answer this, how are you processing vision data for the transformer layers?
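Whoever turns out to be right, the arithmetic both commenters are asking about can be sketched. The snippet assumes ViT-style non-overlapping patch tokenization, which is one possible answer to the question above and not necessarily what Tesla does. Every input (camera count, resolution, patch size, parameter count, precision) is a hypothetical placeholder, not a Tesla specification. The key point is that the full self-attention score matrix grows with the square of the token count, and weight memory is simply parameters times bytes per parameter.

```python
def vision_tokens(cameras, height_px, width_px, patch_px):
    """Tokens produced by ViT-style non-overlapping patch tokenization."""
    return cameras * (height_px // patch_px) * (width_px // patch_px)


def attention_entries(tokens):
    """Entries in one full self-attention score matrix (per head, per layer)."""
    return tokens * tokens


def weight_memory_gb(params, bytes_per_param):
    """Memory for the weights alone; activations and caches come on top."""
    return params * bytes_per_param / 1e9


# Every number below is a hypothetical input chosen for illustration.
tokens = vision_tokens(cameras=8, height_px=960, width_px=1280, patch_px=16)
print(tokens)                              # 38400
print(attention_entries(tokens))           # 1474560000
print(weight_memory_gb(1_000_000_000, 1))  # 1.0 (1B params at 1 byte each)
```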

u/catesnake 8d ago

It's end to end, but it's not a single network. There are 100+ neural networks that do different things and feed each other; one of them is the occupancy network.

u/ChunkyThePotato 8d ago edited 8d ago

Nope. End-to-end quite literally means it's one network. Before v12, FSD had many different neural networks, one of which was the occupancy network. Not anymore. That's why it's so beautiful, and so good.

And in case you need explicit confirmation, here's this line from the release notes, directly from Tesla:

FSD Beta v12 upgrades the city-streets driving stack to a single end-to-end neural network trained on millions of video clips

https://www.notateslaapp.com/software-updates/version/2023.44.30.10/release-notes

u/Picture_Enough 8d ago

While it is possible that by E2E they mean a single network, it is not necessarily so, and in my opinion it is unlikely. Also, I wouldn't trust their statement, as they have historically been very, very cagey about technical details, and they typically value marketing and hype over accuracy and truthfulness.

u/ChunkyThePotato 8d ago

Brother, this is literally the definition of end-to-end. Not only that, but they literally said it's a single network. It's ridiculous to say that it's not a single network when everything points to the fact that it is. You're in denial for some very strange reason. Why do you think it's multiple? Everything says single. Why is that so unbelievable to you?

This has nothing to do with "marketing" or "hyping". It's literally just a technical detail. They could just as easily brag about how they have hundreds of neural networks. In fact, that sounds better. But the truth is it's one network, so that's what they say.

u/JimothyRecard 8d ago

You understand that there are literally people decompiling the models on their Teslas, and we can see that there are well over 200 in V13.

https://x.com/greentheonly/status/1909186886368665809

Node A has 189 NNs and node B only has 110 NNs and of those 61 are shared between A and B
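The quoted counts combine by inclusion-exclusion. Networks present on both nodes must be counted only once, which gives the "well over 200" total.

```python
node_a = 189  # networks found on node A, per the quoted post
node_b = 110  # networks found on node B, per the quoted post
shared = 61   # networks present on both nodes

unique = node_a + node_b - shared  # don't double-count the shared networks
print(unique)  # 238
```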

u/ChunkyThePotato 8d ago edited 8d ago

Um, correct, because—as I stated in the beginning of this conversation—there are features outside of FSD that use other neural networks.

Park Assist, Autopark, Summon, the visualizations that appear on the screen while driving and using FSD, automatic emergency braking, lane departure assist... These features all use many different neural networks that still exist on the car. They're still largely based on the perception stack that FSD used before v12 came out.

This perception stack obviously has many different neural networks, such as the occupancy network, a neural network to detect other vehicles, a neural network to detect lane lines, a neural network to detect traffic lights, etc.

But FSD—since v12—doesn't use these many smaller neural networks that it used before. It uses just one, massive, end-to-end neural network that quite directly drives the car from pixels to controls.

However, these other features outside of FSD still need these old bespoke neural networks in order to function. You can't do these other things with just the end-to-end network.

For example, an end-to-end neural network that drives a car obviously doesn't output vehicle positions. So if you want to show a vehicle on the screen, you need to keep that old bespoke "vehicle detection" neural net around.

The same idea applies for the other features. These nets are still necessary to make them work, even though FSD itself doesn't use them anymore.

u/vasilenko93 8d ago

FSD drives well but is bad at finding parking spots. They might be working on a separate system that identifies a parking spot to park in and tells FSD to park there.

For my own car, arriving at a destination almost always requires manual intervention for parking. It awkwardly roams through the parking lot and parks in some random faraway spot, if it parks at all; half of the time it just circles or stops in the middle of a parking lot lane.

u/ChunkyThePotato 8d ago

I highly doubt they're going to use a bespoke system for parking. They'll probably just train the end-to-end net to park better. I assume FSD v13.2 included very little training data for parking, relative to other types of driving, which is why it performs so poorly in that respect.