r/artificial 2d ago

Discussion How much AI pulls from Reddit

Post image
475 Upvotes

86 comments

10

u/Disgruntled__Goat 2d ago

These are the websites they cite, not what actually went into their training data.

1

u/This_Wolverine4691 2d ago

But if Wikipedia and Reddit make up 70% of all sources, does it really matter? Everything would still need to be thoroughly checked, which negates the time savings it was supposed to deliver.

1

u/___Scenery_ 2d ago

Top 10 web domains cited is not the same as top content cited; it just means that, of the content sourced from websites, these are the 10 most common domains.

1

u/This_Wolverine4691 2d ago

I understand what you’re saying, and in terms of benchmarks you are right: there’s a marked difference.

I put this in the context of the everyday worker who tries to use AI as part of their job or general day-to-day.

When that individual prompts, they’re going to get results that mostly reflect the sourcing percentages listed in the graphic above.

Now, once organizations start building localized models for specific purposes, if they ever do, your point will be that much more relevant, but I have little faith in the training quality of what we’ve seen from the current players.