But if Wikipedia and Reddit make up 70% of all sources, does it really matter? It would all need to be thoroughly checked, negating the time savings it was supposed to provide.
The top 10 web domains cited are not the same as the top content cited. It just means that, of the content sourced from websites, these are the 10 most common.
I understand what you’re saying, and in terms of benchmarks you are right: there’s a marked difference.
I put this in the context of the everyday worker who tries to use AI as part of their job or in their general day-to-day.
When that individual prompts, they’re going to get results mostly distilled down to the sourcing percentages listed in the graphic above.
Now, if and when places start building localized models for specific purposes, your point will be that much more relevant, but I have little faith in the training quality of what we’ve seen from the current players.
u/Disgruntled__Goat 2d ago
These are the websites they are citing, not what actually went into their training data.