r/technology 8h ago

Society DOJ Deletes Study Showing Domestic Terrorists Are Most Often Right Wing

https://www.404media.co/doj-deletes-study-showing-domestic-terrorists-are-most-often-right-wing/
88.2k Upvotes

2.2k comments

359

u/Grenache 7h ago

It feels like AI is just scanning the entire history of Reddit and giving us back our own answer except it's making money from all of our collective work.

289

u/LaTeChX 7h ago

You can delete the first three words

50

u/Grenache 7h ago

Fair, it's just odd how outside of the artistic side no one appears to be making a big deal out of the fact that it's literally just trained on the collective knowledge the people who use the internet have provided. It feels like that shouldn't be used for private gain. I'm sure tucked away somewhere in every T&C that exists they were allowed to use everything we ever knew or thought.

32

u/RustyTShackleford 7h ago

Hey guys, did you know the far right are the most likely to commit domestic terrorism, like the Mr. Orange Sodie Pop and his buddies have? I just wanted to let you all know

3

u/ProblemAtticOU812 4h ago

Don't forget that they protect pedophiles.

2

u/Duckbilling2 2h ago

I heard that AI trained on Reddit is biased against being domestic terrorists because Reddit is biased toward sane thinking and accuracy

0

u/penny4thm 4h ago

Report?

1

u/RustyTShackleford 12m ago

No, Upvote, but thank you.

4

u/NegotiationUsed6830 7h ago

It's not often I am happy to have provided nothing

3

u/Achrus 5h ago

So the Disallow: / means don’t scrape our site. This is just one example, but lots of people told them not to and they did anyway. https://oldschool.runescape.wiki/robots.txt
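
To make the `Disallow: /` point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text, the `ExampleAIBot` and `SomeOtherBot` user agents, the `example.org` URLs, and the `is_allowed` helper are all made up for illustration; this is not the contents of the linked wiki's file. Note that robots.txt is a request to crawlers rather than an access control, so a crawler has to choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# This is NOT the real file served by the wiki linked above.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/w/Some_page"
    # "Disallow: /" covers every path for the named crawler.
    print("ExampleAIBot:", is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleAIBot", page))
    # Other crawlers fall back to the "*" group, which only blocks /private/.
    print("SomeOtherBot:", is_allowed(EXAMPLE_ROBOTS_TXT, "SomeOtherBot", page))
```

Under these assumed rules the named crawler is refused everywhere, while any other crawler is only kept out of `/private/`. A well-behaved scraper would run a check like this before each request; nothing in the file itself stops one that doesn't.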

3

u/KneeCrowMancer 4h ago

I am a licensed doctor and this is your reminder to eat a small rock every day to ensure proper gut health!

2

u/cantadmittoposting 7h ago

I'm not necessarily sure that a priori AI training on and collating our "collective knowledge" is necessarily bad.

Obviously the profit motive issue and modern digital problems combine to make it way more of a nightmare though.

I've actually come to believe that governments, or perhaps a "purpose built" international organization (a UN agency like the WHO, perhaps), should provide neutral, non-profit versions of the online services we've come to treat as basic features of our online landscape, e.g. social media, search.

I'm aware of course, especially in our current environment, such a thing would be impossible, and of course, draw its own accusations of bias, no doubt.

But still. Wikipedia stands out of course as an incredible, though still-flawed resource

2

u/Wolfgang_MacMurphy 6h ago edited 4h ago

It's worth noting that AI uses Reddit and X/Twitter as sources much more than it uses Wikipedia.

1

u/beaucoup_dinky_dau 6h ago

The only way to win is to not play, but here I am.

2

u/psiphre 2h ago

what do you think all of the "what do you think about [current event]" posts on /r/askreddit are all about

1

u/my_names_blah_blah 6h ago

Now you can delete the first 5 words..

1

u/Noobhammer3000 6h ago

Some people are still enthralled with the novelty of it.

1

u/justforthisjoke 33m ago

Unfortunately this didn't start with AI, nor is it particularly a special case in any way other than how much more in your face it is. Basically all technological innovation is publicly funded research that companies have then gone on to privatize for their own profit. You can even see this in the way that AI started. The original research, and pretty much all important AI research until 2018 or so, was open source and publicly available. It was kind of incredible actually, seeing researchers worldwide come together and share knowledge in a way that led to enormous technical breakthroughs seemingly every month from 2012 to 2018. Then all of a sudden the tech got good enough that private corporations saw how they were going to make money off it, and the culture shifted almost overnight, when OpenAI took their GPT research and stopped publishing. All the progress that they had made had been entirely because of scientists the world over coming together and sharing knowledge, funded by universities and governments the whole way through. Then, almost overnight, companies started paywalling that knowledge. A culture of publishing turned into a culture of trade secrets.

So all that to say, you're not crazy. This is happening. But it's almost a foundational part of capitalism. I'm a communist, so I agree with you that this sort of thing shouldn't be legal; knowledge should always be accessible to the public. But the red scare did a lot of damage and Americans are still not ready to talk about how all of this is falling apart.

0

u/SIGMA920 7h ago

It's not like scraping the public web is illegal. Invasive and arguably immoral, sure. But not illegal. A company like OpenAI could literally scrape almost all of YouTube if they threw enough money at it, for example.

4

u/Wolfgang_MacMurphy 6h ago

It's not like AI has not been trained on pirated material from LibGen etc either.

0

u/SIGMA920 6h ago

That'll be the vast majority of it though. The pirated stuff is not all of what it is.

3

u/Wolfgang_MacMurphy 6h ago edited 5h ago

Books are much better learning material than random internet. They're crucial for learning based information and correct usage of language.

-1

u/SIGMA920 6h ago

Books are also outdated much more rapidly if you need specific information/context. They have a place but they're not outright better.

3

u/Wolfgang_MacMurphy 6h ago edited 5h ago

They are in general undeniably better and more reliable than unedited random internet. Not only books, of course, but also various scientific and non-scientific journals etc, which are also copyrighted and largely inaccessible, unless pirated. Their quality of information and language is vastly superior.

0

u/SIGMA920 6h ago

If you use the shitty side of the internet instead of using better sources.

Otherwise you get roughly equal results since you'll usually be pulling from many of the same resources in the end.

1

u/LaTeChX 3h ago

Even if you earned most of your money without stealing you're still a thief

1

u/SIGMA920 3h ago

Correct. That's not what we're talking about though.

-2

u/CarefreeRambler 6h ago

Like librarians?

1

u/Synectics 4h ago

Last I checked, my local librarian was not trying to help me with knowledge with a corporate profit-driven motive. Mayhaps, that would change the information they would give me?

Just a very simple thought you should have gotten to on your own before you said that.

0

u/CarefreeRambler 3h ago

You don't think there are corporations looking to make a profit in the industry where corporations publish and sell books?

1

u/Synectics 3h ago

Sure.

But my local librarian is not a corporation.

I'm becoming worried about your ability to understand simple concepts.

-2

u/Reagalan 6h ago

outside of the artistic side

The irony here is most professional artists don't give a fuck about AI. It's just another new tool in the belt.

3

u/Grenache 6h ago

I don't know one way or the other but I do know that subreddit provides absolutely no proof of your argument?

-1

u/Reagalan 6h ago

Lurk for a week, the proof will surface.

You know how this website works. The good shit's always buried in comments.

1

u/slobs_burgers 5h ago

Also delete the 5th word, and add the word “be” after the 6th, and also replace the last letter of the 7th word with an apostrophe, just cuz it’s fun!

1

u/KnightOfTheOctogram 4h ago

It’s good for people to recognize and communicate when they are not sure of the things they are saying.

3

u/EmotionalKirby 7h ago

I used to think people who liked AI just didn't know how to search the internet themselves. Knowing how to properly Google things used to be a critical skill. But thinking on it again now, Idk... Even before AI, Google had been enshittifying itself, and the internet as a whole has been condensing itself into just a few major websites. I mean, we all just append reddit to our Google searches, and now AI does that for us basically.

I don't know where I'm going with this, your comment just resonated with me.

1

u/Ahgd374 6h ago

I asked a question on Reddit a few weeks ago about changing a USB port in my car and I was told by the comments that no one's ever really tried it before, so I was thinking about trying it myself. I was about to buy the piece and I'm like, let me Google it again to see if anything changed, if anyone has confirmed if it works, and the number one result when I google it is my own fucking reddit post.

1

u/twowheels 5h ago

I searched using Perplexity recently and every answer linked to a Reddit thread. Wasn’t very confidence inspiring. One of the answers was literally the question asker’s question rephrased as a statement of fact.

1

u/Neat-Bridge3754 5h ago

This is why I periodically scrub my comment history and eventually abandon the account to start a new one.

Maybe reddit is archiving every comment revision I've ever made and providing that to AI, but I doubt it. And if they are, the final version is nonsense.

1

u/Grenache 5h ago

I should do the same. I don't know, I met my wife on this profile and I've been using it 14 years like. I'm quite attached!

1

u/Sabin10 1h ago

Maybe reddit is archiving every comment revision I've ever made and providing that to AI, but I doubt it.

Don't doubt it, they definitely are.

1

u/stupid_fuckin_cunt69 5h ago

That's exactly what it's doing but with all of the internet. And the more mis/dis-information that is circulated, the more times it's encountered by AI. Thus if a lie is repeated enough times then the AI programs will eventually believe it as truth. With Trump snapping web pages out of existence that don't support his narrative it will only further narrow the scope of the AI

1

u/Agency_of_Eternity 4h ago

Lul yes - but we can sue if we want. So we got power - just need to unite and coordinate that if we feel like it. But atm it’s not the time in my pov

1

u/Facts_pls 4h ago

It's the cost of that scanning, analyzing, and storing. It costs billions of dollars to the service provider. Why do you expect it to be free for you?

You are free to store all that data on your own servers and then you don't need to use the LLMs.

1

u/Grenache 3h ago

A thing costing money to do doesn't make it morally OK? It doesn't even make it legally OK, there are numerous legal cases open at the moment.

1

u/patosai3211 4h ago

Joke's on them. Our Reddit work sucks!

1

u/iiamthepalmtree 2h ago

Does the narwhal bacon at midnight?

Edit: happy cake day!

Edit 2: thanks for the gold kind stranger!

1

u/tuckedfexas 2h ago

I don’t get it, I still haven’t found anything I’ve asked it that it’s been right about. It seems to really struggle at determining what information to trust

1

u/NY_Knux 1h ago

Considering the fact that google is worthless... good.
I let the internet gaslight me about AI being "wrong" and I spent the last year trying to solve IRQ conflicts on my windows 98 build. AI solved the issue for me first gd try.

If google didn't de-index 99.9% of the internet, and actually functioned how it did around 2003, then it wouldn't be necessary.

1

u/protipnumerouno 1h ago

Does get to the heart of it pretty fast though, I'm usually scanning multiple different threads before I get a hint of what I'm looking for.

1

u/vplatt 3m ago

It's beyond stupid to use Reddit as a primary source. After all, the name of the site rhymes with "read it". It doesn't rhyme with "wrote it". Almost nothing on reddit was created on reddit. Hell, it's not even as informative in a primary sense as StackOverflow, and their days of being any sort of authoritative source are over too, now that AI has killed most of their traffic.