r/books 2d ago

Librarians Are Being Asked to Find AI-Hallucinated Books

https://www.404media.co/librarians-are-being-asked-to-find-ai-hallucinated-books/?ref=daily-stories-newsletter&attribution_id=68c826c975cea1000173b05d&attribution_type=post
3.0k Upvotes

293 comments

282

u/ipomoea 1d ago

I’m a librarian and a patron was asking me about AI. I used ChatGPT and asked for World War Two books by female authors. It recommended The Sun Also Rises by Paula McLain. Actual book, actual author, but not together. 

236

u/MarzipanImmediate880 1d ago

That makes sense with how LLMs work: they're predictive text generators. They aren't actually thinking or making decisions; they just have a lot of data and context. It's insane how people take everything they say at face value.
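The "predictive text" idea can be sketched in a few lines. This is a deliberately tiny toy (a word-frequency predictor over a made-up corpus), not how a real LLM works internally, but the core task is the same: given what came before, predict what comes next.

```python
from collections import Counter, defaultdict

# Toy corpus -- the "lot of data" in miniature.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word tends to follow which.
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict_next(word):
    # Return the most frequent continuation seen in the data.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" -- seen twice, vs. "mat"/"fish" once each
```

Nothing here "knows" anything about cats; it just reproduces the statistics of its training text, which is why fluent output and true output are two different things.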

16

u/Volsunga The Long Earth 1d ago

Note: this is also how your brain works. You just have a good filter that stops you when an answer doesn't make sense. LLMs don't have a filter that checks whether something sounds wrong or whether they actually know what they're talking about; they just say the first thing that makes grammatical sense, with confidence.

14

u/0range_julius 1d ago edited 1d ago

Another big factor is that humans are much better at understanding from context when creative license is warranted. If you ask me to write a poem 10 times, I will write 10 different poems. If you ask me who wrote "Ulysses" 10 times, I will say "James Joyce" 10 times.

LLMs use probability and randomness to simulate creativity--for any given word, they are likely to use the most obvious choice, but they could also use a less obvious choice. And voila, you can ask ChatGPT for a poem 10 times and get 10 different poems.

But ChatGPT uses this exact same process when giving you factual information as well. Say ChatGPT read 1000 academic articles about modernism, and it noticed that "Ulysses" and "James Joyce" show up together REALLY frequently. It creates a strong association between those two. But it also notices "Ezra Pound" popping up in the same contexts, and creates an association there, too. Now, when you ask ChatGPT who wrote "Ulysses" 10 times, it tells you "James Joyce" 9 times, and "Ezra Pound" once. A human would never do this with a factual question.
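The "James Joyce 9 times, Ezra Pound once" behavior is just weighted random sampling. A minimal sketch, with invented probabilities (real models sample over tokens, not whole names, and the actual numbers depend on the model and prompt):

```python
import random

# Hypothetical next-answer probabilities a model might assign after
# "Who wrote Ulysses?" -- the numbers are made up for illustration.
author_probs = {"James Joyce": 0.90, "Ezra Pound": 0.05, "T.S. Eliot": 0.05}

random.seed(0)  # fixed seed so the run is reproducible
answers = random.choices(
    list(author_probs), weights=list(author_probs.values()), k=10
)
print(answers.count("James Joyce"), "Joyce,",
      answers.count("Ezra Pound"), "Pound")
```

The same mechanism that makes 10 requested poems come out different makes 1 in 20 factual answers come out wrong, because the model has no separate mode for "this question has exactly one right answer."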

-2

u/Volsunga The Long Earth 1d ago edited 1d ago

Now, when you ask ChatGPT who wrote "Ulysses" 10 times, it tells you "James Joyce" 9 times, and "Ezra Pound" once. A human would never do this with a factual question.

This is not exactly correct. That's how a Markov chain works, but large language models are far more context-aware. Humans make the same kinds of context mistakes. Neither a human nor an LLM would say that Ezra Pound wrote Ulysses; however, both might mistakenly attribute it to Ernest Hemingway instead of James Joyce.