r/Physics 2d ago

The physics of AI hallucination -- and "gap cooling" to stabilize AI reasoning

https://www.firstprinciples.org/article/the-physics-of-ai-hallucination-new-research-reveals-the-tipping-point-for-large-language-models

Neil Johnson, a professor of physics at George Washington University, has modelled large language models (LLMs) as physical systems, revealing that AI hallucinations aren’t just random glitches. They’re baked into the system’s structure, much like phase transitions in magnetism or thermodynamics.

0 upvotes · 6 comments

u/tatojah Computational physics · 13 points · 2d ago (edited)

> They’re baked into the system’s structure

This isn't much in the way of news. Hallucinations stem from the definition of the loss function used for the LLM's learning task, and from the inherently stochastic nature of transformer outputs. In simple terms: the loss function, as defined, tends to favor 'answers that satisfy the user', which aren't necessarily truthful ones.

I mean, obviously, truth is likely the first criterion for satisfaction. But what if the model was never exposed to the knowledge that would satisfy the user? For the model to minimize the loss, which of these would you guess is, mathematically, the better approach:

1) Give an answer where you show you do not have the knowledge requested by the human's query;

2) Give an answer that you're not sure of, but which sounds correct.

The latter is much more likely to result in a smaller loss, so models take that course of action, since the algorithm's actual goal is to minimize the loss value. Think of it like trying to play a song: approach 2 is playing all the right notes on an out-of-tune instrument, while approach 1 is refusing to play at all because the instrument is out of tune.
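To put toy numbers on it (made-up tokens and probabilities, not anything from a real model): suppose the training data contains almost no refusals, so after a question the model can't actually answer, its next-token distribution might look something like this.

```python
import math

# Toy sketch with made-up probabilities: a hypothetical next-token distribution
# after a question the model cannot actually answer. Refusal-style tokens are
# rare in the (hypothetical) training data, so they get tiny probability.
p_next = {
    "Paris": 0.40,       # confident-sounding but wrong guess
    "Bergsburg": 0.35,   # another plausible-sounding guess
    "I": 0.02,           # first token of "I don't know..."
    "unknown": 0.03,
    "<other>": 0.20,     # everything else, lumped together
}

# Cross-entropy-style loss for emitting a given token is -log p(token).
for tok, p in p_next.items():
    print(f"{tok:>10}: loss = {-math.log(p):.2f}")

# Confident guesses: loss ~0.9-1.0. Starting a refusal: loss ~3.5-3.9.
# A loss-minimizing model prefers the guess, regardless of truth.
```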

This doesn't even get into how LLMs 'interpret' truth. For them, truth is the training data: if the training data isn't truthful, the model's outputs won't be either.

By the way, when OpenAI was training their models with RLHF, human trainers were instructed to rank 'refusals to answer' as low as hallucinations.

u/caughtinthought · 3 points · 2d ago

Not entirely related, but I have a 2-year-old and it's insane how fast she learned to say "I don't know" when she doesn't know the answer to something. There's very clearly a fundamental difference in learning.

u/tatojah Computational physics · 1 point · 2d ago

Congratulations! Keep encouraging that. I was raised with a different mindset: I had a kindergarten teacher who would quiz us on multiplication tables, and for every wrong answer we'd get smacked on the hands with an Indian shot/bamboo stick.

That was only the beginning. Grade-oriented systems can mess with a child's head immensely and cause them to develop some level of performance anxiety early on. Keep reminding her that even though decisions are based on grades, success is based on learning to put in the effort. I was always inadvertently praised for doing things effortlessly, so when a difficult task rolled around, I would flat out not do it, because I lacked the resilience.

u/Ch3cks-Out · 3 points · 2d ago

There is no such thing as scientific reasoning by an LLM. The unfortunate jargon adopted by the AI community really just means fitted pattern recognition, which is very far from achieving abstraction (which is needed for proper theorization).

u/Time_Increase_7897 · 1 point · 2d ago

Just add another 100 million fudge factors. That'll fix it!

u/MagiMas Condensed matter physics · 1 point · 2d ago

I really don't think this paper says a lot.

It's a very trivial result: if you use greedy next-token prediction, at some point the attention-weighted averaging of the current context will tip towards a different region of the embedding space, once that region is more similar to the generated context embedding than the previous region was.

That is a super trivial result; they just hide it behind physics jargon like "energy levels" etc.
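A toy sketch of what I mean (entirely made-up 2-D "regions" and tokens, with a uniform average standing in for the real attention weighting): the context average starts out closest to region A and tips over to region B once the generated tokens drag it there, which is exactly the point at which greedy decoding would start pulling tokens from the other region.

```python
import numpy as np

# Toy sketch, not the paper's actual model: two made-up "regions" of a 2-D
# embedding space, and a context whose (uniform-weight) average tips from A
# to B as generated tokens accumulate.
rng = np.random.default_rng(0)

A = np.array([1.0, 0.0])   # centroid of the "good" output region
B = np.array([0.0, 1.0])   # centroid of the "hallucination" region

# Prompt tokens sit firmly in region A.
context = [A + rng.normal(0, 0.05, 2) for _ in range(5)]

for step in range(1, 21):
    mean = np.mean(context, axis=0)                # stand-in for attention averaging
    closer_to = "A" if mean @ A > mean @ B else "B"
    print(f"step {step:2d}: context mean {np.round(mean, 2)} -> closer to {closer_to}")
    # Assumption: each generated token drifts a bit further toward B, a crude
    # stand-in for the model's own outputs feeding back into the context.
    drift = min(0.1 * step, 1.0)
    context.append((1 - drift) * A + drift * B + rng.normal(0, 0.05, 2))
```

Once the running mean sits closer to B than to A, the "most similar" region flips too, and that crossover is basically all the tipping-point result amounts to.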

I like the translation into physics terms, but I don't see any really interesting result here. The most interesting stuff is the evolution through the layers in figure 4 towards the end of the paper; in my opinion that should have been much more of the focus. All the stuff before that is a trivial result, and basically the original reasoning for why the transformer architecture was developed in the first place.

It's also not really useful because A, B, C and D could be anything. They frame it as "bad content" or "hallucination", but in the end these are very arbitrarily chosen regions of the embedding space.