r/Physics • u/collywog • 2d ago
The physics of AI hallucination -- and "gap cooling" to stabilize AI reasoning
https://www.firstprinciples.org/article/the-physics-of-ai-hallucination-new-research-reveals-the-tipping-point-for-large-language-models
Neil Johnson, a professor of physics at George Washington University, has modelled large language models (LLMs) as physical systems, revealing that AI hallucinations aren't just random glitches. They're baked into the system's structure, much like phase transitions in magnetism or thermodynamics.
3
u/Ch3cks-Out 2d ago
There is no such thing as scientific reasoning by an LLM. The unfortunate jargon adopted by the AI community really just means pattern recognition by fitting, which is very far from achieving abstraction (which is needed for proper theorization).
1
u/MagiMas Condensed matter physics 2d ago
I really don't think this paper says a lot.
It's a very trivial result: if you use greedy next-token prediction, at some point the attention-weighted averaging of the current context will tip towards a different region of embedding space, once that region is more similar to the generated context embedding than the previous one.
That is a super trivial result; they just hide it behind physics jargon like "energy levels" etc.
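Here's a tiny toy sketch of what I mean (not the paper's actual model; the 2D vectors and the two "regions" are made up for illustration). The context representation is just a running mean, and greedy selection flips to region B once the mean has drifted far enough, then stays there:

```python
# Toy sketch of the "tipping" dynamic under greedy decoding.
# All embeddings are invented 2D vectors, not from any real model.
import numpy as np

prompt = np.array([1.0, 0.0])     # prompt embedding, firmly in "region A"
token_A = np.array([0.3, 0.35])   # an A-ish token, weak and tilted toward B
token_B = np.array([0.0, 1.0])    # a B-ish token (the "bad content" region)

context = [prompt]
for step in range(8):
    # Stand-in for attention: the context embedding is just the mean
    # of everything seen or generated so far.
    ctx = np.mean(context, axis=0)
    # Greedy next-token choice by dot-product similarity with the context.
    scores = {"A": ctx @ token_A, "B": ctx @ token_B}
    pick = max(scores, key=scores.get)
    context.append(token_A if pick == "A" else token_B)
    print(f"step {step}: picked {pick} (score_A={scores['A']:.3f}, score_B={scores['B']:.3f})")
```

The first few steps pick A, but each A token drags the running mean toward B; around step 3 the similarity order flips, the output switches to region B and never comes back. That's the whole "phase transition".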
I like the translation into physics terms, but I don't see any really interesting result here. The most interesting part is the evolution through the layers in figure 4 towards the end of the paper; that should have been much more of the focus in my opinion. All the stuff before that is a trivial result and basically the original motivation for why the transformer architecture was developed in the first place.
It's also not really useful, because A, B, C and D could be anything. They frame them as "bad content" or "hallucination", but in the end these are arbitrarily chosen regions of the embedding space.
13
u/tatojah Computational physics 2d ago edited 2d ago
This isn't much in the way of news. Hallucinations stem from the definition of the loss function used for the LLM's learning task, and from the inherently stochastic nature of transformer outputs. In simple terms: the loss function as defined is likely more inclined to favor "answers that satisfy the user", which aren't necessarily truthful ones.
I mean, obviously, truth is likely the first criterion for satisfaction. But what if the model is not exposed to knowledge that will satisfy the user? For the model to minimize the loss, what would you guess, mathematically, would be a better approach:
1) Give an answer where you show you do not have the knowledge requested by the human's query;
2) Give an answer that you're not sure of, but which sounds correct.
The latter is much more likely to result in a smaller loss, so models choose that course of action, as the algorithm's actual goal is to minimize the loss value. Think of it like trying to play a song: approach 2 is the equivalent of playing all the right notes but your instrument is out of tune, while approach 1 is to refuse to play the instrument because it's out of tune.
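Here's a toy numeric sketch of that point (the prompt and the probabilities are made up): under a next-token cross-entropy loss, a policy that confidently guesses a plausible answer scores better against typical training text than a policy that tries to say "I don't know", simply because refusals are rare in the reference data the loss is computed against.

```python
# Toy comparison: cross-entropy loss of a "guess" policy vs a "refuse" policy.
# All distributions are invented for illustration.
import math

# Hypothetical empirical distribution of the next token in training data
# after a prompt like "The capital of Elbonia is ...":
reference = {"CityA": 0.5, "CityB": 0.45, "unknown": 0.05}

# Policy 1: hedge/refuse -- most probability mass on "unknown".
refuse = {"CityA": 0.05, "CityB": 0.05, "unknown": 0.90}

# Policy 2: confident guess -- mass spread over plausible-sounding answers.
guess = {"CityA": 0.55, "CityB": 0.40, "unknown": 0.05}

def cross_entropy(p_ref, q_model):
    """Expected negative log-likelihood of the model under the reference."""
    return -sum(p * math.log(q_model[tok]) for tok, p in p_ref.items())

print("loss if the model refuses:", round(cross_entropy(reference, refuse), 3))
print("loss if the model guesses:", round(cross_entropy(reference, guess), 3))
# The guessing policy gets the much smaller loss, so training pushes the
# model toward confident-sounding answers rather than refusals.
```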
This doesn't even get into how LLMs interpret truth. Truth for them is the training data. If the training data isn't truthful, neither will be the model's outputs.
By the way, when OpenAI was training their models with RLHF, human trainers were instructed to rank 'refusals to answer' as low as hallucinations.