Why Do the Latest AIs Confidently Tell Lies? The Persistent Problem of Language Model Hallucinations

Have you ever used information from an AI for a presentation, only to find out it was completely wrong? This phenomenon is called "hallucination," and it’s when AI generates plausible-sounding lies, as if it's experiencing a delusion.

AI is evolving daily, transforming our lives and work. However, even the most advanced models have not completely solved the problem of hallucinations.

A recent research paper from OpenAI points out that language models hallucinate because standard training and evaluation procedures reward guessing rather than acknowledging uncertainty. So, where does this puzzling phenomenon come from, and why is it so persistent?

Let's take a journey together to explore its mechanisms and potential solutions.

Why Language Models Hallucinate

The Reasons AI Creates Plausible Lies: Statistical Nature and Evaluation Traps

This strange "lying" behavior in AI isn't just a bug. At its root are two surprising factors: the AI's learning mechanism and the way we evaluate it. This behavior stems from a fundamental characteristic of how AI processes and presents information to us.

The "False Confidence" Born in Early Training: Statistical Factors in Pre-training

Language models learn language patterns from vast amounts of text data. It’s been shown that even when the training data itself contains no errors, models will inevitably produce some errors during this "pre-training" phase, simply as a consequence of minimizing their statistical training objective.

This follows from a mathematical relationship between a language model's generation error rate and the misclassification rate on a related binary classification task: judging whether a given output is a valid answer ("Is-It-Valid"). Put simply, a model that cannot reliably distinguish valid outputs from invalid ones will inevitably produce some invalid outputs when it generates, and those invalid outputs are hallucinations.
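Stated informally (the paper's actual theorem includes correction terms that I'm omitting here), the relationship is a lower bound: the generative error rate is at least on the order of twice the misclassification rate on that Is-It-Valid task.

```latex
\mathrm{err}_{\mathrm{generative}} \;\gtrsim\; 2 \cdot \mathrm{err}_{\mathrm{is\text{-}it\text{-}valid}}
```

The intuition: every invalid output the generator produces would also fool the corresponding classifier, so generating well is at least as hard as classifying well.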

When AI Turns "I Don't Know" into "Make It Up": Baseless Facts and Statistical Gaps

When OpenAI asked a model for "Adam Tauman Kalai's birthday" (Kalai is one of the paper's authors), the model confidently provided several different dates, none of which were correct. This happened because information about his birthday simply didn't exist in the training data, or was extremely scarce.

For questions about these "arbitrary facts," there is no pattern in the data for the model to generalize from. The more "singletons" there are (facts that appear exactly once in the training data), the higher the model's expected error rate on questions about them. When data is this sparse, instead of saying "I don't know," the model strings together the most statistically likely words, producing a "plausible lie."
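As a rough sketch of this "singleton" idea (the function name and toy corpus below are my own illustration, not from the paper): the fraction of facts seen exactly once is a Good-Turing-style estimate of how much of the fact distribution the model has essentially no evidence for, and the paper argues the hallucination rate on such questions is at least roughly this fraction.

```python
from collections import Counter

def singleton_rate(observations):
    """Fraction of observations whose value appears exactly once.

    A Good-Turing-style estimate: facts seen only once give the model
    no redundancy to learn from, so errors on them are likely.
    """
    counts = Counter(observations)
    singletons = sum(1 for v in observations if counts[v] == 1)
    return singletons / len(observations)

# Toy "corpus" of birthday mentions: two people are mentioned only once.
corpus = [
    "einstein:1879-03-14", "einstein:1879-03-14",
    "turing:1912-06-23", "turing:1912-06-23",
    "obscure_person_a:1981-05-02",   # singleton
    "obscure_person_b:1990-11-30",   # singleton
]
print(singleton_rate(corpus))  # 2 of 6 mentions are singletons
```

On this toy corpus the estimate says the model should be expected to err on roughly a third of birthday questions drawn from it.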

Misinformation Caused by Model "Clumsiness": Lack of Understanding Complex Concepts

Hallucinations also occur when the model is a "poor model" that can't express certain concepts well. For example, the DeepSeek-V3 model repeatedly gave different incorrect answers to the simple question, "How many D's are in DEEPSEEK?"

However, another model, DeepSeek-R1, was able to answer "1" correctly after working through a detailed reasoning process. This suggests the issue isn't just data volume: the model struggles to accurately represent and execute operations that seem trivial to us, like counting letters, but are genuinely hard for it (tokenization, for instance, groups characters into larger units, so the model never directly "sees" individual letters). What's obvious to a human can require advanced reasoning from an AI.
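The task itself is trivial for character-level code, which highlights that the failure is about how the model represents text, not about the difficulty of the question. A one-line check:

```python
# Counting characters directly: trivial in code, surprisingly hard for
# token-based language models, which never "see" individual letters.
word = "DEEPSEEK"
d_count = word.count("D")
print(d_count)  # D-E-E-P-S-E-E-K contains exactly one D
```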

Questioning the Data: The Impact of "Garbage" in Training Data

Furthermore, since AI generates answers based on the data it's trained on, it can reproduce errors or biases if the training data itself contains them. This is the very principle of "Garbage In, Garbage Out" (GIGO).

The vast data on the internet includes misinformation, conspiracy theories, and incomplete information. The model can learn this "garbage" as fact and output it with confidence.

A World Where "I Don't Know" Is Not Allowed: How the Evaluation System Reinforces Hallucinations

Another major reason hallucinations are so hard to get rid of is that the current AI evaluation system incentivizes models to "guess." This structure is similar to a human taking a multiple-choice test and guessing when they don't know the answer.

The Reality Where Most Benchmarks Reward "Guessing": It's a Black-and-White World

Many of the major evaluation benchmarks analyzed in OpenAI's paper use a binary (0-1) scoring system: correct or incorrect. Let's consider what happens when an AI is asked for "Adam Tauman Kalai's birthday" and doesn't know the answer.

  • If it guesses "September 10th": There's a 1 in 365 chance it could be correct.
  • If it answers "I don't know": It will definitely get 0 points.

When a model has to answer thousands of test questions, the model that actively guesses tends to get a higher total score than one that honestly admits uncertainty. This creates the ironic result that the smarter the model becomes, the better it gets at generating "plausible" answers for uncertain information, making hallucinations "advantageous" for its score.
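The incentive is easy to make concrete. Under binary scoring, a blind guess at a birthday has a small positive expected score, while abstaining scores exactly zero, so a score-maximizing model should always guess (a minimal sketch; the 1/365 figure ignores leap days, as in the example above):

```python
# Expected score per question under binary (0/1) grading.
p_correct = 1 / 365          # chance a blind birthday guess is right

score_guess = p_correct * 1 + (1 - p_correct) * 0   # wrong answers cost nothing
score_abstain = 0.0                                  # "I don't know" scores 0

print(score_guess > score_abstain)  # guessing strictly dominates abstaining
```

Multiply that tiny edge across thousands of questions and the guessing model reliably climbs the leaderboard.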

Most evaluations only use accuracy as a metric and rank models on leaderboards. This motivates developers to build models that will guess, even if they're wrong.

The Dilemma of Humility: The Difference in Principles Between Humans and AI

In human society, honestly saying "I don't know" can be seen as a form of "humility." However, in current AI evaluations, this answer is often given no points and is effectively treated the same as an "incorrect" answer. This is because the model is optimized to "pass" the test, operating under an incentive structure different from the human value of admitting when you don't know something.

Recommendations to Overcome Hallucinations: The Change Brought by Revising Evaluation Metrics

Hallucination is not a "mysterious phenomenon" or an "unavoidable problem." It’s clear that it's caused by a statistical mechanism and the incentives provided by the current evaluation system.

So, how do we overcome this issue?

Turning Uncertainty into a "Strength": Aiming for Truly Reliable AI

The OpenAI paper strongly suggests that the key to reducing hallucinations isn't just to add new hallucination evaluations, but to fundamentally rethink the scoring methods of existing major evaluation benchmarks.

Setting an Explicit "Confidence Threshold": A System to Bring Out AI's Honesty

What we want from an AI system isn't always a perfect answer. Rather, it's the honesty to appropriately communicate "how confident it is" and to truthfully say "I don't know" when it's uncertain.

One proposed solution is to explicitly include "Explicit Confidence Targets" in the evaluation instructions. For example, adding a phrase like this at the end of a question:

"You will receive 1 point for a correct answer, but a penalty of t/(1-t) points for a wrong answer. You will receive 0 points for 'I don't know.' Please only answer if your confidence level exceeds t."

By making these scoring rules clear, the model is prompted to consider its "probability of being correct" rather than just guessing. This reduces the risk of it giving incorrect answers with false confidence and leads to the creation of truly reliable AI.
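A quick sketch of why the t/(1-t) penalty produces exactly this behavior (my own arithmetic to illustrate the rubric, not code from the paper): with threshold t, the expected score of answering is positive only when the model's probability of being correct exceeds t, and it breaks even at exactly p = t.

```python
def expected_score(p, t):
    """Expected score for answering with correctness probability p,
    under the rubric: +1 if right, -t/(1-t) if wrong, 0 for abstaining."""
    return p * 1 - (1 - p) * t / (1 - t)

t = 0.75                        # a wrong answer now costs 0.75/0.25 = 3 points
print(expected_score(0.90, t))  # confident: answering beats abstaining (> 0)
print(expected_score(0.75, t))  # at p = t: exactly break-even (0.0)
print(expected_score(0.50, t))  # uncertain: abstaining (0) is better (< 0)
```

So a rational model under this rubric answers only when it is genuinely more than t confident, which is precisely the honest behavior the evaluation is trying to elicit.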

Changing the "Scoring Criteria" of Major Benchmarks: A Shift in Industry-Wide Mindset

Currently, many major benchmarks across a range of fields, from software patch evaluation (SWE-bench) to broad academic knowledge questions (HLE, Humanity's Last Exam), use binary scoring.

As long as these evaluations penalize uncertainty and continue to reward guessing, AI will continue to learn to hallucinate.

It's crucial to revise the scoring criteria for these widely used evaluations and introduce more nuanced scoring—such as giving partial credit for appropriately expressing uncertainty. This will allow the incentive for AI development to shift from "accuracy-above-all" to "reliability-focused."

Our Role in Engaging with AI: Practical Perspectives for Building Trust

As someone involved in system development, I believe it's crucial for AI to improve its reliability not just as a "tool," but as a "partner." Understanding that AI is not yet perfect and appropriately recognizing its limitations is essential for dealing with the problem of hallucinations.

Hallucinations are not inevitable; AI can be trained to withhold answers when it is uncertain. In fact, some studies suggest that smaller models are better at understanding their own limitations and are more likely to clearly answer, "I don't know." While large models possess a lot of knowledge, they also face greater difficulty in judging the credibility of that information.

We should not take the information generated by AI at face value and should always maintain a critical perspective. Especially for information related to important decisions, it's essential to get into the habit of always cross-checking facts with sources other than AI.

We must not forget that AI is a powerful assistant, but the final judgment rests with us.

Language model hallucinations offer us a valuable lesson in deeply understanding the capabilities and limitations of AI. It’s a challenge we must face head-on by changing our approach to evaluation and implementing more honest and reliable AI systems in society.

This is the next step required of us as engineers, and indeed, all AI users.

The AI of the future won't just "know a lot"; it will build true trust by accurately telling us "what it knows and how much," and by having the humility to admit what it "doesn't know."

The time for us to act and build that future together is now.


Photo by: Kristina Flour