When Artificial Intelligence Lies with Confidence
Tehran - BORNA - Large language models, including GPT-5 and ChatGPT, which have become key tools for human–machine interaction over the past few years, continue to struggle with a phenomenon known as “hallucination.” Hallucination refers to the generation of plausible but incorrect statements, presenting challenges even for experienced users.
OpenAI researchers recently published a study examining the reasons behind these hallucinations and the role of existing incentives in model evaluation. The study also proposes ways to reduce errors and improve user confidence.
OpenAI defines hallucinations as “plausible but false statements generated by language models.” Despite extensive improvements in model architecture and training data, hallucinations remain a fundamental challenge that cannot be fully eliminated.
For example, researchers asked a chatbot for the title of the Ph.D. dissertation of one of the study’s authors, Adam Tauman Kalai, and received three different answers, none of them correct. When asked about his birthday, the model again gave three incorrect responses. These examples demonstrate that AI can confidently produce incorrect information, potentially misleading users.
Causes of Hallucinations
One primary reason for hallucinations is the way large language models are trained. Models are trained mainly to predict the next word in a sequence, and the training data carries no true-or-false labels. As a result, models learn language patterns and the statistical distribution of words but cannot distinguish truth from falsehood.
OpenAI explains that simple spelling or punctuation errors diminish as model size increases, but low-frequency facts such as an individual’s birth date or other precise details cannot be reliably predicted from statistical patterns, leading to hallucinations.
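To see why frequency alone cannot capture rare facts, consider a deliberately tiny sketch. The corpus, names, and functions below are invented for illustration and have nothing to do with OpenAI's actual training setup; the point is only that a model fit on next-word counts answers confidently while merely echoing whichever continuation it has seen most often.

```python
# A toy bigram "language model" fit purely on next-word counts (the corpus and
# names are invented for illustration; this is not OpenAI's training setup).
# Nothing in the objective marks statements as true or false, so a fact that
# appears only once is overridden by the statistically more common pattern.
from collections import Counter, defaultdict

corpus = (
    "the author was born in march . "
    "the ceo was born in march . "
    "kalai was born in may ."          # the rare, correct fact appears once
).split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent continuation, i.e. the model's confident guess."""
    return bigram_counts[word].most_common(1)[0][0]

# The model answers without hesitation, but it is echoing frequency, not fact:
print(predict_next("in"))   # -> 'march', even though the rare entry said 'may'
```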
The Role of Evaluation and Incentives
OpenAI researchers argue that part of the hallucination problem stems from how models are currently evaluated. Evaluations focus primarily on accuracy, meaning models receive credit for correct answers, while incorrect answers are not directly penalized.
This gives models an incentive to answer even when they lack sufficient information, since declining to answer forfeits any chance of scoring. OpenAI compares this to multiple-choice tests: leaving a question blank guarantees zero points, while guessing may produce a lucky correct answer.
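A back-of-the-envelope calculation makes the incentive concrete. The numbers below are illustrative assumptions, not figures from the study: under accuracy-only scoring, a wrong answer and an honest "I don't know" both earn zero, so even a low-probability guess has a higher expected score than abstaining.

```python
# A back-of-the-envelope sketch of the incentive described above (illustrative
# numbers, not figures from the study). Under accuracy-only scoring, a wrong
# answer and an "I don't know" both score 0, so guessing never hurts.

def expected_score_accuracy_only(p_correct: float, guess: bool) -> float:
    """Expected score when correct answers earn 1 point and everything else earns 0."""
    return p_correct if guess else 0.0

for p in (0.0, 0.1, 0.5):
    print(f"p_correct={p:.1f}  guess={expected_score_accuracy_only(p, True):.2f}  "
          f"abstain={expected_score_accuracy_only(p, False):.2f}")
# Even at p_correct=0.1 the guess is worth 0.10 in expectation versus 0.00 for
# abstaining, so a model optimized against this metric learns to always answer.
```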
OpenAI’s Proposed Solutions
OpenAI suggests that evaluation metrics need to be revised to reduce the incentive for guessing and encourage models to express uncertainty.
Key proposals include the following; a rough numerical sketch of how such a scoring rule might work appears after the list:
Penalizing high-confidence errors: Responses given with high confidence that are incorrect should incur greater penalties.
Rewarding expressions of uncertainty: Models that accurately indicate they do not know or are uncertain should receive positive credit.
Comprehensive evaluation overhaul: Introducing a few uncertainty-aware tests is not enough; widely used, accuracy-based evaluations must be updated to discourage blind guessing.
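As a rough sketch of what such a revision could look like, the scoring rule below rewards correct answers, penalizes wrong ones, and leaves an explicit "I don't know" at zero. The penalty value of 2 is an assumption chosen for illustration; OpenAI's study does not prescribe a specific number.

```python
# A minimal sketch of an uncertainty-aware scoring rule (the penalty of 2 is an
# illustrative assumption, not a figure from OpenAI's study).

def expected_score(p_correct: float, answer: bool, wrong_penalty: float = 2.0) -> float:
    """Correct answer: +1, wrong answer: -wrong_penalty, explicit 'I don't know': 0."""
    if not answer:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# With a penalty of 2, answering pays off only when p_correct exceeds 2/3:
for p in (0.4, 0.6, 0.8):
    print(f"p_correct={p:.1f}  answer={expected_score(p, True):+.2f}  "
          f"abstain={expected_score(p, False):+.2f}")
```

Under this rule, answering is worthwhile only when the model is more than two-thirds confident, which is the kind of calibration pressure the proposals aim to create.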
These measures are intended not only to improve model accuracy but also to strengthen user trust in large language models.
Challenges and Limitations
Despite these efforts, hallucinations remain a significant challenge. Models continually encounter incomplete data and rare facts, so errors cannot be eliminated entirely.
Moreover, revising evaluation metrics requires extensive research and development. Companies like OpenAI are continuously improving model architectures, collecting more accurate data, and designing advanced evaluation frameworks, but no approach guarantees the complete eradication of hallucinations.
Importance in Real-World Applications
Hallucinations in large language models can have serious consequences in fields where information accuracy is critical, including medicine, law, scientific consulting, and economic decision-making. Even high-confidence incorrect information can lead to poor decisions and misguide users.
Therefore, revising evaluation metrics and creating appropriate incentives to reduce hallucinations is not just a technical issue but an ethical and practical necessity for AI developers.
OpenAI’s study concludes that hallucinations result from a combination of training methods and evaluation incentives:
Training based on next-word prediction without true/false labels leads to the production of incorrect information.
Accuracy-based evaluations encourage models to guess even when they lack sufficient information.
Revising evaluation criteria, penalizing high-confidence errors, and rewarding uncertainty expression are proposed measures to reduce hallucinations.
These steps can significantly improve both the accuracy and reliability of large language models, although hallucinations remain an enduring challenge requiring ongoing research.