OpenAI has revealed that its latest large language model, GPT-4.5, hallucinates 37 percent of the time on SimpleQA, the company's in-house factuality benchmark. In other words, when answering the benchmark's questions, the model confidently generates inaccurate information more than a third of the time.
The admission raises concerns about the reliability of AI outputs, especially coming from a company valued at hundreds of billions of dollars. OpenAI's other models fare even worse on the same benchmark: GPT-4o hallucinates 61.8 percent of the time, while the smaller o3-mini model posts a staggering 80.3 percent.
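For context on what these percentages measure, a SimpleQA-style score boils down to grading each answer and counting the failures. The sketch below is a minimal illustration of that arithmetic, assuming each answer is graded as correct, incorrect, or not attempted; the grading labels, function name, and sample data are hypothetical and are not OpenAI's actual evaluation pipeline.

```python
# Minimal sketch of how a SimpleQA-style hallucination rate could be
# computed. The grade labels and sample data are illustrative
# assumptions, not OpenAI's actual grading pipeline.
from collections import Counter

def hallucination_rate(grades: list[str]) -> float:
    """Return the share of attempted answers that are wrong.

    Each grade is assumed to be one of:
    "correct", "incorrect", or "not_attempted".
    """
    counts = Counter(grades)
    attempted = counts["correct"] + counts["incorrect"]
    if attempted == 0:
        return 0.0
    return counts["incorrect"] / attempted

# Hypothetical per-question grades from a small evaluation run.
grades = ["correct", "incorrect", "not_attempted", "incorrect", "correct"]
print(f"Hallucination rate: {hallucination_rate(grades):.1%}")  # 50.0%
```

Note that on this definition, declining to answer lowers the hallucination rate without adding any correct answers, which is one reason a benchmark score and real-world reliability can diverge.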
AI hallucination is not unique to OpenAI. Research suggests that even the best models generate factually accurate text only about 35 percent of the time, and experts caution that users should treat AI-generated content skeptically given inaccuracies of this magnitude.
The situation underscores the challenge facing the AI industry as it strives to build systems that rival human intelligence. With its models plateauing in performance, OpenAI may need genuine breakthroughs, not incremental improvements, to hold its leading position in the market.
For more information, visit the original article on Futurism.