OpenAI's latest AI models report high ‘hallucination' rate: What does it mean — and why is this significant?

Indian Express | 15 May 2025
A technical report released by artificial intelligence (AI) research organisation OpenAI last month found that the company's latest models — o3 and o4-mini — generate more errors than its older models. Computer scientists call the errors made by chatbots 'hallucinations'.
The report revealed that o3 — OpenAI's most powerful system — hallucinated 33% of the time on the company's PersonQA benchmark, which involves answering questions about public figures. The o4-mini model hallucinated 48% of the time.
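The mechanics of such a benchmark are straightforward to illustrate. The sketch below is a hypothetical back-of-the-envelope version, not OpenAI's actual PersonQA harness (which is not public): the questions, answers and exact-match grading rule are invented, and real evaluations use more forgiving grading than simple string comparison. It just shows how a hallucination rate reduces to the share of answers that contradict known facts.

```python
# Hypothetical sketch of how a hallucination rate on a QA benchmark
# could be computed. PersonQA itself is not public, so the questions,
# answers and exact-match grading rule here are invented for illustration.

def hallucination_rate(model_answers: dict, ground_truth: dict) -> float:
    """Fraction of benchmark questions the model answered incorrectly."""
    wrong = sum(
        1 for question, answer in model_answers.items()
        if answer.strip().lower() != ground_truth[question].strip().lower()
    )
    return wrong / len(model_answers)

ground_truth = {
    "Who co-founded Apple alongside Steve Jobs and Ronald Wayne?": "Steve Wozniak",
    "Which scientist proposed the theory of general relativity?": "Albert Einstein",
}
model_answers = {
    "Who co-founded Apple alongside Steve Jobs and Ronald Wayne?": "Steve Wozniak",
    "Which scientist proposed the theory of general relativity?": "Isaac Newton",  # a confident, wrong answer
}

print(f"{hallucination_rate(model_answers, ground_truth):.0%}")  # prints 50%
```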
To make matters worse, OpenAI said it does not even know why these models are hallucinating more than their predecessors.
Here is a look at what AI hallucinations are, why they happen, and why the new report about OpenAI's models is significant.
When the term 'AI hallucinations' began to be used to refer to errors made by chatbots, it had a very narrow definition: it described instances when AI models gave fabricated information as output. For instance, in June 2023, a lawyer in the United States admitted to using ChatGPT to help write a court filing after the chatbot added fake citations to the submission, pointing to cases that never existed.
Today, hallucination has become a blanket term for various types of mistakes made by chatbots. This includes instances when the output is factually correct but not actually relevant to the question that was asked.
ChatGPT, o3, o4-mini, Gemini, Perplexity, Grok and many more are all examples of what are known as large language models (LLMs). These models essentially take in text inputs and generate synthesised outputs in the form of text.
LLMs are able to do this because they are built using massive amounts of digital text taken from the Internet. Simply put, computer scientists feed these models a lot of text, which helps them identify patterns and relationships within that text so they can predict text sequences and produce an output in response to a user's input (known as a prompt).
Note that LLMs are always making a guess when they give an output. They do not know for sure what is true and what is not — these models cannot even fact-check their output against, say, Wikipedia the way humans can.
LLMs 'know what words are and they know which words predict which other words in the context of words. They know what kinds of words cluster together in what order. And that's pretty much it. They don't operate like you and me,' scientist Gary Marcus wrote on his Substack, Marcus on AI.
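A toy sketch makes this concrete. Everything in it is invented for illustration; a real LLM scores tens of thousands of candidate tokens using billions of learned parameters. But the core step is the same: sample a statistically likely continuation, with no notion of whether it is true.

```python
import random

# Toy illustration of next-word prediction. The vocabulary and the
# probabilities are invented; a real LLM assigns probabilities to tens
# of thousands of tokens using billions of learned parameters. The key
# point is the same: the model samples a likely-looking continuation,
# with no built-in notion of truth.

prompt = "The capital of France is"
next_word_probs = {
    "Paris": 0.62,     # the most likely continuation
    "Lyon": 0.20,
    "Berlin": 0.13,    # plausible-sounding but wrong
    "Atlantis": 0.05,  # a fabricated answer is never fully ruled out
}

words = list(next_word_probs)
weights = list(next_word_probs.values())
print(prompt, random.choices(words, weights=weights, k=1)[0])
```

Run it a few times: it usually prints "Paris", but sooner or later it prints a wrong answer with the same confidence, which is essentially what a hallucination is.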
As a result, when an LLM is trained on, for example, inaccurate text, it gives inaccurate outputs, thereby hallucinating.
However, even accurate text cannot stop LLMs from making mistakes. That's because to generate new text (in response to a prompt), these models combine billions of patterns in unexpected ways. So, there is always a possibility that LLMs give fabricated information as output.
And as LLMs are trained on vast amounts of data, experts do not understand why they generate a particular sequence of text at a given moment.
Hallucination has been an issue with AI models from the start. In the initial years, big AI companies and labs repeatedly claimed that the problem would be resolved in the near future, and that did seem possible: after they were first launched, models tended to hallucinate less with each update.
However, after the release of the new report on OpenAI's latest models, it has become increasingly clear that hallucination is here to stay. The issue is not limited to OpenAI either: other reports have shown that Chinese startup DeepSeek's R1 model also posted double-digit rises in hallucination rates compared with the company's previous models.
This means that the application of AI models has to be limited, at least for now. They cannot, for example, be used as research assistants (as models create fake citations in research papers) or paralegal bots (because models cite imaginary legal cases).
Computer scientists like Arvind Narayanan, who is a professor at Princeton University, think that, to some extent, hallucination is intrinsic to the way LLMs work, and as these models become more capable, people will use them for tougher tasks where the failure rate will be high.
In a 2024 interview, he told Time magazine, 'There is always going to be a boundary between what people want to use them [LLMs] for, and what they can work reliably at… That is as much a sociological problem as it is a technical problem. And I do not think it has a clean technical solution.'

Related Articles

Talking to ChatGPT? Think twice: Sam Altman says OpenAI has no legal rights to protect ‘sensitive' personal info

Mint

During an interaction with podcaster Theo Von, OpenAI CEO Sam Altman spoke about confidentiality in relation to ChatGPT. According to Altman, many people, especially youngsters, talk to ChatGPT about very personal issues, as they would to a therapist or life coach, and ask for help with relationships and life choices. However, that can be tricky. 'Right now, if you talk to a therapist or a lawyer or a doctor about those problems, there's legal privilege for it. There's doctor-patient confidentiality, there's legal confidentiality,' Altman says.

No such legal privacy currently exists for ChatGPT. If there's a court case, OpenAI might have to share 'your most sensitive' chats. Altman feels this is wrong: he believes conversations with AI should have the same privacy as talks with a therapist. A year ago, no one thought about this; now, it's a big legal question. 'We should have the same concept of privacy for your conversations with AI that we do with a therapist,' he says. 'No one had to think about that even a year ago,' the OpenAI CEO adds.

Von then says he feels unsure about using AI because he worries about who might see his personal information, and he thinks things are moving too fast without proper checks. Altman agrees, and believes the privacy issue needs urgent attention. Lawmakers agree too, but it is all very new and the laws haven't caught up yet, he said. Von doesn't 'talk to' ChatGPT much himself because there's no legal clarity about privacy. 'I think it makes sense,' Altman replies.

ChatGPT as a therapist

There are numerous reported cases of people using ChatGPT as their therapist. A recent incident involves Aparna Devyal, a YouTuber from Jammu & Kashmir. The social media influencer got emotional after missing a flight, a reaction that came from years of feeling 'worthless'. She spoke to ChatGPT about being called 'nalayak' at school and struggling with dyslexia. ChatGPT comforted her, saying she had kept going despite everything, and Aparna felt seen. According to the AI chatbot, Aparna is not a fool, just human, and forgetting things under stress is normal. ChatGPT praised her strength in asking for help and said people like her kept the world grounded. 'I'm proud of you,' ChatGPT said.

I tried Gemini on my Galaxy Watch, and it completely changed how I use my phone

Indian Express

A few weeks ago, Google announced that it is rolling out Gemini to WearOS-powered smartwatches from Pixel, Samsung, Oppo and OnePlus. The much-anticipated update brought Gemini to my wrist, which meant I no longer had to pull out my phone from my pocket to use the AI assistant. I have been playing around with Gemini on my Galaxy Watch 6 Classic for a few days, and from simple AI summaries to comparing products online, Google Gemini turned out to be a fun and useful assistant. Here's a look at how the wrist-friendly version of Google's AI assistant became a daily essential for me.

Gemini is already available on devices like smartphones and select headphones and earphones, but using the AI chatbot from my wrist felt more natural since it is just one press away. This meant that I could use Gemini while riding my bike, walking, running, and even in the middle of my lunch, without having to worry about getting my smartphone greasy. The compact form factor of the smartwatch and voice-enabled AI let you use it anytime, even when munching on your favourite snack. Unlike smartphones with their large displays, the smaller screen makes for carefree usage, and I found myself using Gemini more than I would on my flip phone.

Powered by the same large language model as the phone and web versions of the AI chatbot, the smartwatch version of Gemini can do a lot of things. Previously, when roaming around the streets of Delhi, I had to stop and use the watch screen to set up navigation or get information about nearby places. But ever since Google brought Gemini to my wrist, I have been able to accomplish these with a single tap on the home button. If you are someone like me who finds it hard to use the small smartwatch screen, Gemini can completely change the way you use your wearable on a daily basis. Also, I have the LTE version of the Galaxy Watch 6 Classic, which means I can use Gemini even when I don't have my phone with me. If you happen to own a Wi-Fi-only watch, keep in mind that Gemini will need your smartwatch to be connected to your phone at all times.

The WearOS version of Gemini can do a lot of things that previously required you to either open an app or take out your smartphone. For example, you can ask Gemini to summarise your emails from Gmail, which is a lifesaver if you are someone who gets a ton of emails every day. What's more impressive is that Gemini can also help you create, edit, and delete Google Keep notes. Gemini can also remember those tiny details we often miss out on: you can tell the AI chatbot to 'Remember that I parked my bike on level 2, pillar number 27' and even get timely reminders by telling it to 'go to the grocery store after work'. And, yes, you can also ask the AI chatbot to set reminders and alarms, make phone calls, send messages and even start timers. These features are so useful for me that I no longer use my phone for these basic tasks.

However, the most impressive thing about the WearOS version of Gemini is that it can also do things you would normally use an AI chatbot for. I asked Gemini to compare the OnePlus 13 with the Galaxy S25, and not only did it quickly answer my query, but it also let me ask follow-up questions like 'Where can I buy these phones from?' and 'How much do they cost?'. In the last few days, I have noticed that my smartphone usage has drastically reduced. While much of that can be attributed to my phone's form factor (I use a flip phone), some of it is because I can use Gemini to perform certain actions directly from my wrist.
Without a doubt, the WearOS version of Gemini is impressive, but it does have some limitations. The biggest one for me is that Gemini cannot read out notifications from my phone, nor can it make calls to WhatsApp contacts. Another area where Gemini falls short of expectations is opening apps on my smartwatch: sometimes it quickly launches the app I ask it to open, but most of the time it struggles to get the name right and says that I have no such app. While these are small annoyances, I hope Google fixes them and improves Gemini's WearOS integration.
