Google's healthcare AI made up a body part — what happens when doctors don't notice?

The Verge, a day ago
Scenario: A radiologist is looking at your brain scan and flags an abnormality in the basal ganglia. It's an area of the brain that helps you with motor control, learning, and emotional processing. The name sounds a bit like another part of the brain, the basilar artery, which supplies blood to your brainstem — but the radiologist knows not to confuse them. A stroke or abnormality in one is typically treated in a very different way than in the other.
Now imagine your doctor is using an AI model to do the reading. The model says you have a problem with your 'basilar ganglia,' conflating the two names into an area of the brain that does not exist. You'd hope your doctor would catch the mistake and double-check the scan. But there's a chance they don't.
Though not in a hospital setting, the 'basilar ganglia' is a real error that was served up by Google's healthcare AI model, Med-Gemini. A 2024 research paper introducing Med-Gemini included the hallucination in a section on head CT scans, and nobody at Google caught it, in either that paper or a blog post announcing it. When Bryan Moore, a board-certified neurologist and researcher with expertise in AI, flagged the mistake, he tells The Verge, the company quietly edited the blog post to fix the error with no public acknowledgement — and the paper remained unchanged. Google calls the incident a simple misspelling of 'basal ganglia.' Some medical professionals say it's a dangerous error and an example of the limitations of healthcare AI.
Med-Gemini is a collection of AI models that can summarize health data, create radiology reports, analyze electronic health records, and more. The pre-print research paper, meant to demonstrate its value to doctors, highlighted a series of abnormalities in scans that radiologists 'missed' but AI caught. One of its examples was that Med-Gemini diagnosed an 'old left basilar ganglia infarct.' But as established, there's no such thing.
Fast-forward about a year, and Med-Gemini's trusted tester program is no longer accepting new entrants — likely meaning that the program is being tested in real-life medical scenarios on a pilot basis. It's still an early trial, but the stakes of AI errors are getting higher. Med-Gemini isn't the only model making them. And it's not clear how doctors should respond.
'What you're talking about is super dangerous,' Maulin Shah, chief medical information officer at Providence, a healthcare system serving 51 hospitals and more than 1,000 clinics, tells The Verge. He added, 'Two letters, but it's a big deal.'
In a statement, Google spokesperson Jason Freidenfelds told The Verge that the company partners with the medical community to test its models and that Google is transparent about their limitations.
'Though the system did spot a missed pathology, it used an incorrect term to describe it (basilar instead of basal). That's why we clarified in the blog post,' Freidenfelds said. He added, 'We're continually working to improve our models, rigorously examining an extensive range of performance attributes -- see our training and deployment practices for a detailed view into our process.'
On May 6th, 2024, Google debuted its newest suite of healthcare AI models with fanfare. It billed 'Med-Gemini' as a 'leap forward' with 'substantial potential in medicine,' touting its real-world applications in radiology, pathology, dermatology, ophthalmology, and genomics.
The models trained on medical images, like chest X-rays, CT slices, pathology slides, and more, using de-identified medical data with text labels, according to a Google blog post. The company said the AI models could 'interpret complex 3D scans, answer clinical questions, and generate state-of-the-art radiology reports' — even going as far as to say they could help predict disease risk via genomic information.
Moore saw the authors' promotions of the paper early on and took a look. He caught the mistake and was alarmed, flagging the error to Google on LinkedIn and contacting authors directly to let them know.
The company, he saw, quietly switched out evidence of the AI model's error. It updated the debut blog post phrasing from 'basilar ganglia' to 'basal ganglia' with no other differences and no change to the paper itself. In communication viewed by The Verge, Google Health employees responded to Moore, calling the mistake a typo.
In response, Moore publicly called out Google for the quiet edit. This time the company changed the result back with a clarifying caption, writing that ''basilar' is a common mis-transcription of 'basal' that Med-Gemini has learned from the training data, though the meaning of the report is unchanged.'
Google acknowledged the issue in a public LinkedIn comment, again downplaying it as a 'misspelling.'
'Thank you for noting this!' the company said. 'We've updated the blog post figure to show the original model output, and agree it is important to showcase how the model actually operates.'
As of this article's publication, the research paper itself still contains the error with no updates or acknowledgement.
Whether it's a typo, a hallucination, or both, errors like these raise much larger questions about the standards healthcare AI should be held to, and when it will be ready to be released into public-facing use cases.
'The problem with these typos or other hallucinations is I don't trust our humans to review them, or certainly not at every level,' Shah tells The Verge. 'These things propagate. We found in one of our analyses of a tool that somebody had written a note with an incorrect pathologic assessment — pathology was positive for cancer, they put negative (inadvertently) … But now the AI is reading all those notes and propagating it, and propagating it, and making decisions off that bad data.'
Errors with Google's healthcare models have persisted. Two months ago, Google debuted MedGemma, a newer and more advanced healthcare model that specializes in AI-based radiology results. Medical professionals found that when they phrased the same question differently, the model's answers varied and could be inaccurate.
In one example, Dr. Judy Gichoya, an associate professor in the department of radiology and informatics at Emory University School of Medicine, asked MedGemma about a problem with a patient's rib X-ray with a lot of specifics — 'Here is an X-ray of a patient [age] [gender]. What do you see in the X-ray?' — and the model correctly diagnosed the issue. When the system was shown the same image but with a simpler question — 'What do you see in the X-ray?' — the AI said there weren't any issues at all. 'The X-ray shows a normal adult chest,' MedGemma wrote.
In another example, Gichoya asked MedGemma about an X-ray showing pneumoperitoneum, or gas under the diaphragm. The first time, the system answered correctly. But with slightly different query wording, the AI hallucinated multiple types of diagnoses.
'The question is, are we going to actually question the AI or not?' Shah says. Even if an AI system is listening to a doctor-patient conversation to generate clinical notes, or translating a doctor's own shorthand, he says, those have hallucination risks which could lead to even more dangers. That's because medical professionals could be less likely to double-check the AI-generated text, especially since it's often accurate.
'If I write 'ASA 325 mg qd,' it should change it to 'Take an aspirin every day, 325 milligrams,' or something that a patient can understand,' Shah says. 'You do that enough times, you stop reading the patient part. So if it now hallucinates — if it thinks the ASA is the anesthesia standard assessment … you're not going to catch it.'
Shah says he's hoping the industry moves toward augmentation of healthcare professionals instead of replacing clinical aspects. He's also looking to see real-time hallucination detection in the AI industry — for instance, one AI model checking another for hallucination risk and either not showing those parts to the end user or flagging them with a warning.
'In healthcare, 'confabulation' happens in dementia and in alcoholism where you just make stuff up that sounds really accurate — so you don't realize someone has dementia because they're making it up and it sounds right, and then you really listen and you're like, 'Wait, that's not right' — that's exactly what these things are doing,' Shah says. 'So we have these confabulation alerts in our system that we put in where we're using AI.'
Gichoya, who leads Emory's Healthcare AI Innovation and Translational Informatics lab, says she's seen newer versions of Med-Gemini hallucinate in research environments, just like most large-scale AI healthcare models.
'Their nature is that [they] tend to make up things, and it doesn't say 'I don't know,' which is a big, big problem for high-stakes domains like medicine,' Gichoya says.
She added, 'People are trying to change the workflow of radiologists to come back and say, 'AI will generate the report, then you read the report,' but that report has so many hallucinations, and most of us radiologists would not be able to work like that. And so I see the bar for adoption being much higher, even if people don't realize it.'
Dr. Jonathan Chen, associate professor at the Stanford School of Medicine and the director for medical education in AI, searched for the right adjective — trying out 'treacherous,' 'dangerous,' and 'precarious' — before settling on how to describe this moment in healthcare AI. 'It's a very weird threshold moment where a lot of these things are being adopted too fast into clinical care,' he says. 'They're really not mature.'
On the 'basilar ganglia' issue, he says, 'Maybe it's a typo, maybe it's a meaningful difference — all of those are very real issues that need to be unpacked.'
Some parts of the healthcare industry are desperate for help from AI tools, but the industry needs to have appropriate skepticism before adopting them, Chen says. Perhaps the biggest danger is not that these systems are sometimes wrong — it's how credible and trustworthy they sound when they tell you an obstruction in the 'basilar ganglia' is a real thing, he says. Plenty of errors slip into human medical notes, but AI can actually exacerbate the problem, thanks to a well-documented phenomenon known as automation bias, where complacency leads people to miss errors in a system that's right most of the time. Even AI checking an AI's work is still imperfect, he says. 'When we deal with medical care, imperfect can feel intolerable.'
'You know the driverless car analogy, 'Hey, it's driven me so well so many times, I'm going to go to sleep at the wheel.' It's like, 'Whoa, whoa, wait a minute, when your or somebody else's life is on the line, maybe that's not the right way to do this,'' Chen says, adding, 'I think there's a lot of help and benefit we get, but also very obvious mistakes will happen that don't need to happen if we approach this in a more deliberate way.'
Requiring AI to work perfectly without human intervention, Chen says, could mean 'we'll never get the benefits out of it that we can use right now. On the other hand, we should hold it to as high a bar as it can achieve. And I think there's still a higher bar it can and should reach for.' Getting second opinions from multiple, real people remains vital.
That said, Google's paper had more than 50 authors, and it was reviewed by medical professionals before publication. It's not clear exactly why none of them caught the error; Google did not directly answer a question about why it slipped through.
Dr. Michael Pencina, chief data scientist at Duke Health, tells The Verge he's 'much more likely to believe' the Med-Gemini error is a hallucination than a typo, adding, 'The question is, again, what are the consequences of it?' The answer, to him, rests in the stakes of making an error — and with healthcare, those stakes are serious. 'The higher-risk the application is and the more autonomous the system is ... the higher the bar for evidence needs to be,' he says. 'And unfortunately we are at a stage in the development of AI that is still very much what I would call the Wild West.'
'In my mind, AI has to have a way higher bar of error than a human,' Providence's Shah says. 'Maybe other people are like, 'If we can get as high as a human, we're good enough.' I don't buy that for a second. Otherwise, I'll just keep my humans doing the work. With humans I know how to go and talk to them and say, 'Hey, let's look at this case together. How could we have done it differently?' What are you going to do when the AI does that?'
By Hayden Field