AI-Generated Medical Podcasts Deceive Even the Experts
For the first time, researchers have evaluated the use of artificial intelligence (AI) to generate podcasts from peer-reviewed scientific articles. Using Google's NotebookLM application, the team created podcast scripts based on studies published in the European Journal of Cardiovascular Nursing (EJCN). The results were eye-opening: Half of the authors did not realize the podcast hosts were not human.
The study assessed whether AI could simulate a realistic scientific dialogue between two speakers discussing published research. Findings were presented at this year's Annual Congress of the Association of Cardiovascular Nursing and Allied Professions and simultaneously published in EJCN.
Too Polished to Be Human?
The AI-generated podcasts averaged 10 minutes. Without knowing the content was machine-produced, most authors said their research was summarized clearly, in simple language, and with structured delivery. Some even remarked that the 'hosts' sounded like they had clinical or nursing backgrounds.
But not all feedback was glowing. Several participants felt the delivery was unnaturally smooth — lacking hesitation, repetition, or organic back-and-forth — prompting suspicion of AI involvement. Others flagged mispronounced medical terms and factual errors. One podcast, for example, focused on heart failure diagnosis instead of management. Another spoke exclusively about women, even though the study included men.
Some authors were also distracted by the overly enthusiastic, American-style tone of the narration, with superlatives used to describe modest results. A more academic tone, they suggested, would be more appropriate — particularly if the tool is used for scientific audiences.
Promise for Science Communication
Led by Philip Moons, PhD, from the KU Leuven Department of Public Health and Primary Care, Leuven, Belgium, the researchers created 10 podcasts based on EJCN articles. Despite imperfections, they concluded that 'AI-generated podcasts are able to summarize key findings in an easily understandable and engaging manner.'
'Podcasts were found to be most appropriate for patients and the public but could be useful for researchers and healthcare professionals as well if they were tailored accordingly,' the authors wrote.
'It was striking how accurate the podcasts were in general. Knowing that we are just at the beginning of this kind of AI-generated podcasts, the quality will become better over time, probably within the next few months,' Moons said in a press release. He believes the tool could help researchers more effectively disseminate their work.
Moons got the idea after testing NotebookLM with one of his own papers, shortly after Google launched the feature in September 2024. 'When I did a first test case with one of my own articles, I was flabbergasted by the high quality and how natural it sounded.'
After the podcasts were generated (each running 5 to 17 minutes), the article authors were asked to evaluate the content through a questionnaire and a 30-minute video interview.
Missing Context but Strong Engagement
All participating authors agreed that the podcasts effectively conveyed the key findings of their research in simple, accessible language. Many also found the conversational format between two 'hosts' made the content more engaging.
Several praised the hosts' professionalism. 'I was curious about their background — it really seemed like they had medical or nursing training,' one author said. However, some were unsettled by the lack of introductory context. The podcasts provided no information about the identity of the speakers or how the audio was produced, leaving listeners uncertain about the source.
Overall, most found the content reliable, though a few pointed out factual errors. One author noted that obesity was described as a 'habit,' potentially misleading listeners by implying it is merely a lifestyle choice.
Despite these issues, half of the authors — one of whom was an AI expert — did not realize the podcasts were machine-generated. Many said they were 'shocked' or 'amazed' by the quality. Most of the participants were regular podcast listeners. Even those who suspected AI involvement were surprised by how natural and fluent the results sounded.
Expanding Research Reach
All authors agreed that future versions should clearly disclose AI involvement. Most also recommended adopting a more academic tone if the target audience includes researchers, along with a greater focus on study methods and limitations.
Although patients and the general public were identified as the primary audience, the researchers noted that AI-generated podcasts could serve as a cost-effective, scalable way for healthcare professionals to stay current with new research. They also suggested the format could help broaden the visibility and reach of scientific publications.
'This could be a sustainable model to get the message out to people who do not typically read scientific journals,' Moons said. Still, he emphasized the need for human oversight 'to add nuance.' He envisions a hybrid model in which AI-generated content is supplemented with human input.
That vision may already be taking shape. The beta version of Google's NotebookLM (currently available only in English) now allows real-time interaction with the AI. After launching a podcast, users can ask questions directly to one of the 'hosts.' The AI generates a spoken response, and the podcast then continues — seamlessly integrating human-machine dialogue.