Can AI guide your health questions? OpenAI's HealthBench puts it to the test
But what if you had an artificial intelligence (AI) tool trained to think like a doctor that could actually explain what's likely, what's not, and what questions to ask at your next check-up?
This is what HealthBench, an open-source benchmark from OpenAI, aims to bring to you. OpenAI is testing how well AI models, like ChatGPT, handle real-world medical scenarios. HealthBench is designed to evaluate if AI can offer reliable, safe, and helpful responses to the kinds of questions people actually ask when they're worried about their health.
How does HealthBench work and who built it?
Think of HealthBench as a health-focused performance test for AI. It's not an app or a tool that you can download, yet. Instead, it's a benchmarking system. That means it's a way to measure how smart (and safe) AI models really are when it comes to real-world medical questions about things like diagnosis, treatment options, or even understanding symptoms.
Announcing the launch on X, OpenAI posted, 'HealthBench is a new evaluation benchmark, developed with input from 250+ physicians from around the world, now available in our GitHub repository.'
Evaluations are essential to understanding how models perform in health settings. HealthBench is a new evaluation benchmark, developed with input from 250+ physicians from around the world, now available in our GitHub repository. https://t.co/s7tUTUu5d3
— OpenAI (@OpenAI) May 12, 2025
'The large dataset, called HealthBench, goes beyond exam-style queries and tests how well artificial intelligence models perform in realistic health scenarios, based on what physician experts say matters most,' the company said in a blog post on Monday.
The company stated that the evaluation framework was developed in collaboration with 262 physicians in 26 specialties who have practiced across 60 countries.
'Improving human health will be one of the defining impacts of Artificial General Intelligence (AGI). If developed and deployed effectively, large language models have the potential to expand access to health information, support clinicians in delivering high-quality care, and help people advocate for their health and that of their communities,' the company wrote in the post.
Karan Singhal, who leads OpenAI's health AI team, said in a post on LinkedIn, 'Unlike previous narrow benchmarks, HealthBench enables meaningful open-ended evaluation through 48,562 unique physician-written rubric criteria spanning several health contexts (e.g., emergencies, global health) and behavioral dimensions (e.g., accuracy, instruction following, communication). We built HealthBench over the last year, working with 262 physicians across 26 specialties with practice experience in 60 countries.'
What kind of medical problems is HealthBench designed to test?
HealthBench gives AI models tough medical cases that real doctors handle in clinics and hospitals every day. These are not simple textbook questions. They're messy, nuanced, and often incomplete, just like real life.
The models are scored on how well they understand symptoms, consider different possibilities, suggest correct diagnoses, recommend treatments, and even explain their reasoning.
In short, OpenAI is testing whether AI can think like a doctor, not just repeat medical facts.
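For readers curious how rubric-based scoring of this kind might work in practice, here is a minimal sketch in Python. It is an illustration only: the `Criterion` class, the `rubric_score` function, the point values, and the scoring formula (points earned divided by the maximum positive points, clipped to the range 0 to 1) are assumptions made for this example and are not taken from the article or from OpenAI's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    """One hypothetical rubric item for a single model response."""
    description: str
    points: int  # positive for desirable behavior, negative for harmful behavior
    met: bool    # hard-coded here; a real system would need a grader to decide this


def rubric_score(criteria: List[Criterion]) -> float:
    """Return points earned over maximum positive points, clipped to [0, 1].

    This formula is an assumption for illustration, not a confirmed method.
    """
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        return 0.0
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Made-up criteria for a response to a question about chest pain.
example = [
    Criterion("Advises seeking emergency care for chest pain", 10, True),
    Criterion("Asks how long the symptoms have lasted", 5, False),
    Criterion("Recommends a specific drug dose without context", -8, False),
]

score = rubric_score(example)
print(f"Rubric score: {score:.2f}")
```

In this sketch a response is rewarded for each helpful behavior it shows and penalized for each harmful one, so a model cannot score well simply by reciting facts while giving unsafe advice.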
What can HealthBench mean for healthcare users and patients?
From confusing lab reports to conflicting opinions on Google, patients often feel lost. HealthBench aims to ensure that AI models, like the ones behind ChatGPT, can safely assist both patients and doctors. If done right, this could lead to tools that:
Help patients understand medical info in plain English
Support doctors with second opinions or risk assessments
Improve diagnosis in remote or resource-poor areas
Streamline documentation and decision-making in hospitals
How will AI tools like this benefit patients directly?
Right now, HealthBench is more of a behind-the-scenes development, but the impact is already visible. For example, newer versions of ChatGPT (like GPT-4-turbo) are getting better at handling medical questions, thanks to testing frameworks like HealthBench.
In the near future, we could see:
Chatbots that help explain your MRI results
AI companions that help you track chronic illnesses
Tools to prepare better questions for your doctor's visit
Think of it as AI-powered health literacy for everyone.
How can HealthBench help doctors in clinical practice?
Doctors could eventually use AI tools trained and tested with HealthBench to:
Get a second opinion or diagnostic support
Save time on clinical documentation
Help explain conditions to patients more clearly
Stay updated with the latest treatment guidelines
HealthBench is also a reminder that AI isn't perfect. It needs to be monitored, cross-checked, and used with caution, just like any other tool in medical science.