
Exclusive: AI Outsmarts Virus Experts in the Lab, Raising Biohazard Fears
The study, shared exclusively with TIME, was conducted by researchers at the Center for AI Safety, MIT's Media Lab, the Brazilian university UFABC, and the pandemic prevention nonprofit SecureBio. The authors consulted virologists to create an extremely difficult practical test that measured the ability to troubleshoot complex lab procedures and protocols. While PhD-level virologists scored an average of 22.1% in their declared areas of expertise, OpenAI's o3 reached 43.8% accuracy. Google's Gemini 2.5 Pro scored 37.6%.
Seth Donoughe, a research scientist at SecureBio and a co-author of the paper, says that the results make him a 'little nervous,' because for the first time in history, virtually anyone has access to a non-judgmental AI virology expert that could walk them through complex lab processes to create bioweapons.
'Throughout history, there are a fair number of cases where someone attempted to make a bioweapon—and one of the major reasons why they didn't succeed is because they didn't have access to the right level of expertise,' he says. 'So it seems worthwhile to be cautious about how these capabilities are being distributed.'
Months ago, the paper's authors sent the results to the major AI labs. In response, xAI published a risk management framework pledging its intention to implement virology safeguards for future versions of its AI model Grok. OpenAI told TIME that it "deployed new system-level mitigations for biological risks" for its new models released last week. Anthropic included the paper's performance results in recent system cards, but did not propose specific mitigation measures. Google declined to comment to TIME.
AI in biomedicine
Virology and biomedicine have long been at the forefront of AI leaders' motivations for building ever more powerful AI models. 'As this technology progresses, we will see diseases get cured at an unprecedented rate,' OpenAI CEO Sam Altman said at the White House in January while announcing the Stargate project. There have been some encouraging signs in this area. Earlier this year, researchers at the University of Florida's Emerging Pathogens Institute published an algorithm capable of predicting which coronavirus variant might spread the fastest.
But up to this point, there had not been a major study dedicated to analyzing AI models' ability to actually conduct virology lab work. 'We've known for some time that AIs are fairly strong at providing academic-style information,' says Donoughe. 'It's been unclear whether the models are also able to offer detailed practical assistance. This includes interpreting images, information that might not be written down in any academic paper, or material that is socially passed down from more experienced colleagues.'
So Donoughe and his colleagues created a test specifically for these difficult, non-Google-able questions. 'The questions take the form: 'I have been culturing this particular virus in this cell type, in these specific conditions, for this amount of time. I have this amount of information about what's gone wrong. Can you tell me what is the most likely problem?'' Donoughe says.
And virtually every AI model outperformed PhD-level virologists on the test, even within the virologists' own areas of expertise. The researchers also found that the models showed significant improvement over time. Anthropic's Claude 3.5 Sonnet, for example, jumped from 26.9% to 33.6% accuracy from its June 2024 model to its October 2024 model. And a preview of OpenAI's GPT-4.5 in February outperformed GPT-4o by almost 10 percentage points.
'Previously, we found that the models had a lot of theoretical knowledge, but not practical knowledge,' Dan Hendrycks, the director of the Center for AI Safety, tells TIME. 'But now, they are getting a concerning amount of practical knowledge.'
Risks and rewards
If AI models are indeed as capable in wet lab settings as the study finds, then the implications are massive. In terms of benefits, AIs could help experienced virologists in their critical work fighting viruses. Tom Inglesby, the director of the Johns Hopkins Center for Health Security, says that AI could assist with accelerating the timelines of medicine and vaccine development and improving clinical trials and disease detection. 'These models could help scientists in different parts of the world, who don't yet have that kind of skill or capability, to do valuable day-to-day work on diseases that are occurring in their countries,' he says. For instance, one group of researchers found that AI helped them better understand hemorrhagic fever viruses in sub-Saharan Africa.
But bad-faith actors can now use AI models to walk them through how to create viruses—and will be able to do so without any of the typical training required to access a Biosafety Level 4 (BSL-4) laboratory, which deals with the most dangerous and exotic infectious agents. 'It will mean a lot more people in the world with a lot less training will be able to manage and manipulate viruses,' Inglesby says.
Hendrycks urges AI companies to put up guardrails to prevent this type of usage. 'If companies don't have good safeguards for these within six months' time, that, in my opinion, would be reckless,' he says.
Hendrycks says that one solution is not to shut these models down or slow their progress, but to make them gated, so that only trusted third parties get access to their unfiltered versions. 'We want to give the people who have a legitimate use for asking how to manipulate deadly viruses—like a researcher at the MIT biology department—the ability to do so,' he says. 'But random people who made an account a second ago don't get those capabilities.'
And AI labs should be able to implement these types of safeguards relatively easily, Hendrycks says. 'It's certainly technologically feasible for industry self-regulation,' he says. 'There's a question of whether some will drag their feet or just not do it.'
xAI, Elon Musk's AI lab, published a risk management framework memo in February, which acknowledged the paper and signaled that the company would 'potentially utilize' certain safeguards around answering virology questions, including training Grok to decline harmful requests and applying input and output filters.
OpenAI, in an email to TIME on Monday, wrote that its newest models, o3 and o4-mini, were deployed with an array of biological-risk-related safeguards, including blocking harmful outputs. The company wrote that it ran a thousand-hour red-teaming campaign in which 98.7% of unsafe bio-related conversations were successfully flagged and blocked. "We value industry collaboration on advancing safeguards for frontier models, including in sensitive domains like virology," a spokesperson wrote. "We continue to invest in these safeguards as capabilities grow."
Inglesby argues that industry self-regulation is not enough, and calls for lawmakers and political leaders to strategize a policy approach to regulating AI's bio risks. 'The current situation is that the companies that are most virtuous are taking time and money to do this work, which is good for all of us, but other companies don't have to do it,' he says. 'That doesn't make sense. It's not good for the public to have no insights into what's happening.'
'When a new version of an LLM is about to be released,' Inglesby adds, 'there should be a requirement for that model to be evaluated to make sure it will not produce pandemic-level outcomes.'
Related Articles


Axios
How ChatGPT is joining family conversations
Busy parents have been prompting ChatGPT for help almost since its launch, but here's how some parents are using the voice feature to let the bot talk to even the youngest of children.

Why it matters: Using generative AI in a supervised environment could introduce kids early to a technology that will most certainly be a facet of their future, but we don't know yet how it will affect their developing brains.

Case in point: Preston Trebas, an academic strategist at Western Governors University in Utah, has two kids, ages 4 and 6. "Not a day goes by where I don't use AI in some way with them," he told Axios. His most common use is to help them create stories. "They'll tell me what they want [the story] to start with, or what they want it to end with, or they'll describe their characters, or sometimes I'll have the voice feature ask them questions about the story," Trebas said.

Between the lines: Chulhee Kim, a recent graduate from Columbia's business school in NYC, says he let his 4-year-old daughter use the ChatGPT voice tool. "She found it fun immediately," Kim told Axios in an email. But she wasn't able to continue using the app because it wasn't built for kids. (OpenAI's terms of use say you must be 13 and over to use it.) Kim says his daughter needed time to think about what she was going to say during conversations and ChatGPT didn't wait for her, which "was a bit frustrating." Trebas said that when the bot interrupts, he uses the mute button to let his daughters talk.

Zoom in: ChatGPT recognizes the voice of his daughters and answers a little differently when it speaks to them, Trebas said. They only use ChatGPT on his phone. He never lets them use it without supervision, and he has age-appropriate talks with his kids about what ChatGPT is and what it isn't. "We have conversations constantly about how, even though it sounds like a real person, it's not."

Trebas said he's glad they've spent so much time talking about the productive ways to use AI, "because they're not being shown any of this in school." "I know a lot of folks are against [it and] say it's going to kill creativity. And for me, it's just like anything else. It's a tool. And so it's been really cool to see how we've used it to amplify their imagination, never replacing it, using it as a family creativity tool."

Kim also noted that when his daughter talked to ChatGPT, she came up with "more creative or silly questions" than what she normally asks, like "How can airplanes fly without the wheels?" or "How can milk change to a different color?" But ChatGPT would respond logically, which also frustrated his daughter. As a parent, Kim said he'd be interested in an LLM voice tool customized for children's conversation, to help them develop language.


Tom's Guide
Google's new AI called toy stores for me — and actually found a Labubu
Google's Gemini AI just got a major upgrade and it's no longer just helping you search. It's picking up the phone and doing the work for you. A new experimental feature, currently rolling out through Google's AI Mode in Search Labs, allows Gemini to call local businesses on your behalf. Using its Duplex technology, the same AI voice that once made restaurant reservations, Gemini can now check prices, confirm inventory, ask about hours and deliver back a clean summary. All you have to do is tap a button and wait for the follow-up.

And yes, it actually works. I tested it in a real-life, highly specific scenario: tracking down a Labubu toy for my eight-year-old daughter. Here's what happened and why I think this marks a major shift in how AI will help us get things done.

When you search for something like 'pet stores near me' or '24-hour pharmacies near me,' you might see a new prompt underneath certain business listings: 'Have AI check availability' or 'Have AI check pricing.' Tap it, and Gemini will walk you through a short form asking what you're looking for, when you need it and how far you're willing to travel.

From there, Gemini uses Duplex to place the call. The AI introduces itself clearly (no pretending to be a real human here) and asks your question directly. You don't need to listen in; once the call is done, Gemini sends a text or email summary with the business's response, including details like product availability, price and store hours. The biggest thing for me was not repeating myself over and over as I called every store. The AI did it for me.

This is one of several agent-style features Google is rolling into Search Labs. Others include Deep Search for research, shopping tools that summarize specs across multiple listings, and Gemini 2.5 Pro, a more powerful AI model built for longform reasoning. But the ability to make real-world phone calls is easily the most hands-on feature to date.

Let's back up: my daughter has been obsessed with Labubu. For those unfamiliar, this is an overpriced wide-eyed vinyl figure from Pop Mart that's part gremlin, part woodland sprite. The popularity rivals that of Beanie Babies back in the day. These things are oddly hard to find in stores unless you know where to look. And no, you can't just buy one easily on Amazon (trust me, I looked there first). My daughter has been begging me for one forever and saved her money to pay for half. So, I was determined to track it down.

After striking out with a few stores on my own, I spotted the 'Have AI check availability' button under a store listing on Google. I tapped it. Gemini asked a few quick questions about the toy and how far I'd be willing to drive. I was not going to drive into NYC for it, but I said I would pay for shipping. Then I forgot about it until about 40 minutes later, when I got a message. Gemini had called the store, asked about Labubu, confirmed they had some in stock, and included pricing and store hours.

I was blown away by how painless this was, especially compared to the chaos of past popular toys. Cabbage Patch Kids, Tickle Me Elmo and more would have been so much easier to handle with this feature.

There's something quietly brilliant about the way this works. Unlike voice assistants that stop at suggestions, Gemini actually acts on your behalf and does so in a way that feels human, helpful and hands-off. It's not just passively surfacing information, it's solving the problem for you. It's kind of wild.
For parents, introverts or just busy people who don't want to spend their afternoon calling five different stores, this is the AI tool we didn't know we needed. The feature is currently available to all U.S. users, with higher usage limits for AI Pro and AI Ultra subscribers. Business owners can also opt out if they'd rather not receive AI-driven calls.

It's also part of a broader trend we're seeing with agentic AI, tools that actually complete tasks. ChatGPT is doing it with its new agent feature, Perplexity has copilots, and now Google is bringing that capability into the real world through Search.

Google's new AI features might seem like small upgrades, until they solve a real problem for you. In my case, that problem was tracking down a popular toy for my daughter, and Gemini nailed it. The toy is being shipped out, so I'll update this story with the "Big Into Energy" Labubu when it arrives. We're entering a new era where AI is proactive. And if it means I never have to waste time on hold again, I'm definitely here for it.


Fast Company
How AI browsers like Perplexity Comet will reshape the internet—and the media
I've been using Comet, Perplexity's AI-powered browser, for the past week. Using it to navigate the internet is very similar to any other browser experience, with one major enhancement: the Comet Assistant. It's a feature that can accomplish web-based tasks independent of you, and I'm quickly becoming convinced it's the future.

I wrote an extensive review of Comet for The Media Copilot newsletter, but here I'd like to explore the broader implications—not just stemming from Comet, but the whole idea of an AI-powered web browser, because soon we'll be swimming in them. OpenAI is reportedly about to release its own take on the idea, and certainly Chrome won't be far behind given Google's deep push into AI.

Introducing a browsing assistant isn't just a convenience. It has the potential to fundamentally redefine our relationship with the web. AI browsers like Comet represent the first wave in a sea change, shifting the internet from something we actively navigate to something we delegate tasks to, increasingly trusting AI to act on our behalf. That will present new challenges around privacy and ethics, but also create more opportunities, especially for the media.

A new browser dawns

Those old enough to remember web browsers when they didn't have cookies (which let websites remember you were logged in) or omniboxes (which hard-wired search into the experience) understand how significant those changes were. After using Comet, I would argue the addition of an AI companion transcends them all. For the first time you're surfing the web with a partner.

The Comet Assistant is like having your own personal intern for what you're doing online, ready to take on any menial or low-priority tasks so you don't have to. For example, I order most of my groceries online every week. Rather than spinning up a list myself, I only need to open a tab, navigate to the store site, and tell Comet to do it. I can command it to look at past orders and my standing shopping list as a guide, give it a rough idea of the meals I want to make, and it'll fill up the cart on its own. Or I could tell it to find the nearest Apple Store with open Genius Bar appointments on Saturday morning, and book a repair for a broken iPhone screen.

You get the idea. Once you start using Comet like this, it becomes kind of addictive as you search for its limits. Book a flight? Plan a vacation? Clean up my RSS reader (it really needs it)?

To be clear, the execution often isn't perfect, so you still need to check its work before taking that final step—in fact, with most use cases, it'll require this even if the command is quite clear (e.g. 'Buy it'), which should ease most people's apprehension about outsourcing things they'd previously done by hand. But I believe this outsourcing is inevitable.

In practice, Comet functions as an agent, and while its abilities are still nascent, they're already useful enough to benefit a large number of people. Browser assistants will likely be most people's first experience with agents, and most will judge them on how effectively they perform tasks with minimal guidance. That will depend not just on the quality of the tool and the AI models powering it, but also on how much it knows about the user.

Privacy concerns are elevated with agents: think about the grocery example and now extrapolate that to medical or financial information. Can I trust my AI provider to safeguard that information from marketers, hackers, and other users of the same AI?
Perplexity has the distinction of not training foundation models, so at least the concern about leaks into training data is moot. But the level of access a browser agent has—essentially looking over your shoulder at everything you do online—creates a very large target. Nonetheless, the potential for convenience is so great that I believe many people will use them anyway, and not see the leap to agents as much more than the access they already give major tech platform providers like Apple or Google.

Providing informational fuel for agents

This has big implications for the media. If you think about the things we do online—shopping, banking, interacting with healthcare providers—all of them are informed by context, often in the form of research that we do ourselves. We're already offloading some of that to AI, but the introduction of a personal browser agent means that can happen even closer to the task. So if I ask the AI to fill my shopping cart with low-fat ingredients for chicken enchiladas, it's going to need to get that information from somewhere.

This opens up a new landscape to information providers: the contextual searches needed to support agent activity. Whereas humans can only find, read, and process so much data to get the best information for what they're doing, AI theoretically has no limits. In other words, the surface area of AI searches will expand massively, and so will the competition for it. The field of 'AIEO,' the AI version of SEO, is about to get very hot.

The spike in agent activity will also hopefully lead to better standards for how bots identify themselves. As I wrote about recently, AI companies have essentially given themselves permission to ignore bot restrictions on sites when those bots are acting on behalf of users (as opposed to training or search indexing). That's a major area of concern for content creators who want to control how AI ingests and adapts their content, and if bot activity suddenly becomes much bigger, so does the issue.

Information workers, and journalists in particular, will be able to unlock a lot of potential with browser agents. Think about how many of the software platforms you use professionally are browser-based. In a typical newsroom, reporters and editors will use information and context across all kinds of systems—from a communications platform like Slack to project-management software like Asana to a CMS like WordPress. Automations can ease some of the tedium, but many newsrooms don't have enough resources for the technical upkeep. With a browser agent, workers can automate their own tasks on the fly.

Certainly, the data privacy concerns are even higher in a professional environment, but so are the rewards. An AI informed not just by internet data and the context of your task, but by the goals and knowledge base of your workplace—AND with mastery over your browser-based software—would effectively give everyone on the team their own assistant.

And this isn't some distant, hypothetical scenario—you can do it right now. Comet is here, and though the Assistant sometimes stumbles through tasks like a newborn calf, it has the ability to perform research, operate software, and accomplish tasks on behalf of the user. That rewrites the rules of online interaction. While the amplified privacy concerns demand clearer boundaries and stricter accountability, AI browsers represent a step change in how we use the internet: We're no longer alone out there.