ChatGPT Beats Claude, Google's Gemini, DeepSeek In Test Of AI Agents

Forbes, 13-05-2025
Rating AI agents including ChatGPT's o3, Claude from Anthropic, and Google's Gemini on web search tasks
ChatGPT's recent o3 AI model beat Anthropic's Claude, Google's Gemini, and Hangzhou-based DeepSeek in a test of AI agents for web research. But there is still a considerable gap between human capabilities and the best AI agents.
Research firm FutureSearch put 11 major large language models through 89 messy, real-world research tasks, evaluating each model on its ability to find original sources, seek out data, gather evidence, compile data, and validate claims.
The highest score achieved was 0.51, on a scale where an estimated 'perfect' agent would reach about 0.8. In other words, even the best AI agents available today are still fairly easily outperformed by capable human researchers.
'We can conclude that frontier agents … substantially underperform smart generalist researchers who are given ample time,' the study says.
Here's how they scored the various AI models:
Still, AI agents are improving rapidly. Comparing against the year-old GPT-4 Turbo's score of 0.27, the researchers say that 'about 45% of the gap between smart generalist researchers and frontier agents' was closed within a year of development.
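The 45% figure can be checked with back-of-the-envelope arithmetic. The Python sketch below is an illustration, not FutureSearch's published method: it assumes the gap is measured from GPT-4 Turbo's 0.27 up to the roughly 0.8 score the study attributes to a 'perfect' agent, and that the top score of 0.51 belongs to o3, which leads the pack. The helper name gap_closed is hypothetical.

```python
def gap_closed(old_score: float, new_score: float, ceiling: float) -> float:
    """Return the fraction of the old_score-to-ceiling gap that new_score closes.

    Hypothetical helper for illustration only; it assumes the 'gap' is
    measured against a fixed ceiling score.
    """
    if ceiling <= old_score:
        raise ValueError("ceiling must be higher than old_score")
    return (new_score - old_score) / (ceiling - old_score)


# Scores quoted in the article. Assumptions: 0.51 is o3's score, and the
# ~0.8 'perfect agent' estimate stands in for the human benchmark.
GPT4_TURBO_SCORE = 0.27
O3_SCORE = 0.51
CEILING = 0.8

fraction = gap_closed(GPT4_TURBO_SCORE, O3_SCORE, CEILING)
print(f"Share of gap closed: {fraction:.0%}")  # prints "Share of gap closed: 45%"
```

Under those assumptions, o3's 0.24-point gain covers 0.24 of the 0.53 points separating GPT-4 Turbo from the ceiling, or about 45%.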
Free or low-cost agents such as DeepSeek are also not far behind OpenAI's paid, top-end agents. OpenAI's o3 leads the pack, with Claude and Gemini close behind. For now, closed models are clearly superior for research-heavy tasks, but free and open-source models are increasingly capable.
All LLM-based AI agents still have major weaknesses, however. They fall short of smart human researchers, especially in strategic planning, thoroughness, evaluating source quality, and 'memory management': they tend to forget earlier findings mid-task. A particular problem is 'satisficing,' in which an agent accepts a merely adequate answer instead of continuing to search for the best one available.
That's a core reason ChatGPT's o3 model came in first: it tended to validate its answers more thoroughly and was less likely to stop short when a better answer was available.
Since a single year closed almost half the gap between smart generalist researchers and the best AI agents, it may not be long before AI agents outperform even the best humans.
However, ChatGPT's recent problems with its latest model being too agreeable show that the path to improvement is not a straight line.
For now, at least, it remains essential to double-check any results from generative AI tools, including AI agents, to ensure accuracy.

Related Articles

Apple's Best-Ever iPad Mini Just Matched Its Record-Low Price in Amazon's July 4th Sale

CNET

The iPad Mini A17 Pro is the latest in Apple's long line of miniature tablets, and it's one of the best tablets you can buy, especially if you're looking for a smaller, more portable model. It's also one of the most affordable tablets in Apple's lineup, though that doesn't mean most people would call it cheap. Thankfully, Amazon is offering this mini marvel for just $399 as part of its Fourth of July sale this weekend. Apple iPads tend to sell out quickly when discounted, so be sure to snag one before stock runs out.

For a limited time, this deal brings Apple's iPad Mini back down to its all-time low of $399. There are multiple colors to choose from, and the same discount applies to different storage capacities. You can even save on the version with cellular connectivity. Prefer to pick up your new tablet from a store? Best Buy is matching the $399 price, too.

Despite its small size, the current iPad Mini is powerful, thanks to its speedy A17 Pro chip. That chip also enables Apple Intelligence features, if that's your bag. This iPad starts with 128GB of storage and a stunning 8.3-inch Liquid Retina display. It also has fast Wi-Fi 6E connectivity and support for the fancy Apple Pencil Pro. There's a 12-megapixel ultrawide front camera with Center Stage for video calls and a 12MP wide back camera to help you scan documents clearly or take photos.

Keep in mind that these prices might not last long, and the same goes for stock. If you want a particular model in a particular color, we suggest ordering sooner rather than later.

Why this deal matters

This deal cuts $100 off the price of the latest iPad Mini, a rare discount considering this is one of the newest Apple tablets on the market. This tablet is fast yet small and light, and it might just be the perfect portable gaming device. If a new tablet is on your Fourth of July shopping list, don't miss out on this offer.

This Billionaire Tech Investor Regrets Not Buying Bitcoin Everyday, Believes The Asset Can More Than Double In Value

Yahoo

Billionaire technology investor and Coatue Management CEO Philippe Laffont regrets not buying Bitcoin. Bitcoin is in Laffont's 'Fantastic 40.' It is not the first time Laffont has expressed pro-Bitcoin sentiments.

Despite their storied successes, famed investors don't always get it right. 'I wake up every day at three in the morning and I'm like, why am I such an idiot?' Laffont told CNBC on Wednesday, expressing regret at not buying Bitcoin. 'What have I been waiting for not being involved in it [Bitcoin]?'

Like many sophisticated investors, Laffont was put off by Bitcoin's volatility. 'I always thought bitcoin is amazing, but it's double or triple the volatility of the Nasdaq,' he told CNBC. 'Nasdaq is already pretty volatile. Why do I need to deal with this added volatility?'

But given Bitcoin's staying power and continued surge, Laffont now believes he was mistaken. He added that the volatility that initially dissuaded him from investing in the asset now appears to be coming down, citing its reaction to President Donald Trump's April 2 'Liberation Day' tariffs: while the Nasdaq fell over 16% in the days following the news, Bitcoin fell only about 12%.

Laffont has added Bitcoin to his 'Fantastic 40,' a list of firms and assets he believes will lead the market by 2030. He predicts the asset's market capitalization could hit $5 trillion by then. That target implies about 138% upside from its current market capitalization of $2.1 trillion.

Laffont reasons that Bitcoin still has room to grow, citing its size relative to the world's total net worth.
'Net worth of the world is, I think, $450 trillion to $500 trillion, equities are, let's say, $120 trillion, gold above and under the ground is $20 trillion. And then Bitcoin is $2 trillion. And I was like, okay, well, $2 trillion – let's say it represents half a percent of the net worth of the world. Could it go to one or two?' he told CNBC.

Laffont also said Bitcoin could benefit from de-dollarization. The theory is that U.S. exceptionalism may be ending, paving the way for a future with multiple reserve currencies. Many analysts see Bitcoin benefiting from such a scenario.

Despite this optimistic outlook, Laffont has yet to commit. 'Do I own it now? Do I own it tomorrow or in a few days? But every day, I do think, "Why do I not own it?"' Laffont said. 'Sometimes you have to change your mind and you have to say, well, I made a mistake.'

Laffont had previously expressed these pro-Bitcoin sentiments on June 12. At the time, he urged proponents toward moderation. 'For those of you that think Bitcoin is going to be important, my recommendation is never make it such a big portion of your portfolio that it becomes the driving factor of the portfolio,' he said at Coinbase's (NASDAQ:COIN) State of Crypto Summit in New York. 'You're going to make way more money by having a smaller position that you can keep for 10 years than the big one that worries you all the time.'

This article This Billionaire Tech Investor Regrets Not Buying Bitcoin Everyday, Believes The Asset Can More Than Double In Value originally appeared on