Google rolls out Gemini Deep Think AI, a reasoning model that tests multiple ideas in parallel

Yahoo, 2 days ago
Google DeepMind is rolling out Gemini 2.5 Deep Think, which, the company says, is its most advanced AI reasoning model, able to answer questions by exploring and considering multiple ideas simultaneously and then using those outputs to choose the best answer.
Subscribers to Google's $250-per-month Ultra subscription will gain access to Gemini 2.5 Deep Think in the Gemini app starting Friday.
First unveiled in May at Google I/O 2025, Gemini 2.5 Deep Think is Google's first publicly available multi-agent model. These systems spawn multiple AI agents to tackle a question in parallel, a process that uses significantly more computational resources than a single agent but tends to produce better answers.
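Google has not published how Deep Think coordinates its agents or selects a final answer. As a rough illustration of the general pattern described here (several attempts explored in parallel, then one answer chosen), the Python sketch below runs independent attempts concurrently and keeps the answer most of them agree on. Majority voting is an assumed stand-in for Google's selection step, and `toy_agent` and `parallel_think` are hypothetical names; a real system would query a language model where the toy does arithmetic.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def toy_agent(question: str, agent_id: int) -> int:
    """Hypothetical stand-in for one reasoning agent.

    A real system would sample a language model here. This toy answers an
    addition question like "17+25" correctly, except that every third agent
    returns its own distinct wrong answer, so the pool disagrees.
    """
    a, b = (int(part) for part in question.split("+"))
    if agent_id % 3 == 0:
        return a + b + agent_id + 1  # a different wrong answer per faulty agent
    return a + b


def parallel_think(question: str, n_agents: int = 8) -> tuple[int, Counter]:
    """Run n_agents attempts concurrently, then keep the most common answer."""
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda i: toy_agent(question, i), range(n_agents)))
    votes = Counter(answers)
    best, _ = votes.most_common(1)[0]  # assumed selection rule: majority vote
    return best, votes


best, votes = parallel_think("17+25")
print(best)  # 42
```

Because every extra agent is another full attempt, compute grows roughly linearly with `n_agents`, which is one reason such systems cost more to serve than a single model call.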
Google used a variation of Gemini 2.5 Deep Think to score a gold medal at this year's International Math Olympiad (IMO).
Alongside Gemini 2.5 Deep Think, the company says it is releasing the model it used at the IMO to a select group of mathematicians and academics. Google says this AI model 'takes hours to reason,' instead of seconds or minutes like most consumer-facing AI models. The company hopes the IMO model will enhance research efforts, and aims to get feedback on how to improve the multi-agent system for academic use cases.
Google notes that the Gemini 2.5 Deep Think model is a significant improvement over what it announced at I/O. The company also claims to have developed 'novel reinforcement learning techniques' to encourage Gemini 2.5 Deep Think to make better use of its reasoning paths.
'Deep Think can help people tackle problems that require creativity, strategic planning and making improvements step-by-step,' said Google in a blog post shared with TechCrunch.
The company says Gemini 2.5 Deep Think achieves state-of-the-art performance on Humanity's Last Exam (HLE) — a challenging test measuring AI's ability to answer thousands of crowdsourced questions across math, humanities, and science. Google claims its model scored 34.8% on HLE (without tools), compared to xAI's Grok 4, which scored 25.4%, and OpenAI's o3, which scored 20.3%.
Google also says Gemini 2.5 Deep Think outperforms AI models from OpenAI, xAI, and Anthropic on LiveCodeBench6, a challenging test of competitive coding tasks. Google's model scored 87.6%, whereas Grok 4 scored 79%, and OpenAI's o3 scored 72%.
Gemini 2.5 Deep Think automatically works with tools such as code execution and Google Search, and the company says it's capable of producing 'much longer responses' than traditional AI models.
In Google's testing, the model produced more detailed and aesthetically pleasing results on web development tasks than other AI models did. The company claims the model could aid researchers and 'potentially accelerate the path to discovery.'
It seems that several leading AI labs are converging on the multi-agent approach.
Elon Musk's xAI recently released a multi-agent system of its own, Grok 4 Heavy, which it says achieved industry-leading performance on several benchmarks. OpenAI researcher Noam Brown said on a podcast that the unreleased AI model the company used to achieve a gold medal at this year's IMO was also a multi-agent system. Meanwhile, Anthropic's Research agent, which generates thorough research briefs, is also powered by a multi-agent system.
Despite their strong performance, multi-agent systems appear to be costlier to serve than traditional AI models. That means tech companies may keep these systems gated behind their most expensive subscription plans, as xAI and now Google have chosen to do.
In the coming weeks, Google says it plans to share Gemini 2.5 Deep Think with a select group of testers via the Gemini API. The company says it wants to better understand how developers and enterprises may use its multi-agent system.

Related Articles

1 AI Robotics Stock to Buy Before It Soars 758% to $8 Trillion, According to a Wall Street Analyst

Yahoo, an hour ago

Key Points
  • Several Wall Street experts anticipate substantial upside in Tesla stock as the company leans into autonomous driving and robotics.
  • Tesla reported dismal first-quarter financial results as increased competition and CEO Elon Musk's political activities eroded its market share.
  • Musk believes Tesla will eventually dominate the trillion-dollar robotaxi market, and he sees a $10 trillion opportunity in humanoid robots.
Tesla (NASDAQ: TSLA) shares have declined 25% year to date as the electric carmaker has struggled with weak demand amid growing competition and consumer backlash against CEO Elon Musk's politics. The company is currently worth $976 billion, but several Wall Street experts anticipate substantial upside in the years ahead.
  • Ark Invest analysts, led by Tasha Keeney, think Tesla stock will reach $2,600 per share by 2029. That forecast implies 758% upside from its current share price of $303. It also implies a market value of $8.3 trillion.
  • Wedbush analyst Dan Ives recently told Yahoo Finance that Tesla could be a $2 trillion company within 12 months. That implies 105% upside from its current market value of $976 billion. It also implies a share price of $620.
  • Hedge fund billionaire Ron Baron told CNBC last year that Tesla could be a $5 trillion company within a decade. That implies 410% upside from its current market value. It also implies a share price of $1,550.
  • CEO Elon Musk has said Tesla could eventually be a $30 trillion company as it benefits from autonomous driving and robotics. That implies 2,975% upside from its current market value. It also implies a share price of $9,310.
Tesla is one of the most controversial stocks on the market. Investors tend to have binary opinions, seeing Tesla either as an overrated automaker or as a revolutionary company poised to reshape the global mobility and labor markets with artificial intelligence. Read on to learn more.
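The upside figures in these forecasts follow from simple ratios against the quoted share price ($303) and market value ($976 billion). The Python sketch below reproduces them; it assumes the share count stays constant, so share price and market value scale together, and the function names are illustrative.

```python
# Figures quoted in the article (USD).
CURRENT_PRICE = 303.0
CURRENT_MARKET_CAP = 976e9


def upside_pct(target: float, current: float) -> float:
    """Percent gain needed to move from `current` to `target`."""
    return (target / current - 1.0) * 100.0


def price_for_cap(target_cap: float) -> float:
    """Share price implied by a market-value target (assumes a fixed share count)."""
    return CURRENT_PRICE * target_cap / CURRENT_MARKET_CAP


def cap_for_price(target_price: float) -> float:
    """Market value implied by a share-price target (same fixed-share-count assumption)."""
    return CURRENT_MARKET_CAP * target_price / CURRENT_PRICE


# Ark Invest gives a price target; the other forecasts give market-value targets.
ark_target = 2600.0
print(f"Ark Invest: {upside_pct(ark_target, CURRENT_PRICE):.0f}% upside, "
      f"${cap_for_price(ark_target) / 1e12:.2f} trillion market value")

for source, cap in [("Dan Ives", 2e12), ("Ron Baron", 5e12), ("Elon Musk", 30e12)]:
    print(f"{source}: {upside_pct(cap, CURRENT_MARKET_CAP):.0f}% upside, "
          f"${price_for_cap(cap):,.0f} per share")
```

Small differences from the article's numbers (such as 412% versus 410% for Baron's target) come from rounding.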
Tesla is losing market share in electric vehicles, and Musk warned of rough quarters ahead
Tesla ceded significant market share in electric vehicles during the past year as competition increased and CEO Elon Musk damaged the brand with his political activities. The company accounted for just 10% of battery electric vehicle sales through May, down from 16% in the same period last year, according to Morgan Stanley.
Tesla reported weak second-quarter financial results. Deliveries decreased by 13%, the second straight drop. Revenue declined 12% to $22 billion, operating margin narrowed by 2 percentage points, and non-GAAP (generally accepted accounting principles) earnings fell 23% to $0.40 per diluted share.
Musk also warned that the next few quarters could be rough as the company ramps up its autonomous driving business. "We probably could have a few rough quarters. I'm not saying we will, but we could," he told analysts on the earnings call. "But once we get autonomous to scale in the second half of next year, certainly by the end of next year, I'd be really surprised if the economics are not very compelling."
Tesla has substantial opportunities in autonomous ride-hailing services and humanoid robots
Tesla has been developing its autonomous driving software for more than a decade. Its vision-only approach (meaning its cars are equipped only with cameras) gives the company a theoretical edge over market leader Alphabet's Waymo, which relies on a more costly array of cameras, lidar, and radar. Tesla also has more camera-equipped cars on the road collecting data to train the underlying artificial intelligence (AI) models.
Importantly, while Waymo is currently the market leader, with commercial autonomous ride-hailing services in five U.S. cities, Elon Musk thinks Tesla will catch up quickly because its vision-only strategy is more scalable.
Indeed, the company recently started its first robotaxi service in Austin, but Musk says the coverage area could include half the U.S. population by year-end. Additionally, Musk says Tesla could eventually have 99% market share in autonomous ride-hailing, which itself is forecast to be a trillion-dollar market in about 15 years. Tom Narayan at RBC Capital expects global robotaxi revenue to reach $1.7 trillion by 2040. He also says Tesla could earn $115 billion in revenue from robotaxi services in that year.
Beyond robotaxis, Tesla is also developing an autonomous humanoid robot, called Optimus, to revolutionize the labor market. Robots could be particularly useful in handling tasks too dangerous, tedious, or physically demanding for humans. Musk says Optimus production will hit 100,000 units monthly (more than 1 million annually) within five years. He also says humanoid robots could be a $10 trillion opportunity for Tesla.
The Ark Invest analysts, led by Tasha Keeney, built their 2029 forecast around autonomous driving. Robotaxis are projected to account for more than 60% of revenue, roughly $750 billion, while electric car sales account for less than 30%. The remaining portion will come from energy storage and insurance. Keeney did not factor Optimus into the calculations, but her robotaxi estimates are much more aggressive than those from Narayan at RBC.
Tesla's valuation looks absurdly expensive, but autonomous driving and robotics could change the narrative
Wall Street estimates Tesla's earnings will increase by 20% annually over the next three to five years. That makes the current valuation of 175 times earnings look absurdly expensive. But Tesla bulls think most analysts are underestimating the impact that robotaxis and robots will have on the business.
For instance, Ark Invest estimates that Tesla's earnings before interest, taxes, depreciation, and amortization (EBITDA) will increase by over 3,000% to $440 billion by 2029, which implies a compound annual growth rate of about 115%. While I find that scenario highly unlikely, earnings growth of that magnitude would justify the current valuation.
Here's the bottom line: Traders who lack confidence in the robotaxi and robotics narrative should avoid this stock. But patient investors who believe Tesla could revolutionize the mobility and labor markets with AI products like self-driving cars and humanoid robots should own a position.
Trevor Jennewine has positions in Tesla. The Motley Fool has positions in and recommends Alphabet and Tesla. The Motley Fool has a disclosure policy.
1 AI Robotics Stock to Buy Before It Soars 758% to $8 Trillion, According to a Wall Street Analyst was originally published by The Motley Fool.

Is Apple getting ready to launch a PlayStation and Xbox competitor?

Yahoo, an hour ago

The Apple TV is probably my favorite device that Apple makes. While the Apple TV app is in dire need of some basic improvements, the hardware box itself is a standout, especially compared to competitors like Amazon's Fire TV and Roku's streaming devices. This is largely thanks to the stellar Siri Remote, which makes navigating the device with your fingers or voice a cinch, and the powerful Apple silicon chip inside that makes the Apple TV's operating system, tvOS, run buttery smooth.
However, when it comes to being a device meant to sit at the center of your living room as an all-encompassing entertainment hub, the Apple TV is lacking in one big department: gaming. The Apple TV is technically a gaming console, since it can play rudimentary games and supports third-party console controllers. But no one is likely to replace their PlayStation or Xbox with one any time soon, because the current Apple TV lacks the processing power to run console-quality games.
Yet perhaps that could be changing. Recently, I've noticed that Apple has been making moves that suggest the company may be on the cusp of turning the Apple TV into a full-blown PlayStation and Xbox competitor. Doing so would open up another potential billion-dollar revenue stream for the company.
The new Apple Games app is currently MIA from tvOS 26
At Apple's Worldwide Developers Conference (WWDC) this year, the company unveiled a new cross-device app called Apple Games. The app acts as a central hub and launcher for all the games you've ever bought on Apple's App Store or have access to via the company's Apple Arcade subscription service.
The Apple Games app also gives you quick access to game events and challenges, and helps you discover new games to play and see what games your friends are playing. In other words, the new Apple Games app is similar to the PlayStation 5 Game Hub and the Xbox Dashboard, the interfaces on those consoles that significantly differentiate the living room gaming experience from PCs.
Apple announced that Apple Games is coming to the iPhone, iPad, and Mac with iOS 26, iPadOS 26, and macOS 26 this fall. But the new app is conspicuously absent from the Apple TV's next operating system, tvOS 26, which also ships this fall. This is a notable omission, especially considering that Apple markets its Apple Arcade gaming service as a core feature of the Apple TV experience. Apple also offers thousands of mobile-level games through the tvOS App Store.
The more I think about the Apple Games omission from tvOS 26, the more it makes sense if Apple is set to turn the Apple TV into a true gaming console but doesn't want anyone to know it yet.
The next Apple TV is rumored to have two key hardware improvements essential to top-line gaming consoles
Apple doesn't update the Apple TV as often as it does iPhones or even its iPads. Typically, years pass between Apple TV updates. The most recent Apple TV, the Apple TV 4K, was last updated in November 2022, nearly three years ago. That means it's ripe for an update this year.
Rumors suggest that a new Apple TV is indeed coming later this year and that it will feature two significant hardware upgrades, ones that would enable it to become a true gaming console.
The first is an updated chipset. The current Apple TV 4K features the Apple A15 Bionic chip, the same one found in the iPhone 13 from 2021. Most people expect the next Apple TV to get a significant upgrade, perhaps to the A18 or A18 Pro found in the current iPhone 16 and iPhone 16 Pro series, or perhaps even the unreleased A19 chip, which will go into this year's iPhone 17 series.
It's also possible Apple could put the M1 or M2 chip, previously found in Macs and iPads, into the new Apple TV. This jump from the A15 to the A18, A19, M1, or M2 would give the Apple TV the performance boost it needs to run AAA console games, such as the Resident Evil series from Capcom, which can already run (with controller support, no less) on the iPhone 15 Pro, thanks to its A17 Pro chip.
Another upgrade the next Apple TV is expected to get is a new Apple-designed Wi-Fi and Bluetooth chipset that will support the Wi-Fi 7 standard (via MacRumors). This standard offers lower latency and faster Wi-Fi speeds than the current Wi-Fi 6 standard, something critical for gaming consoles and the bandwidth-hungry games that stream to them. The leading games console, the PlayStation 5 Pro, currently offers Wi-Fi 7 support.
In other words, the hardware components Apple needs to turn the next Apple TV into a PlayStation and Xbox competitor are all in the pipeline. And, increasingly, so is something else the Apple TV would need to become a true gaming console: commitment to Apple's platforms from major game studios.
More AAA games are hitting the Mac (and iPhone) than ever before
In the video game industry, the top games are known as AAA (triple-A) titles. These are the games with the most advanced graphics and the biggest budgets, and they are frequently the highlights of the console gaming experience.
Historically, AAA game developers have shied away from releasing their major titles on the Mac (the Apple device with the hardware power most comparable to professional gaming consoles). But in the past year, that's changed a lot, as Apple has made game development on the Mac easier and more cost-effective than ever with tools like the company's Game Porting Toolkit 3 and the hardware-accelerated graphics API, Metal 4, which makes graphics-intensive games look better on Mac and iPhone.
Considering Apple devices are more popular than ever, game studios stand to benefit financially by bringing their biggest titles to Apple's platforms and their millions of users.
In July alone, two major AAA titles made their debut on the Mac: CD Projekt Red's Cyberpunk 2077: Ultimate and Deep Silver's Dead Island 2. Other major AAA titles have also been released on the Mac over the past few years, including Ubisoft's Assassin's Creed: Shadows and Prince of Persia: The Lost Crown, Remedy's Control Ultimate Edition, Kojima Productions' Death Stranding Director's Cut, Round 8 Studio's Lies of P, 11 Bit Studios' Frostpunk 2, and Capcom's Resident Evil series remastered editions. Additionally, more AAA titles are coming to the Mac this year, including IO Interactive's Hitman World of Assassination, InZOI Studio's InZOI, and Pearl Abyss' Crimson Desert.
Most of these games require an M1 series chip or later, found in the Apple silicon Macs the company has released since 2020. Some, like the Resident Evil series, can even run on devices with the A17 Pro chip or later, which Apple introduced in 2023.
Apple's current A18 Pro is roughly equivalent to the M1 in terms of performance, and if Apple puts the A18 Pro, the M1 or M2, or the upcoming A19 Pro inside the next Apple TV, as expected, there is no reason these AAA games that currently run on the Mac couldn't run on the new Apple TV. And if that happens, the Apple TV becomes a professional-level gaming console.
Turning the Apple TV into a gaming console makes sense for Apple's ecosystem and the company's bottom line
When Apple announced the upcoming Apple Games app for all its devices except the Apple TV, the omission stood out as a glaring hole in the company's lineup, especially since Apple Games is a natural fit for the Apple TV.
But when you take in the odd omission, along with recent rumors that the next Apple TV is set to get powerful new CPU and wireless chipsets, and the flood of new AAA titles hitting the Mac and iPhone this year, things start to look a lot clearer.
Yet something else leads me to believe that Apple could be turning the Apple TV into a gaming console this year: the company's history of being unwilling to let software announcements spoil new hardware features. In the past, Apple has withheld software announcements at WWDC to avoid revealing upcoming hardware improvements to its devices.
The AAA titles available on the Mac appear in Apple Games on the macOS 26 beta. If Apple had previewed Apple Games on the tvOS 26 beta, Mac games that run on the new, unreleased Apple TV, including these AAA titles, might also have appeared there. That would spoil a major, as-yet-unannounced feature for the as-yet-unannounced Apple TV.
Of course, all this is just conjecture on my part. Still, all the signs seem to point to the Apple TV becoming a true gaming console.
This would make a lot of business sense for Apple. At price points of $129 or $149, depending on whether you want more storage and an Ethernet connection, the current Apple TV 4K is much more expensive than such competitors as the Roku Streaming Stick 4K ($49), the Roku Ultra ($99), and the Amazon Fire TV Stick 4K ($49). However, if Apple gives the new Apple TV gaming console capabilities, the current $129/$149 price suddenly looks like a bargain.
A triple-A gaming experience on the Apple TV would be a unique selling point that Roku or Amazon couldn't compete with. It could also give Apple a major new revenue stream in the form of 30% App Store commissions on AAA titles sold through the tvOS App Store. As of 2024, the global AAA gaming market was valued at approximately $75 billion annually, according to a July 2025 Business Research Insights report. It's expected to grow to nearly $108 billion by 2033.
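The report's two endpoints imply an average yearly growth rate. The compound annual growth rate (CAGR) formula, (end / start)^(1 / years) - 1, makes that explicit; the Python sketch below assumes $75 billion is the 2024 starting value and that growth compounds evenly over the nine years to 2033, which the report itself may not assume.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate that turns start into end."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market-size figures cited in the article, in billions of USD (2024 start assumed).
start_2024, end_2033 = 75.0, 108.0
rate = cagr(start_2024, end_2033, years=2033 - 2024)
print(f"Implied growth: {rate:.1%} per year")  # Implied growth: 4.1% per year

# Sanity check: compounding the rate for nine years recovers the end value.
assert abs(start_2024 * (1 + rate) ** 9 - end_2033) < 1e-9
```

The same formula underlies growth figures such as Ark Invest's 115% EBITDA estimate for Tesla, although that projection's starting value is not given here.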
But most of all, a new Apple TV with console gaming capabilities would further solidify the device as the digital heart of the living room and smart home, giving users another reason to stay within Apple's ecosystem both inside and outside the house, an ancillary benefit Apple likely finds invaluable.
This post originally appeared at Fast Company.

Inside OpenAI's quest to make AI do anything for you

Yahoo, 2 hours ago

Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI's models to solve high school math competitions.
Today that team, known as MathGen, is considered instrumental to OpenAI's industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would.
'We were trying to make the models better at mathematical reasoning, which at the time they weren't very good at,' Lightman told TechCrunch, describing MathGen's early work.
OpenAI's models are far from perfect today; the company's latest AI systems still hallucinate, and its agents struggle with complex tasks. But its state-of-the-art models have improved significantly on mathematical reasoning. One of OpenAI's models recently won a gold medal at the International Math Olympiad, a math competition for the world's brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects and ultimately power the general-purpose agents the company has always dreamed of building.
ChatGPT was a happy accident (a low-key research preview turned viral consumer business), but OpenAI's agents are the product of a years-long, deliberate effort within the company.
'Eventually, you'll just ask the computer for what you need and it'll do all of these tasks for you,' said OpenAI CEO Sam Altman at the company's first developer conference in 2023. 'These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.'
Whether agents will meet Altman's vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough are the most highly sought-after talent in Silicon Valley.
Mark Zuckerberg recruited five of the o1 researchers to work on Meta's new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.
The reinforcement learning renaissance
The rise of OpenAI's reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL provides feedback to an AI model on whether its choices were correct or not in simulated environments.
RL has been used for decades. For instance, in 2016, about a year after OpenAI was founded in 2015, AlphaGo, an AI system created by Google DeepMind using RL, gained global attention after beating a world champion in the board game Go.
Around that time, one of OpenAI's first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques.
By 2018, OpenAI had pioneered its first large language model in the GPT series, pretrained on massive amounts of internet data using large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math.
It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed 'Q*' and then 'Strawberry,' by combining LLMs, RL, and a technique called test-time computation. The latter gave the models extra time and computing power to plan and work through problems, verifying their steps, before providing an answer.
This allowed OpenAI to introduce a new approach called 'chain-of-thought' (CoT), which improved AI's performance on math questions the models hadn't seen before.
'I could see the model starting to reason,' said OpenAI researcher Ahmed El Kishky. 'It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.'
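The RL feedback loop described in this section (a model makes a choice, the environment reports whether it was correct, and the model adjusts) can be illustrated with a deliberately tiny example. The Python sketch below is not how OpenAI or DeepMind train their models: the "model" is just a softmax preference over four possible answers, the environment returns a reward of 1 for the correct answer and 0 otherwise, and a basic policy-gradient update (REINFORCE) makes rewarded choices more likely. The function names, number of actions, step count, and learning rate are illustrative assumptions.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Turn raw preference scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(correct_action: int, n_actions: int = 4, steps: int = 2000,
                 lr: float = 0.1, seed: int = 0) -> list[float]:
    """REINFORCE on a toy one-step environment with a 0/1 reward."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions  # the toy "model": one preference score per answer
    for _ in range(steps):
        probs = softmax(logits)
        # The model makes a choice by sampling from its current policy.
        action = rng.choices(range(n_actions), weights=probs)[0]
        # The environment's feedback: 1 if the choice was correct, else 0.
        reward = 1.0 if action == correct_action else 0.0
        # Policy-gradient step: d log pi(action) / d logit_k = 1[k == action] - pi_k,
        # scaled by the reward, so only rewarded choices change the policy here.
        for k in range(n_actions):
            indicator = 1.0 if k == action else 0.0
            logits[k] += lr * reward * (indicator - probs[k])
    return softmax(logits)


probs = train_policy(correct_action=2)
print(f"P(correct answer) after training: {probs[2]:.3f}")
```

Because the reward is 0 for wrong answers, this toy only learns from its successes; practical RL setups typically subtract a baseline from the reward so that failures also push probability away from wrong choices.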
Though individually these techniques weren't novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly identified that the planning and fact-checking abilities of AI reasoning models could be useful to power AI agents.
'We had solved a problem that I had been banging my head against for a couple of years,' said Lightman. 'It was one of the most exciting moments of my research career.'
Scaling reasoning
With AI reasoning models, OpenAI determined it had two new axes that would allow it to improve AI models: using more computational power during the post-training of AI models, and giving AI models more time and processing power while answering a question.
'OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,' said Lightman.
Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an 'Agents' team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm, two sources told TechCrunch. Although the team was called 'Agents,' OpenAI didn't initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.
Eventually, the work of Selsam's Agents team became part of a larger project to develop the o1 reasoning model, with leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.
OpenAI would have to divert precious resources (mainly talent and GPUs) to create o1. Throughout OpenAI's history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them.
'One of the core components of OpenAI is that everything in research is bottom up,' said Lightman.
'When we showed the evidence [for o1], the company was like, "This makes sense, let's push on it."'
Some former employees say that the startup's mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on developing the smartest possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That type of large investment in ideas wasn't always possible at competing AI labs.
The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field's momentum comes from advances in reasoning models.
What does it mean for an AI to 'reason'?
In many ways, the goal of AI research is to recreate human intelligence with computers. Since the launch of o1, ChatGPT's UX has been filled with more human-sounding features such as 'thinking' and 'reasoning.'
When asked whether OpenAI's models were truly reasoning, El Kishky hedged, saying he thinks about the concept in terms of computer science.
'We're teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning,' said El Kishky.
Lightman takes the approach of focusing on the model's results and not as much on the means or their relation to human brains.
'If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that,' said Lightman. 'We can call it reasoning, because it looks like these reasoning traces, but it's all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.'
OpenAI's researchers note people may disagree with their nomenclature or definitions of reasoning (and critics have surely emerged), but they argue it's less important than the capabilities of their models. Other AI researchers tend to agree.
Nathan Lambert, an AI researcher with the non-profit AI2, compares AI reasoning models to airplanes in a blog post. Both, he says, are manmade systems inspired by nature (human reasoning and bird flight, respectively), but they operate through entirely different mechanisms. That doesn't make them any less useful, or any less capable of achieving similar outcomes.
A group of AI researchers from OpenAI, Anthropic, and Google DeepMind agreed in a recent position paper that AI reasoning models are not well understood today and that more research is needed. It may be too early to confidently claim what exactly is going on inside them.
The next frontier: AI agents for subjective tasks
The AI agents on the market today work best for well-defined, verifiable domains such as coding. OpenAI's Codex agent aims to help software engineers offload simple coding tasks. Meanwhile, Anthropic's models have become particularly popular in AI coding tools like Cursor and Claude Code; these are some of the first AI agents that people are willing to pay up for.
However, general-purpose AI agents like OpenAI's ChatGPT Agent and Perplexity's Comet struggle with many of the complex, subjective tasks people want to automate. When trying to use these tools for online shopping or finding a long-term parking spot, I've found the agents take longer than I'd like and make silly mistakes.
Agents are, of course, early systems that will undoubtedly improve. But researchers must first figure out how to better train the underlying models to complete tasks that are more subjective.
'Like many problems in machine learning, it's a data problem,' said Lightman, when asked about the limitations of agents on subjective tasks. 'Some of the research I'm really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.'
Noam Brown, an OpenAI researcher who helped create the IMO model and o1, told TechCrunch that OpenAI has new general-purpose RL techniques that allow it to teach AI models skills that aren't easily verified. This was how the company built the model that achieved a gold medal at the IMO, he said.
OpenAI's IMO model was a newer AI system that spawns multiple agents, which simultaneously explore several ideas and then choose the best possible answer. These types of AI models are becoming more popular; Google and xAI have recently released state-of-the-art models using this technique.
'I think these models will become more capable at math, and I think they'll get more capable in other reasoning areas as well,' said Brown. 'The progress has been incredibly fast. I don't see any reason to think it will slow down.'
These techniques may help OpenAI's models become more performant, gains that could show up in the company's upcoming GPT-5 model. OpenAI hopes to assert its dominance over competitors with the launch of GPT-5, ideally offering the best AI model to power agents for developers and consumers.
But the company also wants to make its products simpler to use. El Kishky says OpenAI wants to develop AI agents that intuitively understand what users want, without requiring them to select specific settings. He says OpenAI aims to build AI systems that understand when to call up certain tools, and how long to reason for.
These ideas paint a picture of an ultimate version of ChatGPT: an agent that can do anything on the internet for you, and understand how you want it to be done. That's a much different product than what ChatGPT is today, but the company's research is squarely headed in this direction.
While OpenAI undoubtedly led the AI industry a few years ago, the company now faces a tranche of worthy opponents.
The question is no longer just whether OpenAI can deliver its agentic future, but whether it can do so before Google, Anthropic, xAI, or Meta gets there first.
