Artificial Intelligencer: Why AI's math gold wins matter
July 24 (Reuters) - This was originally published in the Artificial Intelligencer newsletter, which is issued every Wednesday. Sign up here to learn about the latest breakthroughs in AI and tech.
At the Reuters Momentum AI conference in Silicon Valley last week, I heard two phrases over and over from Fortune 500 executives: "human in the loop" and "flat is the new up."
They reflect a cautious but ambitious strategy. Nearly every company still keeps humans working alongside AI, yet the early impact is clear: companies are growing revenue without hiring more people.
What's changed? The nature of work within organizations. The first cuts are already hitting outsourced labor. Employees are shifting to higher-value work, such as handling complicated tasks and reviewing AI's output. Revenue per head is on the rise, or as some say, 'flat headcount is the new up.'
Despite the narrative that 2025 will be the year of the AI agent, truly agentic workflows still seem distant for complex use cases. In fact, some executives still view AI models as just pattern matchers, not true reasoners.
Researchers at Google and OpenAI would beg to differ, as I learned after speaking with them following both labs' gold medal wins at this year's International Mathematical Olympiad. I believe this is an exciting milestone for the reasoning paradigm that AI models are striving to advance. Scroll down to read why this matters.
Email me or follow me on LinkedIn to share feedback and tell me what you want to read about next in AI.
Our latest reporting in Tech & AI
Exclusive-Blackstone drops out of group bid for TikTok US
White House to unveil plan to push US AI abroad, crack down on US AI rules
Trump administration seeks pathway for US companies to export AI chips
Nvidia CEO's China charm offensive underscores rock star status in key market
AI models with systemic risks given pointers on how to comply with EU AI rules
TSMC posts record quarterly profit on AI demand, but wary about tariffs
How AI won math gold
AI crossed a threshold that even caught the best researchers by surprise. For the first time, an AI from Google DeepMind won a gold medal at the International Mathematical Olympiad, the world's most elite high school math competition.
OpenAI, which did not officially participate in this year's IMO, said its model also achieved gold-medal performance, based on solutions graded by external experts using IMO guidelines.
While it's tempting to see this as just another headline in AI's relentless march, I spent time speaking with the minds behind these models—some of whom are former IMO medalists themselves—to understand how we got here and what these wins reveal about the frontier of AI.
The main takeaway? The reasoning abilities demonstrated by models like DeepMind's Gemini and OpenAI's o-series open up enormous possibilities. The win is also a testament to the classic recipe for model improvement: high-quality data and huge amounts of compute.
While neither lab revealed the full details of their methods, both demonstrated the power of thinking for longer. Since last year, top AI labs have shifted focus from scaling up pre-training and increasing model sizes to using test-time compute to give models more 'thinking time'.
OpenAI described how its model tackled each problem dozens of times simultaneously, using consensus and multi-agent strategies to aggregate the best solutions. DeepMind, meanwhile, employed its 'Deep Think' technique, enabling Gemini to explore many solution paths at the same time, synthesize ideas, and generate rigorous, human-readable proofs.
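Neither lab has published its code, but the consensus strategy OpenAI describes resembles the "self-consistency" idea from the research literature: sample many independent solution attempts in parallel, then keep the answer that appears most often. Here is a minimal, runnable sketch of that voting step; the function names and the canned answers standing in for real model calls are purely hypothetical.

```python
from collections import Counter

def sample_solutions(problem: str, n: int) -> list[str]:
    """Placeholder for querying a reasoning model n times in parallel.
    A real system would issue n independent API calls; here we return
    canned final answers so the sketch runs on its own."""
    canned = ["271", "271", "36", "271", "100"]
    return canned[:n]

def consensus_answer(problem: str, n: int = 5) -> tuple[str, int]:
    """Self-consistency voting: collect n independent attempts and
    return the most common final answer along with its vote count."""
    attempts = sample_solutions(problem, n)
    answer, votes = Counter(attempts).most_common(1)[0]
    return answer, votes

answer, votes = consensus_answer("toy problem", n=5)
print(answer, votes)  # prints: 271 3
```

The intuition is that a model is more likely to agree with itself on a correct answer than to repeat the same wrong one; full proofs (rather than short final answers) would need a grading or ranking step instead of exact-match voting.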
In what researchers dubbed a 'paradigm shift,' DeepMind's AI has gone from needing expert human translation just a year ago to solving five of six IMO problems in natural language this week.
This breakthrough directly challenges the long-held skepticism that AI models are just clever mimics, predicting the next word. Math, requiring multi-step, creative proofs, has become the ultimate test of true reasoning, and AI just passed.
We don't know exactly how much parallel computation went into solving each question, but OpenAI told us it was 'very expensive.' After all, the models were given about 4.5 hours, just like human contestants, to work through each set of problems.
This highlights how today's most intelligent models demand vast compute resources, helping explain AI labs' insatiable appetite for chips like Nvidia's GPUs. And as these methods expand into other domains—coding, science, creative writing—the computational demands will continue to grow.
Both labs also credit their breakthroughs to high-quality data: step-by-step, annotated proofs, not just final answers. DeepMind, in particular, pointed to new reinforcement learning techniques that reward not just correctness, but the elegance and clarity of a proof.
So what does this mean for the future? The 'can AI reason?' debate may be settled, at least for domains as challenging as Olympiad mathematics. The emergence of genuine reasoning capabilities inside AI models could transform many fields as researchers crack the code on math and move on to new frontiers.
DeepMind is already working to put its system in the hands of mathematicians and, soon, the wider public. OpenAI says it's using what it's learned from this model to train others, but this particular capability won't be included in the upcoming GPT-5 release this summer.
Chart of the week
You're probably reading this AI newsletter because you're already an AI user, which puts you among the 61% of Americans who have welcomed AI into their lives. The rest, a solid 39%, remain unconvinced, according to a report from Menlo Ventures.
The top blocker? Good old-fashioned human connection. About 80% of non-adopters say they'd rather deal with a person than a machine, especially for important decisions. In fact, 53% say they want accountability and oversight from another human, not just a digital assistant that gives instant responses.
Other top hurdles include data privacy worries (71%), skepticism about AI's usefulness (63%), and a healthy distrust of the information AI serves up (58%). So, while the bots may be ready, the humans are holding out for more trust, transparency, and—let's face it—a bit more humanity.