Artificial Intelligencer: Why AI's math gold wins matter


Reuters, a day ago
July 24 (Reuters) - This was originally published in the Artificial Intelligencer newsletter, which is issued every Wednesday. Sign up here to learn about the latest breakthroughs in AI and tech.
At the Reuters Momentum AI conference in Silicon Valley last week, I heard two phrases over and over from Fortune 500 executives: "human in the loop" and "flat is the new up."
They reflect a cautious but ambitious strategy: While nearly every company still keeps humans working alongside AI, the early impact is already showing that companies are growing revenue without hiring more people.
What's changed? The nature of work within organizations. The first cuts are already hitting outsourced labor. Employees are shifting to higher-value work, such as handling complicated tasks and reviewing AI's output. Revenue per head is on the rise, or as some say, 'flat headcount is the new up.'
Despite the narrative that 2025 will be the year of the AI agent, truly agentic workflows still seem distant for complex use cases. In fact, some executives still view AI models as just pattern matchers, not true reasoners.
Researchers at Google and OpenAI would beg to differ, as I learned after speaking with them following both labs' gold medal wins at this year's International Mathematical Olympiad. I believe this is an exciting milestone for the reasoning paradigm that AI models are striving to advance. Scroll down to read why this matters.
Email me at krystal.hu@tr.com or follow me on LinkedIn to share any feedback, and what you want to read about next in AI.
Our latest reporting in Tech & AI
Exclusive-Blackstone drops out of group bid for TikTok US
White House to unveil plan to push US AI abroad, crackdown on US AI rules
Trump administration seeks pathway for US companies to export AI chips
Nvidia CEO's China charm offensive underscores rock star status in key market
AI models with systemic risks given pointers on how to comply with EU AI rules
TSMC posts record quarterly profit on AI demand, but wary about tariffs
How AI won math gold
AI crossed a threshold that caught even the best researchers by surprise. For the first time, an AI from Google DeepMind won a gold medal at the International Mathematical Olympiad, the world's most elite high school math competition.
OpenAI, which did not officially participate in this year's IMO, said its model also achieved gold-medal performance, based on solutions graded by external experts using IMO guidelines.
While it's tempting to see this as just another headline in AI's relentless march, I spent time speaking with the minds behind these models—some of whom are former IMO medalists themselves—to understand how we got here and what these wins reveal about the frontier of AI.
The main takeaway? The reasoning abilities demonstrated by models like DeepMind's Gemini Pro and OpenAI's o1 series have endless possibilities. This win is also a testament to the classic recipe for model improvement: high-quality data and huge amounts of compute.
While neither lab revealed the full details of their methods, both demonstrated the power of thinking for longer. Since last year, top AI labs have shifted focus from scaling up pre-training and increasing model sizes to using test-time compute to give models more 'thinking time'.
OpenAI described how its model tackled each problem dozens of times simultaneously, using consensus and multi-agent strategies to aggregate the best solutions. DeepMind, meanwhile, employed its 'Deep Think' technique, enabling Gemini to explore many solution paths at the same time, synthesize ideas, and generate rigorous, human-readable proofs.
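Neither lab has published how it aggregates attempts, so the following is only a minimal sketch of the general idea of sampling a problem many times and keeping the consensus answer. It uses plain majority voting over short final answers, which is an assumption made for illustration; Olympiad proofs are not short answers, and the names `sample_and_vote` and `toy_solver` are hypothetical, not part of any lab's system.

```python
from collections import Counter
from typing import Callable, Tuple

def sample_and_vote(
    solver: Callable[[str, int], str],
    problem: str,
    n_attempts: int,
) -> Tuple[str, int]:
    """Run a solver n_attempts times and return (most common answer, vote count).

    `solver` is a hypothetical stand-in for one independent model attempt.
    Attempts run one after another here; a real system would run them in parallel.
    """
    attempts = [solver(problem, i) for i in range(n_attempts)]
    # most_common(1) returns the answer with the highest count; on a tie,
    # the answer that appeared first among the attempts wins.
    answer, votes = Counter(attempts).most_common(1)[0]
    return answer, votes

def toy_solver(problem: str, attempt_index: int) -> str:
    """Deterministic toy attempt: every third attempt returns a wrong answer."""
    return "41" if attempt_index % 3 == 0 else "42"

if __name__ == "__main__":
    answer, votes = sample_and_vote(toy_solver, "toy problem", 6)
    print(answer, votes)  # prints: 42 4
```

The design point is that individual attempts can be wrong as long as the correct answer is the most frequent one, which is one reason more attempts, and therefore more compute, can help.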
In what researchers dubbed a 'paradigm shift,' DeepMind's AI has gone from needing expert human translation just a year ago to solving five of six IMO problems in natural language this week.
This breakthrough directly challenges the long-held skepticism that AI models are just clever mimics, predicting the next word. Math, requiring multi-step, creative proofs, has become the ultimate test of true reasoning, and AI just passed.
We don't know exactly how much parallel computation went into solving each question, but OpenAI told us it was 'very expensive.' After all, the models were given about 4.5 hours—just like human contestants—to work through each set.
This highlights how today's most intelligent models demand vast compute resources, helping explain AI labs' insatiable appetite for chips like Nvidia's GPUs. And as these methods expand into other domains—coding, science, creative writing—the computational demands will continue to grow.
Both labs also credit their breakthroughs to high-quality data: step-by-step, annotated proofs, not just final answers. DeepMind, in particular, pointed to new reinforcement learning techniques that reward not just correctness, but the elegance and clarity of a proof.
So what does this mean for the future? The 'can AI reason?' debate may be settled—at least for domains as challenging as Olympiad mathematics. The ever-growing emergence of true thinking capabilities inside AI models has the potential to transform many domains as researchers crack the code on math and move on to new frontiers.
DeepMind is already working to put its system in the hands of mathematicians and, soon, the wider public. OpenAI says it's using what it's learned from this model to train others, but this particular capability won't be included in the upcoming GPT-5 release this summer.
Chart of the week
You're probably reading this AI newsletter because you're already an AI user, which will put you in the basket of 61% of Americans who have welcomed AI into their lives. The rest, a solid 39%, remain unconvinced, according to a report from Menlo Ventures.
The top blocker? Good old-fashioned human connection. About 80% of non-adopters say they'd rather deal with a person than a machine, especially for important decisions. In fact, 53% say they want accountability and oversight from another human, not just a digital assistant who always gives instant responses.
Other top hurdles include data privacy worries (71%), skepticism about AI's usefulness (63%), and a healthy distrust of the information AI serves up (58%). So, while the bots may be ready, the humans are holding out for more trust, transparency, and—let's face it—a bit more humanity.

Related Articles

Sears shuts California stores after revival efforts fall short

Daily Mail, 19 minutes ago

Sears' once-mighty empire is set to shrink even further as it shuts another store in California. The closure of the Burbank store in Los Angeles on August 31 will leave just six Sears stores in the US, down from more than 3,500 two decades ago. The Burbank store has struggled with low shopper numbers since opening in 2023. Soon after it opened, visitors branded it the 'saddest thing I've ever seen' with empty shelves and stained carpet. News of the closure comes as liquidation sales continue at another nearby Sears in Whittier, a city about 20 miles south-east of LA. It will close Saturday after nearly three decades. Business began crumbling in 2010, and by 2017, Sears had only 695 locations. The vast majority have now shuttered since the company filed for bankruptcy in October 2018. Sears was on the brink of closing all its stores for good before CEO Eddie Lampert submitted a last-minute bid valued at $4.4 billion in 2018. He upped the bid to $5.2 billion and saved 400 stores. Over the past year, Sears closed its last Washington store and shuttered its only remaining New York location. 'It's a landmark, it's something you grew up with, it's something you could trust,' Barbie Talamante, a former Sears staffer, said of the closure in Tukwila, a city ten miles south of Seattle, WA. Discount stores and big box retailers like Walmart have siphoned off Sears' customers over the last several years. Sears is not the only retailer in danger of closing for good after years of declining customers and sales. JCPenney has also been under fire. In 2020, it closed 30 percent of its stores after filing for bankruptcy. The chain had been operating 846 stores before the pandemic caused severe financial damage. It shuttered seven stores in May, and will lay off hundreds of staffers by closing a warehouse in Texas.

From Krispy Kreme to GoPro, has meme-stock trading frenzy returned?

The Guardian, 19 minutes ago

Shares in struggling retailers and ageing consumer brands surged, as amateur traders cast aside Wall Street's skepticism and mobilized online. It's like 2021 all over again. But the latest meme-stock rally could be even bigger than its predecessor four years ago, when investors piled into recognizable but unloved stocks, such as the video games retailer GameStop and the movie theatre chain AMC, according to the founder of the Reddit forum that helped whip up the frenzy. Retailer Kohl's, camera firm GoPro, fast-food chain Wendy's and doughnut chain Krispy Kreme each staged rapid rallies this week, driven by abrupt surges in trading volume reminiscent of the meme-stock craze of 2021, when social media memes boosted a collection of struggling stocks, triggering extraordinary and volatile leaps in value. Actress Sydney Sweeney helped bring clothing retailer American Eagle Outfitters into the mania after it was announced the Euphoria and White Lotus star would front the brand's latest marketing campaign. The company's shares surged about 10% in trading on Thursday. Meme stocks are 'about to leap-frog in size and scope and scale, so that retail traders are going to redefine what matters', according to Jaime Rogozinski, founder of the wallstreetbets Reddit forum behind many of the volatile rallies. 'The world of finance is clearly changing, with blockchain technologies encroaching, and AI agents that trade on their own,' he said. 'And the collective of retail traders is adapting along with it.' Rogozinski founded wallstreetbets in 2012, but said Reddit ousted him as a moderator in 2020. His bid to sue the social media company for trademark infringement was dismissed by the US court of appeals for the ninth circuit last month. The forum's users home in on stocks and share their own research. 'It's a decentralization of power of who can be financial analyst,' said Noor Al, a moderator on wallstreetbets. 'Great ideas can now come from anyone, anywhere. 
'We're seeing the power of retail push stocks, sometimes to the tune of billions of dollars, through the power of ideas, the power of community and the power of the people,' he added. The meme-stock craze of 2021, which produced stars such as Roaring Kitty, was a product of the Covid era, when many amateur traders were stuck at home and flush with pandemic stimulus cash. Whether this latest frenzy produces similar winners is not yet clear. Kohl's finished the week up 32%, GoPro was up 66% and Krispy Kreme was up 41%. The rallies show some investors are willing to take on more risk, as stocks scale record highs and the market, dominated by big tech, becomes harder to beat. Often, meme-stock bets are unbound from economic fundamentals, as investors move to support a brand for romantic or ideological reasons. Donald Trump's Trump Media & Technology Group, home to Truth Social, is valued at more than $5bn on quarterly revenue of about $1m. The wallstreetbets ethos 'has always to some extent been about flaunting and exploiting the ironies, relevance or irrelevance' of the stock market, said Rogozinski, who pointed to Wendy's, the hamburger chain, as a good example. 'Wendy's has always been a meme that goes back a decade. It brings a smile to my face, because on Reddit there's always been this thing where they say: 'Sir, this is a Wendy's.' 'It's an inside joke, and I don't even get where it started. It's just a meme,' he added. The stock's fleeting rise – it rallied 10% in two days, but finished the week broadly flat – shows some retail investors do not necessarily care about the typical factors that drive the market, such as tariffs and war in the Middle East. 'It's this ability for us to almost make fun of the financial system.' 
Long-term institutional players will always get the last laugh, Rogozinski conceded, because prices will return to normal valuations. 'But in the short term there's a lot of money to be had with this volatility, and the fact that stocks are able to move up and down with such ease is but a mere showcase for how the financial system needs a facelift in relevancy.' While current market conditions do not replicate the low interest rates and retail investor buoyancy of the Covid era, market records and a robust economy have made meme stocks attractive once again for some. 'You see all these indications where this is full-blown meme mania,' Brent Kochuba, founder of derivatives-data firm SpotGamma, told Bloomberg. 'The macro economic environment really favors the retail and speculative plays,' agreed Al. 'I think we're only going to see more speculation and excitement. It's a good time to tune in, because retail players can react and provide insight faster.' Day traders are not necessarily bothered by a company's financial performance, said Rogozinski. 'You have this activist, elective investor who is saying, 'I don't care what the financial statements look like, I don't care what the discounted cashflow is, I like the food, I like the video-game store, I like the meme. So dude, you can go back to Excel spreadsheets if you want, but I really like the chicken tenders,'' he said. There is now a 'third component' to investment, beyond supply and demand, he claimed, 'which is, 'dude, I don't care if you think it's going to go up or not, or if they have assets or liabilities. I care about this company and I'm going to help it out. I'm going to go buy my jeans from American Eagle.''

Feeling flush? Americans can Venmo government to help pay off US debt

The Guardian, an hour ago

John F Kennedy's sage words from his inaugural address are forever seared into America's political consciousness: 'Ask not what your country can do for you – ask what you can do for your country.' Six decades and some change later, the United States Treasury is keeping Kennedy's spirit alive by offering Americans with a few dollars collecting dust in their Venmo balance a chance to fulfill a new patriotic duty: helping pay off the national debt. The US treasury department has long had a 'Gifts to Reduce the Public Debt' page available for those that dislike traditional charity, feel like they don't pay enough in taxes, or simply want to help the country stay No 1 in an eclectic list of superlatives that includes military spending, Olympic gold medals, prison population, corn subsidies, and healthcare costs. But the new-age, Gen Z-friendly method of payment is a recent addition, first flagged on Twitter by Planet Money's Jack Corbett. A bipartisan punching bag that trades sides of the aisle depending on who's in office and who needs funds earmarked for projects in their state, concern over the national debt is one of few issues that Democrats and Republicans can unite on. Also bipartisan is the debt's growth, which has increased every year since 2001, when it sat at $10.28tn. As of this writing, the debt has ballooned to $36.72tn. America is on track to continue the trend, with the Congressional Budget Office estimating that Trump's Big Beautiful Bill will add $3.4tn to the debt over the coming decade. It is unclear how much money Trump and Elon Musk's 'Doge' saved, although analysis estimates the number at under the advertised $180bn, and a far cry short of the initially advertised $2tn. The federal government spent $6.75tn in Fiscal Year 2024 while collecting $4.92tn in revenue. 
Highlights of past and present government spending include the $151bn procurement process for the Trump administration's Golden Dome missile defense project, over $2tn on Lockheed Martin's long-delayed F-35 fighter jet, and roughly $800bn in annual spending on the Pentagon, which recently failed its seventh audit in a row. Kind-hearted Americans have gone above and beyond their regular tax-paying duties, contributing around $67.3m since 1996. That's enough to fund 20 minutes of the US government's spending habit. If Americans could dig into their couch cushions, eat less takeout, and tighten their belts, they might be able to tackle the problem once and for all. It would only take about $107,000 per person, payable via ACH, PayPal, credit or debit card, and now, Venmo.
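As a rough check on the figures quoted above, the short sketch below works backwards from the $36.72tn debt and the $107,000 per-person share to the population that share implies, and computes the Fiscal Year 2024 gap between spending and revenue. All inputs are the article's own numbers; the function names are illustrative only.

```python
# Figures quoted in the article, in US dollars.
NATIONAL_DEBT = 36.72e12      # total debt "as of this writing"
PER_PERSON_SHARE = 107_000    # quoted cost per person to clear the debt
SPENDING_FY2024 = 6.75e12     # federal spending, Fiscal Year 2024
REVENUE_FY2024 = 4.92e12      # federal revenue, Fiscal Year 2024

def implied_population(debt: float, per_person: float) -> float:
    """Population implied by splitting the debt into equal per-person shares."""
    return debt / per_person

def annual_deficit(spending: float, revenue: float) -> float:
    """Gap between one year's spending and revenue."""
    return spending - revenue

if __name__ == "__main__":
    people = implied_population(NATIONAL_DEBT, PER_PERSON_SHARE)
    deficit = annual_deficit(SPENDING_FY2024, REVENUE_FY2024)
    print(f"Implied population: {people / 1e6:.0f} million")  # 343 million
    print(f"FY2024 deficit: ${deficit / 1e12:.2f}tn")         # $1.83tn
```

The per-person figure is therefore consistent with dividing the debt across roughly 343 million people.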
