The high-schoolers who just beat the world's smartest AI models

a day ago

The smartest AI models ever made just went to the most prestigious competition for young mathematicians and managed to achieve the kind of breakthrough that once seemed miraculous.
They still got beat by the world's brightest teenagers.
Every year, a few hundred elite high-school students from all over the planet gather at the International Mathematical Olympiad. This year, those brilliant minds were joined by Google DeepMind and other companies in the business of artificial intelligence. They had all come for one of the ultimate tests of reasoning, logic and creativity.
The famously grueling IMO exam is held over two days and gives students three increasingly difficult problems a day and more than four hours to solve them. The questions span algebra, geometry, number theory and combinatorics—and you can forget about answering them if you're not a math whiz. You'll give your brain a workout just trying to understand them.
Because those problems are both complex and unconventional, the annual math test has become a useful benchmark for measuring AI progress from one year to the next. In this age of rapid development, the leading research labs dreamed of a day their systems would be powerful enough to meet the standard for an IMO gold medal, which became the AI equivalent of a four-minute mile.
But nobody knew when they would reach that milestone or if they ever would—until now.
This year's International Mathematical Olympiad attracted high-school students from all over the world.
The unthinkable occurred earlier this month when an AI model from Google DeepMind earned a gold-medal score at IMO by perfectly solving five of the six problems. In another dramatic twist, OpenAI also claimed gold despite not participating in the official event. The companies described their feats as giant leaps toward the future—even if they're not quite there yet.
In fact, the most remarkable part of this memorable event is that 26 students got higher scores on the IMO exam than the AI systems.
Among them were four stars of the U.S. team, including Qiao (Tiger) Zhang, a two-time gold medalist from California, and Alexander Wang, who brought his third straight gold back to New Jersey. That makes him one of the most decorated young mathematicians of all time—and he's a high-school senior who can go for another gold at IMO next year.
But in a year, he might be dealing with a different equation altogether.
"I think it's really likely that AI is going to be able to get a perfect score next year," Wang said.
"That would be insane progress," Zhang said. "I'm 50-50 on it."
So given those odds, will this be remembered as the last IMO when humans outperformed AI?
"It might well be," said Thang Luong, the leader of Google DeepMind's team.
Until very recently, what happened in Australia would have sounded about as likely as koalas doing calculus.
But the inconceivable began to feel almost inevitable last year, when DeepMind's models built for math solved four problems and racked up 28 points for a silver medal, just one point short of gold. This year, the IMO officially invited a select group of tech companies to their own competition, giving them the same problems as the students and having coordinators grade their solutions with the same rubric.
They were eager for the challenge. AI models are trained on unfathomable amounts of information—so if anything has been done before, the chances are they can figure out how to do it again. But they can struggle with problems they have never seen before.
As it happens, the IMO process is specifically designed to come up with those original and unconventional problems.
In addition to being novel, the problems also have to be interesting and beautiful, said IMO president Gregor Dolinar. If a problem under consideration is similar to "any other problem published anywhere in the world," he said, it gets tossed. By the time students take the exam, the list of a few hundred suggested problems has been whittled down to six.
Meanwhile, the DeepMind team kept improving the AI system it would bring to IMO, an unreleased version of Google's advanced reasoning model Gemini Deep Think, and it was still making tweaks in the days leading up to the competition.
The effort was led by Thang Luong, a senior staff research scientist who narrowly missed getting to IMO in high school with Vietnam's team. He finally made it to IMO last year—with Google. Before he returned this year, DeepMind executives asked about the possibility of gold.
He told them to expect bronze or silver again.
He adjusted his expectations when DeepMind's model nailed all three problems on the first day. The simplicity, elegance and sheer readability of those solutions astonished mathematicians. The next day, as soon as Luong and his colleagues realized their AI creation had crushed two more proofs, they also realized that would be enough for gold.
They celebrated their monumental accomplishment by doing one thing the other medalists couldn't: They cracked open a bottle of whiskey.
Key members of Google DeepMind's gold-medal-winning team, including Thang Luong, second from left.
To keep the focus on students, the companies at IMO agreed not to release their results until later this month. But as soon as the Olympiad's closing ceremony ended, one company declared that its AI model had struck gold—and it wasn't DeepMind.
It was OpenAI.
The company wasn't a part of the IMO event, but OpenAI gave its latest experimental reasoning model all six problems and enlisted former medalists to grade the proofs. Like DeepMind's, OpenAI's system flawlessly solved five and scored 35 out of 42 points to meet the gold standard.
After the OpenAI victory lap on social media, the embargo was lifted and DeepMind told the world about its own triumph—and that its performance was certified by the IMO.
Not long ago, it was hard to imagine AI rivals dueling for glory like this.
In 2021, a Ph.D. student named Alexander Wei was part of a study that asked him to predict the state of AI math by July 2025—that is, right now. When he looked at the other forecasts, he thought they were much too optimistic. As it turned out, they weren't nearly optimistic enough. Now he's living proof of just how wrong he was: Wei is the research scientist who led the IMO project for OpenAI.
The only thing more impressive than what the AI systems did was how they did it.
Google called its result a major advance, though not because DeepMind won gold instead of silver. Last year, the model needed the problems to be translated into a computer programming language for math proofs. This year, it operated entirely in "natural language" without any human intervention. DeepMind also crushed the exam within the IMO time limit of 4½ hours after taking several days of computation just a year ago.
You might find all of this completely terrifying—and think of AI as competition. The humans behind the models see them as complementary.
"This could perhaps be a new calculator," Luong said, "that powers the next generation of mathematicians."
Speaking of that next generation, the IMO gold medalists have already been overshadowed by AI.
So let's put them back in the spotlight.
Team USA at the International Mathematical Olympiad, including Alexander Wang, fourth from right, and Tiger Zhang, with the stuffed red panda on his head.
Qiao Zhang is a 17-year-old student in Los Angeles on his way to MIT to study math and computer science. When he was a young boy, his family moved to the U.S. from China and his parents gave him a choice of two American names. He picked Tiger over Elephant.
His career in competitive math began in second grade, when he entered a contest called the Math Kangaroo. It ended this month at the math Olympics next to a hotel in Australia with actual kangaroos.
When he sat down at his desk with a pen and lots of scratch paper, Zhang spent the longest amount of time during the exam on Problem 6. It was a problem in the notoriously tricky field of combinatorics, the branch of mathematics that deals with counting, arranging and combining discrete objects, and it was easily the hardest on this year's test. The solution required the ingenuity, creativity and intuition that humans can muster but machines cannot—at least not yet.
"I would actually be a bit scared if the AI models could do stuff on Problem 6," he said.
Problem 6 did stump DeepMind's and OpenAI's models, but it wasn't just problematic for AI. Of the 630 student contestants, 569 also received zero points. Only six received the full credit of seven points. Zhang was proud of his partial solution that earned four points, which was four more than almost everyone else.
At this year's IMO, 72 contestants went home with gold. But for some, a medal wasn't their only prize. Zhang was among those who left with another keepsake: victory over the AI models.
(As if it weren't enough that he can bend numbers to his will, he also has a way with words and has written about his IMO experience.)
In the end, the six members of the U.S. team piled up five golds and one silver, finishing second overall behind the Chinese after knocking them off the top spot last year.
There was once a time when such precocious math students grew up to become professors. (Or presidents—the recently elected president of Romania was a two-time IMO gold medalist with perfect scores.) While many still choose academia, others get recruited by algorithmic trading firms and hedge funds, where their quantitative brains have never been so highly valued. This year, the U.S. team was supported by Jane Street while XTX Markets sponsored the whole event. After all, they will soon be competing with each other—and with the richest tech companies—for their intellectual talents.
By then, AI might be destroying mere humans at math. But not if you ask Junehyuk Jung.
A former IMO gold medalist himself, Jung is now an associate professor at Brown University and visiting researcher at DeepMind who worked on its gold-medal model. He doesn't believe this was humanity's last stand, though. He thinks problems like Problem 6 will flummox AI for at least another decade.
And he walked away from perhaps the most significant math contest in history feeling bullish on all kinds of intelligence.
"There are things AI will do very well," he said. "There are still going to be things that humans can do better."
Write to Ben Cohen at ben.cohen@wsj.com

Justice Rajesh Bindal flags use of AI search models, leading to 'fake judgements'

India Today

26 minutes ago

"In India and in the USA, the use of AI search models by young lawyers has led to 'fake judgments' being placed before courts," said Justice Rajesh Bindal.
Speaking at an event organised by the All India Senior Lawyers Association, Justice Bindal, a sitting Judge of the Supreme Court (SC) of India, said that senior lawyers have a "responsibility to mentor" young lawyers.
Cautioning against over-reliance on technology, Justice Bindal gave examples of incorrect information being cited by AI (Artificial Intelligence) search models.
"Sometimes lawyers search on AI using one or two keywords and they cite judgments... it may be incorrect, it may have been the minority, they don't know," said Justice Bindal, adding that there have been instances of AI creating its own fake judgments and opinions, which get presented before the court.
"This is the danger mark of AI that it generates fake judgments and information. Senior lawyers need to groom the Young Bar about these dangers," said the SC judge.
"It was said we judges do a lot of work, but the core behind it is the research done by the young lawyers and the arguments made by the Senior counsels," said Bindal.
He was speaking at the felicitation ceremony organised by the All India Senior Advocates Association for the four new Judges elevated to the Supreme Court in the last month: Justice Joymalya Bagchi, Justice Nilay V Anjaria, Justice Vijay Bishnoi and Justice Atul S Chandurkar. Justices Bindal and PB Varale were the "senior" members of the Bench attending the event.
Senior Advocate and MP P Wilson, Senior Advocate Adish Aggarwala and Senior Advocate Pitambari Acharya shared the dais with the judges. At the event, Wilson told the gathering that he had introduced a Private Members Bill before the Parliament to increase the retirement age of Judges.

Cheyenne to host massive AI data center using more electricity than all Wyoming homes combined

Mint

an hour ago

CHEYENNE, Wyo. (AP) — An artificial intelligence data center that would use more electricity than every home in Wyoming combined before expanding to as much as five times that size will be built soon near Cheyenne, according to the city's mayor. "It's a game changer. It's huge," Mayor Patrick Collins said Monday. With cool weather — good for keeping computer temperatures down — and an abundance of inexpensive electricity from a top energy-producing state, Wyoming's capital has become a hub of computing power. The city has been home to Microsoft data centers since 2012. An $800 million data center announced last year by Facebook parent company Meta Platforms is nearing completion, Collins said. The latest data center, a joint effort between regional energy infrastructure company Tallgrass and AI data center developer Crusoe, would begin at 1.8 gigawatts of electricity and be scalable to 10 gigawatts, according to a joint company statement. A gigawatt can power as many as 1 million homes. But that's more homes than Wyoming has people. The least populated state, Wyoming, has about 590,000 people. And it's a major exporter of energy. A top producer of coal, oil and gas, Wyoming ranks behind only Texas, New Mexico and Pennsylvania as a top net energy-producing state, according to the U.S. Energy Information Administration. Accounting for fossil fuels, Wyoming produces about 12 times more energy than it consumes. The state exports almost three-fifths of the electricity it produces, according to the EIA. But this proposed data center is so big, it would have its own dedicated energy from gas generation and renewable sources, according to Collins and company officials. Gov. Mark Gordon praised the project's value to the state's gas industry. "This is exciting news for Wyoming and for Wyoming natural gas producers," Gordon said in the statement.
While data centers are energy-hungry, experts say companies can help reduce their effect on the climate by powering them with renewable energy rather than fossil fuels. Even so, electricity customers might see their bills increase as utilities plan for massive data projects on the grid. The data center would be built several miles (kilometers) south of Cheyenne off U.S. 85 near the Colorado state line. State and local regulators would need to sign off on the project, but Collins was optimistic construction could begin soon. "I believe their plans are to go sooner rather than later," Collins said. OpenAI, the developer of ChatGPT, has been scouring the U.S. for sites for a massive AI data center effort called Stargate, but a Crusoe spokesperson declined to say if the Cheyenne project was one. "We are not at a stage that we are ready to announce our tenant there," said the spokesperson, Andrew Schmitt. "I can't confirm or deny that is going to be one of the Stargate." Recently, OpenAI announced it had switched on the first phase of a Crusoe-built data center complex in Abilene, Texas, in a partnership with software giant Oracle. "To the best of our knowledge, it is the largest data center — we think of it as a campus — in the world," OpenAI's chief global affairs officer Chris Lehane told The Associated Press last week. "It generates, roughly and depending how you count, about a gigawatt of energy." OpenAI has also been looking elsewhere in the U.S. to expand its data centers. It said last week that it has entered into an agreement with Oracle to develop another 4.5 gigawatts of data center capacity. "We're now in a position where we have, in a really concrete way, identified over five gigawatts of energy that we're going to be able to build around," Lehane said. OpenAI hasn't named any locations, besides its flagship site in Texas, where it plans to build data centers.
As of earlier this year, Wyoming was not one of the 16 states where OpenAI said it was looking for locations to build new data centers. O'Brien reported from Austin, Texas.

Chinese AI firms unite to build ecosystem amid US curbs

Deccan Herald

5 hours ago

China's artificial intelligence companies have announced two new industry alliances, aiming to develop a domestic ecosystem to reduce dependence on foreign tech as they seek to cope with U.S. export restrictions on advanced Nvidia chipsets. The conference showcased a slew of new products, such as an AI computing system from Huawei that experts believe rivals Nvidia's most advanced offering, as well as consumer-friendly products such as several kinds of digital AI glasses. The "Model-Chip Ecosystem Innovation Alliance" brings together Chinese developers of large language models (LLMs) and AI chip manufacturers. "This is an innovative ecosystem that connects the complete technology chain from chips to models to infrastructure," said Zhao Lidong, CEO of Enflame, one of the participating chipmakers. Other manufacturers of graphics processing units (GPUs) in the alliance include Huawei, Biren, and Moore Threads, which have been hit by U.S. sanctions that block them from purchasing advanced tech made with U.S. know-how. The alliance was announced by StepFun, an LLM developer. A second alliance, the Shanghai General Chamber of Commerce AI Committee, aims to "promote the deep integration of AI technology and industrial transformation." Participants include SenseTime, also sanctioned by the U.S. and which has pivoted from facial recognition technology to LLMs. Others are StepFun and another LLM developer, MiniMax, as well as chipmakers Metax and Iluvatar CoreX. Huawei's system design capabilities have meant that it has been able to use more chips and system-level innovations to compensate for weaker individual chip performance, SemiAnalysis said. At least six other Chinese computing firms showcased similar "clustering" chip technology. Metax demonstrated an AI supernode featuring 128 C550 chips designed to support large-scale liquid-cooled data centre requirements. 
Other events included Tencent's unveiling of its open-source Hunyuan3D World Model 1.0, which the company said enables users to generate interactive 3D environments through text or image prompts. Baidu announced what it said was next-generation "digital human" technology that helps businesses to create virtual livestreamers. It features "cloning technology" that can replicate a human's voice, tone, and body language from just 10 minutes of sample footage. Alibaba was among those announcing AI glasses. Its Quark AI Glasses are powered by its Qwen AI model and are due to be released in China by the end of 2025. They will allow users to access the tech giant's map service for easy navigating and to use Alipay by scanning QR codes with voice commands.