Humans beat AI gold-level score at top maths contest

Google's Gemini chatbot solved five out of the six maths problems set at the IMO. (EPA Images pic)
SYDNEY: Humans beat generative AI models made by Google and OpenAI at a top international mathematics competition, despite the programmes reaching gold-level scores for the first time.
Neither model scored full marks – unlike five young people at the International Mathematical Olympiad (IMO), a prestigious annual competition where participants must be under 20 years old.
Google said yesterday that an advanced version of its Gemini chatbot had solved five out of the six maths problems set at the IMO, held in Australia's Queensland this month.
'We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points – a gold medal score,' the US tech giant cited IMO president Gregor Dolinar as saying.
'Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow.'
Around 10% of human contestants won gold-level medals, and five received perfect scores of 42 points.
US ChatGPT maker OpenAI said that its experimental reasoning model had scored a gold-level 35 points on the test.
The result 'achieved a longstanding grand challenge in AI' at 'the world's most prestigious math competition', OpenAI researcher Alexander Wei wrote on social media.
'We evaluated our models on the 2025 IMO problems under the same rules as human contestants,' he said.
'For each problem, three former IMO medallists independently graded the model's submitted proof.'
Google achieved a silver-medal score at last year's IMO in the British city of Bath, solving four of the six problems.
That took two to three days of computation – far longer than this year, when its Gemini model solved the problems within the 4.5-hour time limit, it said.
The IMO said tech companies had 'privately tested closed-source AI models on this year's problems', the same ones faced by 641 competing students from 112 countries.
'It is very exciting to see progress in the mathematical capabilities of AI models,' said IMO president Dolinar.
Contest organisers could not verify how much computing power had been used by the AI models or whether there had been human involvement, he cautioned.

Related Articles

Google AI system wins gold medal in International Mathematical Olympiad

SAN FRANCISCO: An artificial intelligence system built by Google DeepMind, the tech giant's primary AI lab, has achieved 'gold medal' status in the annual International Mathematical Olympiad, a premier math competition for high school students. It was the first time that a machine – which solved five of the six problems at the 2025 competition, held in Australia this month – reached that level of success, Google said in a blog post Monday.

The news is another sign that leading companies are continuing to improve their AI systems in areas such as math, science and computer coding. This kind of technology could accelerate the research of mathematicians and scientists and streamline the work of experienced computer programmers.

Two days before Google revealed its feat, an OpenAI researcher said in a social media post that the startup had built technology that achieved a similar score on this year's questions, although it did not officially enter the competition.

Both systems were chatbots that received and responded to the questions much like humans. Other AI systems have participated in the International Mathematical Olympiad, or IMO, but they could answer questions only after human experts translated them into a computer programming language built for solving math problems.

'We solved these problems fully in natural language,' Thang Luong, a senior staff research scientist at Google DeepMind, said in an interview. 'That means there was no human intervention – at all.'

After OpenAI started the AI boom with the release of ChatGPT in late 2022, the leading chatbots could answer questions, write poetry, summarise news articles, even write a little computer code. But they often struggled with math.

Over the past two years, companies such as Google and OpenAI have built AI systems better suited to mathematics, including complex problems that the average person cannot solve. Last year, Google DeepMind unveiled two systems that were designed for math: AlphaGeometry and AlphaProof. Competing in the IMO, these systems achieved 'silver medal' performance, solving four of the competition's six problems. It was the first time a machine reached silver-medal status.

Other companies, including a startup called Harmonic, have built similar systems. But systems such as AlphaProof and Harmonic are not chatbots. They can answer questions only after mathematicians translate the questions into Lean, a computer programming language designed for solving math problems.

This year, Google entered the IMO with a chatbot that could read and respond to questions in English. This system is not yet available to the public. Called Gemini Deep Think, the technology is what scientists call a 'reasoning' system. This kind of system is designed to reason through tasks involving math, science and computer programming. Unlike previous chatbots, this technology can spend time thinking through complex problems before settling on an answer. Other companies, including OpenAI, Anthropic and China's DeepSeek, offer similar technologies.

Like other chatbots, a reasoning system initially learns its skills by analysing enormous amounts of text culled from across the internet. Then it learns additional behaviour through extensive trial and error in a process called reinforcement learning.

A reasoning system can be expensive, because it spends additional time thinking about a response. Google said Deep Think had spent the same amount of time with the IMO as human participants did: 4 1/2 hours.
But the company declined to say how much money, processing power or electricity had been used to complete the test.

In December, an OpenAI system surpassed human performance on a closely watched reasoning test called ARC-AGI. But the company ran afoul of competition rules because it spent nearly US$1.5mil (RM6.3mil) in electricity and computing costs to complete the test, according to pricing estimates. – ©2025 The New York Times Company. This article originally appeared in The New York Times.
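The 'thinking time' described above can be pictured, very loosely, as spending an inference budget on drafting and checking candidate answers before settling on one. The toy Python sketch below illustrates only that general idea and assumes nothing about how Gemini Deep Think or OpenAI's model actually work; propose_candidate and verify are hypothetical stand-ins.

```python
import random

# Illustrative sketch only (not Google's or OpenAI's actual method): one common
# way to spend extra inference-time compute is to draft candidate answers and
# check each one, settling only on an answer that passes verification.
# Here a toy "model" guesses integer roots of a polynomial and a checker
# verifies each guess by substituting it back into the equation.

def propose_candidate(rng: random.Random) -> int:
    """Stand-in for a model drafting one candidate answer."""
    return rng.randint(-20, 20)

def verify(candidate: int) -> bool:
    """Stand-in for checking a candidate: is it a root of x^2 - 5x + 6?"""
    return candidate ** 2 - 5 * candidate + 6 == 0

def solve_with_extra_compute(budget: int, seed: int = 0):
    """Spend up to `budget` attempts 'thinking' before committing to an answer."""
    rng = random.Random(seed)
    for _ in range(budget):
        candidate = propose_candidate(rng)
        if verify(candidate):
            return candidate
    return None  # budget exhausted without a verified answer

if __name__ == "__main__":
    # A larger budget makes a verified answer more likely but costs more,
    # mirroring the article's point that reasoning systems are expensive to run.
    print(solve_with_extra_compute(budget=5))
    print(solve_with_extra_compute(budget=200))
```

In this toy setting, raising the budget raises the chance of returning a verified answer, at the price of more computation, which is the trade-off the article points to when it notes that the companies would not disclose their compute costs.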

Google's advertising revenue grows as AI reshapes search results

SAN JOSE: Google's online advertising business, the main revenue driver of its parent company Alphabet, continued its growth as the company further integrates artificial intelligence (AI) into its search engine. In the second quarter, advertising revenues rose 10.4% year-on-year to $71.3 billion, the company reported on Wednesday, slightly exceeding analysts' expectations.

Google and Alphabet chief executive Sundar Pichai said AI was "positively impacting every part of the business," highlighting new AI-powered features like AI Overviews and AI Mode that are enhancing user engagement.

Google's advertising business is under close scrutiny as the company increasingly integrates AI-generated summaries into its search engine to directly answer user queries. This could reduce the incentive for users to click on links next to the search results, which is how Google generates revenue.

Overall, Alphabet's total revenue increased 14% to $96.4 billion, beating market forecasts. Net income rose more than 19% to $28.2 billion. Alphabet's shares fell about 1.5% in after-hours trading following the results. – dpa

AI helps Latin scholars decipher ancient Roman texts

A new artificial intelligence tool, partly developed by Google researchers, can now help Latin scholars piece together puzzles from the past, according to a study published on July 23. (Pixabay pic)

PARIS: Around 1,500 Latin inscriptions are discovered every year, offering an invaluable view into the daily life of ancient Romans – and posing a daunting challenge for the historians tasked with interpreting them. But a new artificial intelligence tool, partly developed by Google researchers, can now help Latin scholars piece together these puzzles from the past, according to a study published on July 23.

Inscriptions in Latin were commonplace across the Roman world, from laying out the decrees of emperors to graffiti on the city streets. One mosaic outside a home in the ancient city of Pompeii even warns: "Beware of the dog".

These inscriptions are "so precious to historians because they offer first-hand evidence of ancient thought, language, society and history", said study co-author Yannis Assael, a researcher at Google's AI lab DeepMind. "What makes them unique is that they are written by the ancient people themselves across all social classes on any subject. It's not just history written by the elite," Assael, who co-designed the AI model, told a press conference.

However, these texts have often been damaged over the millennia. "We usually don't know where and when they were written," Assael said.

So the researchers created a generative neural network, which is an AI tool that can be trained to identify complex relationships between types of data. They named their model Aeneas, after the Trojan hero and son of the Greek goddess Aphrodite. It was trained on data about the dates, locations and meanings of Latin inscriptions from an empire that spanned five million square kilometres over two millennia.

Thea Sommerschield, an epigrapher at the University of Nottingham who co-designed the AI model, said that "studying history through inscriptions is like solving a gigantic jigsaw puzzle". "You can't solve the puzzle with a single isolated piece, even though you know information like its colour or its shape," she explained. "To solve the puzzle, you need to use that information to find the pieces that connect to it."

Tested on Augustus

This can be a huge job. Latin scholars have to compare inscriptions against "potentially hundreds of parallels", a task which "demands extraordinary erudition" and "laborious manual searches" through massive library and museum collections, the study in the journal Nature said.

The researchers trained their model on 176,861 inscriptions – worth up to 16 million characters – five percent of which contained images. It can now estimate the location of an inscription among the 62 Roman provinces, offer a decade when it was produced and even guess what missing sections might have contained, they said.

To test their model, the team asked Aeneas to analyse a famous inscription called "Res Gestae Divi Augusti", in which Rome's first emperor Augustus detailed his accomplishments. Debate still rages among historians about when exactly the text was written.

Though the text is riddled with exaggerations, irrelevant dates and erroneous geographical references, the researchers said that Aeneas was able to use subtle clues such as archaic spelling to land on two possible dates – the same two dates that historians have been debating.

More than 20 historians who tried out the model found it provided a useful starting point in 90 percent of cases, according to DeepMind.

The best results came when historians used the AI model together with their skills as researchers, rather than relying solely on one or the other, the study said.

"Since their breakthrough, generative neural networks have seemed at odds with educational goals, with fears that relying on AI hinders critical thinking rather than enhances knowledge," said study co-author Robbe Wulgaert, a Belgian AI researcher. "By developing Aeneas, we demonstrate how this technology can meaningfully support the humanities by addressing concrete challenges historians face." – AFP
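As a rough picture of the task the article attributes to Aeneas, placing an inscription in one of the 62 provinces, dating it to a decade and proposing text for damaged sections, the hypothetical Python mock-up below sketches that kind of input and output. The names AttributionResult and attribute_inscription, and every example value, are invented for illustration and do not reflect the real model or any published interface.

```python
from dataclasses import dataclass

# Hypothetical mock-up of the kind of output the article describes for Aeneas:
# a location estimate among the 62 Roman provinces, a decade-level date, and
# guesses for missing text. Names and values are invented; they only
# illustrate the described task, not the real system.

@dataclass
class AttributionResult:
    province: str            # most likely of the 62 Roman provinces
    decade: int              # estimated decade of production, e.g. -10 for the 10s BCE
    restorations: list[str]  # candidate fills for a damaged section marked [---]

def attribute_inscription(text: str) -> AttributionResult:
    """Stand-in for the model: in reality this step would run a trained
    generative neural network over the inscription text (and any image)."""
    # Dummy values purely so the sketch runs end to end.
    return AttributionResult(
        province="Italia",
        decade=-10,
        restorations=["pontifex maximus", "tribunicia potestate"],
    )

if __name__ == "__main__":
    damaged = "IMP CAESAR DIVI F AVGVSTVS [---] FECIT"
    result = attribute_inscription(damaged)
    print(result.province, result.decade, result.restorations)
```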
