Latest news with #TerenceTao


Time of India
2 days ago
- Science
- Time of India
Explained: Why this mathematician thinks OpenAI isn't acing the International Mathematical Olympiad — and might be 'cheating' to win gold
TL;DR
- AI isn't taking the real test: GPT models 'solving' International Math Olympiad (IMO) problems are often operating under very different conditions—rewrites, retries, human edits.
- Tao's warning: Fields Medalist Terence Tao says comparing these AI outputs to real IMO scores is misleading because the rules are entirely different.
- Behind the curtain: Teams often cherry-pick successes, rewrite problems, and discard failures before showing the best output.
- It's not cheating, but it's not fair play: The AI isn't sitting in silence under timed pressure—it's basically Iron Man in a school exam hall.
- Main takeaway: Don't mistake polished AI outputs under ideal lab conditions for human-level reasoning under Olympiad pressure.

Led Zeppelin once sang, 'There's a lady who's sure all that glitters is gold.' But in the age of artificial intelligence, even the shimmer of mathematical brilliance needs closer scrutiny. These days, social media lights up every time a language model like GPT-4 is said to have solved a problem from the International Mathematical Olympiad (IMO) — a competition so elite it makes Ivy League entrance exams look like warm-up puzzles. 'This AI solved an IMO question!' 'Superintelligence is here!' 'We're witnessing the birth of a digital Newton!' Or so the chorus goes.

But one of the greatest living mathematicians isn't singing along. Terence Tao, a Fields Medal–winning professor at UCLA, has waded into the hype with a calm, clinical reminder: AI models aren't playing by the same rules. And if the rules aren't the same, the gold medal doesn't mean the same thing.

The Setup: What the IMO Actually Demands

The International Mathematical Olympiad is the Olympics of high school math. Students from around the world train for years to face six unspeakably hard problems over two days. They get 4.5 hours per day, no calculators, no internet, no collaboration — just a pen, a problem, and their own mind. Solving even one problem in full is an achievement.
Getting five perfect scores earns you gold. Solve all six and you enter the realm of myth — which, incidentally, is where Tao himself resides. He won a gold medal in the IMO at age 13. So when an AI is said to 'solve' an IMO question, it's important to ask: under what conditions?

Enter Tao: The IMO, Rewritten (Literally)

In a detailed Mastodon post, Tao explains that many AI demonstrations that showcase Olympiad-level problem solving do so under dramatically altered conditions. He outlines a scenario that mirrors what's actually happening behind the scenes: 'The team leader… gives them days instead of hours to solve a question, lets them rewrite the question in a more convenient formulation, allows calculators and internet searches, gives hints, lets all six team members work together, and then only submits the best of the six solutions… quietly withdrawing from problems that none of the team members manage to solve.'

In other words: cherry-picking, rewording, retries, collaboration, and silence around failure. It's not quite cheating — but it's not the IMO either. It's an AI-friendly reconstruction of the Olympiad, where the scoreboard is controlled by the people training the system.

From Bronze to Gold (If You Rewrite the Test)

Tao's criticism isn't just about fairness — it's about what we're really evaluating. He writes, 'A student who might not even earn a bronze medal under the standard IMO rules could earn a "gold medal" under these alternate rules, not because their intrinsic ability has improved, but because the rules have changed.'

This is the crux. AI isn't solving problems like a student. It's performing in a lab, with handlers, retries, and tools. What looks like genius is often a heavily scaffolded pipeline of failed attempts, reruns, and prompt rewrites. The only thing the public sees is the polished output. Tao doesn't deny that AI has made remarkable progress.
But he warns against blurring the lines between performance under ideal conditions and human-level problem-solving in strict, unforgiving settings.

Apples to Oranges — and Cyborg Oranges

Tao is careful not to throw cold water on AI research. But he urges a reality check. 'One should be wary of making apples-to-apples comparisons between the performance of various AI models (or between such models and the human contestants) unless one is confident that they were subject to the same set of rules.'

A tweet that says 'GPT-4 solved this problem' often omits what really happened:
– Was the prompt rewritten ten times?
– Did the model try and fail repeatedly?
– Were the failures silently discarded?
– Was the answer chosen and edited by a human?

Compare that to a teenager in an exam hall, sweating out one solution in 4.5 hours with no safety net. The playing field isn't level — it's two entirely different games.

The Bottom Line

Terence Tao doesn't claim that AI is incapable of mathematical insight. What he insists on is clarity of conditions. If AI wants to claim a gold medal, it should sit the same exam, with the same constraints, and the same risks of failure. Right now, it's as if Iron Man entered a sprint race, flew across the finish line, and people started asking if he's the next Usain Bolt. The AI didn't cheat. But someone forgot to mention it wasn't really racing.

And so we return to that Led Zeppelin lyric: 'There's a lady who's sure all that glitters is gold.' In 2025, that lady might be your algorithmic feed. And that gold? It's probably just polished scaffolding.

FAQ: AI, the IMO, and Terence Tao's Critique

Q1: What is the International Mathematical Olympiad (IMO)?
It's the world's toughest math competition for high schoolers, with six extremely challenging problems solved over two 4.5-hour sessions—no internet, no calculators, no teamwork.

Q2: What's the controversy with AI and IMO questions?
AI models like GPT-4 are shown to 'solve' IMO problems, but they do so with major help: problem rewrites, unlimited retries, internet access, collaboration, and selective publishing of only successful attempts.

Q3: Who raised concerns about this?
Terence Tao, one of the greatest mathematicians alive and an IMO gold medallist himself, called out this discrepancy in a Mastodon post.

Q4: Is this AI cheating?
Not exactly. But Tao argues that changing the rules makes it a different contest altogether—comparing lab-optimised AI to real students is unfair and misleading.

Q5: What's Tao's main point?
He urges clarity. If we're going to say AI 'solved' a problem, we must also disclose the conditions—otherwise, it's like comparing a cyborg sprinter to a high school track star and pretending they're equals.

Q6: Does Tao oppose AI?
No. He recognises AI's impressive progress in math, but wants honesty about what it means—and doesn't mean—for genuine problem-solving ability.

Q7: What should change?
If AI is to be judged against human benchmarks like the IMO, it must be subjected to the same constraints: time limits, no edits, no retries, no external tools. Tao's verdict? If you want to claim gold, don't fly across the finish line in an Iron Man suit and pretend you ran.


India Today
3 days ago
- Science
- India Today
OpenAI won gold at the world's toughest math exam. Why the Olympiad gold matters
In a jaw-dropping achievement for the world of artificial intelligence, OpenAI's latest experimental model has scored at the gold medal level at the International Mathematical Olympiad (IMO) -- one of the toughest math exams on the planet. This is the same event held on the Sunshine Coast in Australia, where India won six medals this year and ranked 7th amongst 110 participating countries.

AI HITS GOLD IN THE WORLD'S TOUGHEST MATH TEST

The IMO is no ordinary competition. Since its launch in 1959 in Romania, it has become the gold standard for testing mathematical genius among high school students globally. Over two intense days, participants face a gruelling four-and-a-half-hour paper with only three questions each day. These are not your average exam questions -- they demand deep logic, creativity and problem-solving. Against that standard, OpenAI's model solved five out of six questions correctly -- under the same testing conditions as human contestants.

MANY DOUBTED AI COULD DO THIS -- UNTIL NOW

Even renowned mathematician Terence Tao -- an IMO gold medallist himself -- had doubts. In a podcast in June, he suggested that AI wasn't yet ready for the IMO level and should try simpler math contests first. But OpenAI has now proven otherwise.

"Also this model thinks for a *long* time. o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it's also more efficient with its thinking," Noam Brown from OpenAI wrote on LinkedIn. "It's worth reflecting on just how fast AI progress has been, especially in math. In 2024, AI labs were using grade school math (GSM8K) as an eval in their model releases. Since then, we've saturated the (high school) MATH benchmark, then AIME, and now are at IMO gold," he added.

WHY THIS IS A BIG DEAL FOR GENERAL AI

This isn't just about math. OpenAI says this shows their AI model is breaking new ground in general-purpose reasoning.
Unlike Google DeepMind's AlphaGeometry -- built just for geometry -- OpenAI's model is a general large language model that happens to be great at math too. "Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn't an IMO-specific model. It's a reasoning LLM that incorporates new experimental general-purpose techniques," Brown explained in his post.

Sam Altman called it 'a dream' when OpenAI began. 'This is a marker of how far AI has come in a decade.' But before you get your hopes up, this high-performing AI isn't going public just yet. Altman confirmed it'll be 'many months' before this gold-level model is released.

DOUBTS STILL REMAIN

Not everyone is fully convinced. AI expert Gary Marcus called the model's results 'genuinely impressive' -- but raised fair questions about training methods, how useful this is for the average person, and how much it all costs. Still, the win marks a huge leap in what artificial intelligence can do -- and how fast it's improving.


South China Morning Post
25-06-2025
- Science
- South China Morning Post
Star mathematician Joshua Zahl leaves Canada for China after solving century-old puzzle
China has secured a major academic coup with the recruitment of mathematics luminary Joshua Zahl, recently celebrated for solving the more than 100-year-old three-dimensional Kakeya conjecture. Zahl is leaving Canada's University of British Columbia (UBC) to take up a full-time position as a chair professor at Nankai University's Chern Institute of Mathematics (CIM), according to the Chinese educational institution's website. Zahl and his collaborator Wang Hong from New York University posted their milestone proof in a 127-page preprint paper on the open-access repository arXiv in February, and the feat was immediately hailed by the prominent UCLA mathematician Terence Tao. Writing on his blog a day after the paper appeared, Tao described the achievement as 'some spectacular progress in geometric measure theory', confirming that Zahl and Wang had resolved 'the three-dimensional case of the infamous Kakeya set conjecture'. Tao, who is also Zahl's doctoral mentor, has long been focused on the Kakeya problem. He published his ideas on the conjecture in 2014 on his blog, providing a foundation for Zahl and Wang's work. 'It's like perfecting a perpetual-motion machine. It's magical; they are getting more out of the output than they put in. Their approach proves the three-dimensional Kakeya conjecture,' Tao wrote.


The Star
04-05-2025
- Science
- The Star
Contradictheory: The false flags of AI
How hard is it to draw the Malaysian flag? Easy enough to ask a computer to do it for you, but hard enough that it'll probably get the stripes, the star, and the moon on the design wrong.

I'm referring, of course, to not one but two recent débâcles: First, a national newspaper ran a front-page image of the Malaysian flag that was missing the crescent moon. Then the Education Ministry distributed an SPM examination analysis report with a flag that had too many stars and too few stripes.

Now before I get into it much further, let's admit that all of this could have been avoided if the humans in charge had paid a little more attention. But perhaps we are beginning to trust artificial intelligence (AI) just a little too much. It is computing, but not as we know it.

Renowned Australian-American mathematician Terence Tao said in a lecture about the future role of AI in science and mathematics that AI is fundamentally a 'guessing machine'. We're used to computers giving the right answer, every single time. But AI doesn't do precision. It doesn't always get it right. It doesn't even always give you the same answer, just something that vaguely resembles what it's seen before. For the AI machine, the Malaysian flag isn't a precise star and crescent adorned with 14 red and white stripes. It's a yellow blob-ish star thing on a blue background, with some colourful lines thrown in somewhere.

This 'best guess' strategy makes AI wonderfully flexible for tasks we used to think computers couldn't handle, like generating a photo of something vaguely described, but also dumb at some things humans find easy. But here's my suggestion: Instead of getting more humans to double-check AI's clever outputs, maybe we should just use more computers – specifically, old-school computers that just do what we ask them to and don't guess at anything. I know what some of you are thinking: Using computers got us into this mess, why would using more get us out of it?
To try to explain this, let me step away from art into mathematics. Back in 1976, two mathematicians proved something called the Four Colour Theorem. It basically says that any map can be coloured with just four colours such that no two neighbouring countries share the same one. While it's easy to understand and demonstrate with a box of crayons, it's actually very hard to prove.

(This, by the way, is the difference between solving maths problems and proving theorems. Solving problems means getting answers to sums. Proving theorems means constructing airtight arguments that work for any map, anywhere, ever. It's also why a maths degree often involves very few numbers and a lot more phrases like, 'But it's obvious, isn't it?')

What made the 1976 proof of the Four Colour Theorem so contentious was that it relied heavily on thousands of hours of computer work that no human could realistically verify. Was a proof valid if no human in the world could check it? Conceivably, they could have asked thousands of other mathematicians to go over various parts of the work done by the computer. But maths traditionally resists large groups of people, if only for the reason that mathematicians don't trust others to do the work properly (or as they say, 'They're not mathematically rigorous enough').

Then, in 2005, another pair of mathematicians used a program called Coq to verify that the original 1976 work was correct. Coq is a proof assistant, which is a computer program that checks the logic of a proof step by step. This may seem counterintuitive. They used a computer to confirm that a computer-assisted proof from 30 years ago was valid?

But mathematicians have slowly embraced computer proof assistants over the years. They are built around a small, trustworthy 'kernel', a tiny piece of code that performs the actual logic-checking. If the kernel is verified, then we can trust the results it produces.
It's like having an employee who is so reliable that if they say the blueprint is flawless, you believe them. Most of these kernels are just a few hundred to a few thousand lines of code, which is small enough for human experts to inspect thoroughly in a variety of ways. In contrast, modern AI systems use machine learning, which is akin to a mysterious black box that even their creators don't fully understand. Who knows why an AI thinks what a flag is supposed to look like?

Now, the hardest part of using a proof assistant is in 'formalising' the original proof. This is the laborious process of translating a human-readable proof into a precise format the computer can understand. Mathematicians love to say 'It's obvious that...', which computers hate. Computers need everything spelled out in excruciating detail, and formalising a proof can take anything from a few weeks to several years, because if you input it wrong, it just doesn't work. The maths don't maths.

So Tao suggests that we may soon be able to employ 'beginner' mathematicians who aren't particularly strong at maths – because the proof assistant will vet their input and reject it if it's not correct. And his point is that we can combine this with AI. Let the AI guess how to formalise a proof, and let the proof assistant tell it if it got it wrong. You get the power of creativity with the safety net of rigour.

That kind of rigour is exactly what's missing as we clumsily stumble to embrace the use of AI tools in the workplace. We already accept spell-checkers, and those weren't built with AI. So let's build systems to flag potential problems in AI-generated output. For instance, imagine an editor sees a giant blinking red box around a photo marked 'AI-generated', warning that it might not be accurate. Or a block of text that's flagged because it closely matches something else online, highlighting the risk of plagiarism.

As usual, it's not the tools that are dangerous or bad, it's how you use them.
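To make the guess-and-check idea concrete, here is a minimal illustrative sketch (not from the column) in Lean, a modern proof assistant in the same family as Coq. The theorem and its proof term are tiny by design; the point is that the kernel checks every step, so a wrong guess simply fails to compile:

```lean
-- A tiny theorem, stated formally: addition of natural numbers commutes
-- for these two variables. The kernel verifies the proof step by step.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  -- An AI could propose this step; Lean either accepts or rejects it.
  exact Nat.add_comm a b
```

If the AI's guess were wrong (say, a proof term that doesn't type-check), Lean would reject the whole proof outright. That is the safety net of rigour the column describes: the creative guesser can be unreliable, because the checker is not.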
It's OK to wave the flag and rally users to the wonderful new future that AI brings. But just remember that computers sometimes work better with humans, rather than instead of them.

In his fortnightly column, Contradictheory, mathematician-turned-scriptwriter Dzof Azmi explores the theory that logic is the antithesis of emotion but people need both to make sense of life's vagaries and contradictions. Write to Dzof at lifestyle@ The views expressed here are entirely the writer's own.


Indian Express
25-04-2025
- Business
- Indian Express
Opinion Indian higher education institutions need to be prepared for the churn created by Trump's crackdown on US universities
The ongoing confrontation between the Trump administration and virtually all of America's prominent research universities is unprecedented. US universities face challenges on several fronts. Among them is a massive reduction in what universities are allowed to charge as overheads for administering research grants, an important source of funding. The abrupt cancellation of visas of several foreign students on minor grounds, with the threat of more to come, has added further pressure. America's flagship research funding agencies, among them the NSF, NIH and NEH, have slashed their grants, reflecting the Trump administration's new financial priorities.

What does this mean for India? Indian students made up the highest number of international enrollments in US universities, at 29.4 per cent in 2024-25. India has maintained its position as the top sender of international graduate students to the US for the second year running. Both public and private higher education institutions in India should now expect applications from students who would otherwise have headed abroad, concerned about being able to complete their degrees (initial reports suggest a more than 30 per cent decline in applications to US universities). Such institutions should also expect increased interest in transfers, from students worried that a minor misdemeanour might lead to their being asked to self-deport.

But further upheavals might also be in store. Faculty members in the US who retain citizenship of their home countries have become aware of the precarity of their immigration status under the new regime. The declining tolerance for diversity along multiple axes is a concerning development. Distinguishing irreversible shifts from temporary realignments isn't easy. But would we have enough jobs to be able to accommodate the best of those who might think of returning? This seems unlikely, and not just in India.
The leading mathematician and Fields medalist Terence Tao recently said, 'One could argue that any "brain drain" from the US would simply result in an equal and opposite "brain gain" in other countries, but … in practice, the rest of the world would not be able to absorb all of the lost opportunities in the US in a single job cycle'.

One cannot but be pessimistic about India's ability to turn the current turmoil to its advantage. Many public institutions have relatively small numbers of positions to hire into, if they do so at all. Mechanisms for hiring are archaic, opaque, time-consuming and often politicised. In virtually every university department, faculty members have little to no input about candidates to be hired, with this job being that of an all-powerful external selection committee. The constitution of the selection committee, a prerogative of Vice Chancellors, is often the key to appointing 'desirable' candidates. Private institutions, perhaps, have more flexibility, but working conditions and salaries are variable.

The Chinese model of targeting and making attractive offers to high-quality faculty, largely those trained in the US system but with roots in China, is credited with the current high quality of institutions in the country. A 2025 Nature Index ranking of physics research showed that China dominated the top 10 list, with only two non-Chinese institutions in that list. However, the difference between Chinese and Indian investment in higher education is staggering. As Ramgopal Rao, the former Director of IIT Delhi, has pointed out, what China spends on just two of its major universities is the entire higher education budget of India.

Incentivising faculty members abroad who wish to return by giving them a choice of universities to return to, while their salaries are underwritten by the Centre, is a possibility. The recently announced Vaibhav Fellowships are a first step towards this.
But to base our actions solely on the idea of attracting foreign academics to return would be meaningless if we cannot also re-imagine our universities and make them more rewarding institutions with attractive intellectual environments. We need more institutions. We also need to make our existing ones larger and better. We need more eyes on India, including its public health, culture, society and biodiversity. This is an opportunity for India to build institutions that can be intellectual leaders for the Global South.

We need structural changes in the functioning of all our institutions of higher education, changes that will ensure academic independence as well as the highest standards. Changing how these institutions are assessed is needed, as is more public accountability and transparency in how they function. We should also look beyond STEMM programmes, since the world of the future will require diverse skills. A broad liberal education, provided by universities in the true sense and not purely technical institutions, is key to addressing the 'wicked' problems of the future.

These changes are required desperately anyway, and not just to facilitate the return of NRIs. Our challenge is to make our institutions welcoming intellectual spaces, not just to those from outside who are seeking to return, but also to those who never left.