At Secret Math Meeting, Researchers Struggle to Outsmart AI

On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world's most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group's members faced off in a showdown with a 'reasoning' chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world's hardest solvable problems. 'I have colleagues who literally said these models are approaching mathematical genius,' says Ken Ono, a mathematician at the University of Virginia, who attended the meeting.
The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex problems in math than traditional LLMs.
To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, with coming up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which they hadn't previously been trained on, the most successful were able to solve fewer than 2 percent, showing these LLMs lacked the ability to reason. But o4-mini would prove to be very different.
Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions across varying tiers of difficulty, with the first three tiers covering undergraduate-, graduate- and research-level challenges. By February 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: 100 questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them. The mathematicians who participated had to sign a nondisclosure agreement and communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset.
The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would find the final 10 challenge questions. The meeting was headed by Ono, who split the 30 attendees into groups of six. For two days, the academics competed to devise problems that they could solve but that would trip up the AI reasoning bot. Any problem that o4-mini couldn't solve would garner the mathematician who came up with it a $7,500 reward.
By the end of that Saturday night, Ono was frustrated with the team's lack of progress. 'I came up with a problem which everyone in my field knows to be an open question in number theory—a good Ph.D.-level problem,' he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler 'toy' version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. 'It was starting to get really cheeky,' says Ono, who is also a freelance mathematical consultant for Epoch AI. 'And at the end, it says, "No citation necessary because the mystery number was computed by me!"'
Defeated, Ono jumped onto Signal that night and alerted the rest of the participants. 'I was not prepared to be contending with an LLM like this,' he says. 'I've never seen that kind of reasoning before in models. That's what a scientist does. That's frightening.'
Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a 'strong collaborator.' Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, 'This is what a very, very good graduate student would be doing—in fact, more.'
The bot was also much faster than a professional mathematician, taking mere minutes to do what it would take such a human expert weeks or months to complete.
While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that o4-mini's results might be trusted too much. 'There's proof by induction, proof by contradiction, and then proof by intimidation,' He says. 'If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.'
By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable 'tier five'—questions that even the best mathematicians couldn't solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning bots to help them discover new mathematical truths, much as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be key to keeping mathematics going for future generations.
'I've been telling my colleagues that it's a grave mistake to say that generalized artificial intelligence will never come, [that] it's just a computer,' Ono says. 'I don't want to add to the hysteria, but these large language models are already outperforming most of our best graduate students in the world.'

Related Articles

This viral ChatGPT prompt can teach you anything — and I'm officially hooked

Yahoo

When you buy through links on our articles, Future and its syndication partners may earn a commission.

If you've ever asked ChatGPT to explain something and felt like the answer was too vague, too fast or just not sinking in, you're going to want to try this viral prompt. As a power user, I have tested thousands of prompts and definitely have my favorites. But now I have a new one. I used to get overwhelmed trying to learn new topics, but since discovering this now-viral Reddit prompt, all of that has changed.

Unlike other prompts that may be designed for productivity or brainstorming, this particular prompt is designed to turn ChatGPT into a customized, interactive tutor. The prompt, originally shared on r/ChatGPT, gives the AI a structured role: to ask questions before answering, tailor explanations to your level and then offer multiple paths of exploration. In other words, instead of dumping information at you, it's more interactive: it builds a learning plan tailored to you. After testing it across topics from neuroscience to personal finance, I can confidently say: it works.

You Can Learn Everything With This Prompt. BEST LEARNING PROMPT! from r/ChatGPTPromptGenius

The Reddit prompt is dense and might be confusing because it looks a little different than most prompts. But you're going to want to copy the entire prompt into ChatGPT and hit send. From there, the AI will prompt you with follow-up questions. Too bulky? I've streamlined a version of it for you:

"Act as a private tutor. Ask me what I already know about the topic, how deep I want to go, and how much time I want to spend learning. Then create a personalized explanation plan and quiz me along the way to check understanding. Don't move on until I confirm I'm ready."

Immediately, ChatGPT shifts from reactive assistant to proactive guide. It starts by asking smart, clarifying questions, then delivers layered responses that build on each other. Whether I wanted a summary or a deep dive, it adjusted.
It even offered practice questions and examples tailored to my interests. A few add-on prompts worth trying:

"Explain this topic to me like I'm 12 — then like I'm a college student."
"Give me a beginner, intermediate, and advanced breakdown of [topic]."
"Quiz me on what I've learned so far. Use multiple choice."
"What are common misunderstandings about this subject?"
"How can I apply this knowledge in real life?"

I tried the viral prompt to take a deep dive into the history of 1960s rock n' roll and learned stuff my parents didn't even know. The add-on prompts helped me deepen my retention, fill in gaps and stay engaged. I have used them for everything from world history to animal facts. There is really no limit to how helpful this prompt can be for continued education.

What makes this prompt so effective is that it aligns with the way people learn best: through interaction, scaffolding and feedback. When ChatGPT asks what you already know, it avoids wasting time on the basics or skipping too far ahead. When it checks your understanding, it simulates the feedback loop of a live tutor. That back-and-forth is what turns passive reading into active learning.

It also adds accountability. You're not just being told information that can be misread or overlooked; you're being quizzed, nudged and guided to ensure you 'get it.' That makes it easier to stay focused and retain the material. Plus, when you tell ChatGPT how much time you want to spend, it shapes the experience into something manageable and realistic, which reduces overwhelm.

If you're serious about learning something new and want to dive deeper than just surface-level answers, this Reddit prompt is a game-changer. It transforms the chatbot into a true learning coach, guiding you step-by-step with clarity, structure and interaction. Add a few follow-up prompts, and you'll wonder why you ever tried to learn from static Google results or explainer videos that couldn't answer your specific questions.
Try it and let me know in the comments what worked for you.

Elon Musk Declares 'We Have Improved Grok Significantly'

Business Insider

You might want to start asking Grok some tougher questions. On Thursday, Elon Musk posted a characteristically bold update on X, claiming: 'We have improved @Grok significantly. You should notice a difference when you ask Grok questions.' And for once, Grok agreed. The AI chatbot itself followed up with a prompt: 'Try asking me a complex question to see the difference.'

For a product that's been in beta for months — and often mocked for its uneven quality and occasional political awkwardness — this is a statement that signals more than just minor tweaks.

Grok Gets a Major Upgrade Under the Hood

While Musk's post was light on detail, the upgrades appear to include better reasoning, sharper language understanding, and major improvements in coding capability, especially relevant for developers using Grok as a ChatGPT alternative. Users testing the update say Grok feels faster and more confident, particularly in multi-step problem-solving. Some developers also noted improvements in context retention during long queries — an issue that plagued earlier versions.

Insiders say this may be Grok 3.5 or even an early testbed for Grok 4, a release that's rumored to rival OpenAI's GPT-4 in sophistication. If true, the move would be a major step forward in Musk's plan to make Grok the flagship product of xAI and a real competitor in the AI assistant wars.

Why This Matters

While Grok launched as a 'witty' and sometimes irreverent chatbot, its real-world utility fell short of expectations. And Musk knows that if Grok is going to stand up to GPT-4 or Claude, it can't just be clever — it has to be useful.
This could be a meaningful upgrade, particularly for technical users, coders, and curious minds looking for more than meme replies. It is likely aimed at proving Grok can be trusted in enterprise and productivity settings — not just as a snarky sidekick, but as a full-stack reasoning engine.

Is Tesla Stock a Buy, Hold, or Sell?

Although retail investors cannot invest in xAI or most of Musk's ventures, they can invest in his most popular company, Tesla (TSLA). According to TipRanks, Tesla currently holds a 'Hold' consensus rating from 35 Wall Street analysts. Of those, 14 analysts rate the stock a Buy, 12 recommend Hold, and nine suggest Sell. The average 12-month price target for Tesla is $293.09, representing a 7.06% downside from its last closing price of $315.35.

Why Meta Platforms Stock Jumped 14% in June

Yahoo

Meta Platforms pushed further into artificial intelligence (AI) last month with a deal to take a 49% stake in Scale AI. The company also benefited from lower tensions around the trade war.

Shares of Meta Platforms (NASDAQ: META) were moving higher again in June as the social media giant benefited from the broader uptrend in the stock market and investors reacted to Meta's deal to take a 49% stake in Scale AI, a data-labeling start-up, for $14 billion. By the end of the month, Meta stock had finished up 14%, according to data from S&P Global Market Intelligence. The stock gained in two separate stages, at the beginning and end of the month.

Meta's ambitions in AI became clearer last month as the company made a splash with the Scale AI deal. The move gives the company near-50% ownership of a promising AI start-up and also brings Scale AI founder Alexandr Wang into the Meta fold. Wang will head up a new research lab working on superintelligence. Additionally, other news reports emerged about Meta's poaching AI talent from OpenAI, and it also reportedly tried to buy Perplexity, the AI search-focused start-up now valued at $14 billion, as well as Safe Superintelligence, another AI start-up. Finally, the company is considering raising $29 billion to fund its data center expansion push as part of its AI ambitions.

Early in the month, Meta also signed a 20-year power purchase agreement with Constellation Energy, showing its commitment to securing an adequate source of energy as AI needs grow. On the device front, the company introduced Oakley Meta glasses, which it called a new category of Performance AI glasses, featuring a built-in camera, open-ear speakers, and water resistance. Meanwhile, the stock also benefited from cooling tensions around the trade war, as well as solid economic data showing the job market continuing to expand and inflation remaining in check.
Since nearly all the company's revenue comes from digital advertising, the business is sensitive to the broader economy, so signs of continued growth are good for Meta. Meta's price-to-earnings ratio has risen to 28 following last month's gains, but that still looks like a fair price to pay for a stock that dominates the social media sector, has a huge competitive advantage in digital advertising, and is investing heavily into its strong AI division.

We'll hear from Meta at the end of the month when it reports second-quarter earnings. Analysts are expecting another strong quarter, with revenue increasing 14% to $44.55 billion and earnings per share rising from $5.16 to $5.84. If Meta can maintain that kind of growth, the stock should continue to move higher.

Randi Zuckerberg, a former director of market development and spokeswoman for Facebook and sister to Meta Platforms CEO Mark Zuckerberg, is a member of The Motley Fool's board of directors. Jeremy Bowman has positions in Meta Platforms. The Motley Fool has positions in and recommends Constellation Energy and Meta Platforms. The Motley Fool has a disclosure policy.
Why Meta Platforms Stock Jumped 14% in June was originally published by The Motley Fool.
