Inside the Secret Meeting Where Mathematicians Struggled to Outsmart AI

Yahoo · 07-06-2025
On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world's most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group's members faced off in a showdown with a 'reasoning' chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world's hardest solvable problems. 'I have colleagues who literally said these models are approaching mathematical genius,' says Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting.
The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex problems in math than traditional LLMs.
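For readers curious what querying such a model looks like in practice, here is a minimal sketch using OpenAI's Python SDK. The partition-counting prompt is invented for illustration, and the `reasoning_effort` parameter is an assumption about how o-series models expose their reasoning budget; treat this as a sketch, not a definitive recipe.

```python
# Minimal sketch: posing a math question to a reasoning model through
# OpenAI's Python SDK. The prompt is illustrative only, and
# reasoning_effort is assumed to be accepted by o-series models.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="high",  # assumption: trades latency for deeper reasoning
    messages=[
        {
            "role": "user",
            "content": (
                "Let p(n) be the number of partitions of n. "
                "For how many n < 100 is p(n) divisible by 5?"
            ),
        }
    ],
)

print(response.choices[0].message.content)
```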
To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, with coming up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which were dissimilar to those they had been trained on, the most successful solved less than 2 percent of them, suggesting these LLMs lacked the ability to reason. But o4-mini would prove to be very different.
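Benchmark questions of this kind are designed to have definite answers that can be checked automatically rather than judged by hand. As a loose illustration of how a solve rate like that 2 percent figure might be computed, here is a hypothetical harness; `problems` and `ask_model` are invented stand-ins, not Epoch AI's actual code or data.

```python
# Hypothetical benchmark harness: score a model on machine-checkable
# math answers. Everything here is an illustrative stand-in.
from fractions import Fraction

problems = [
    {"question": "What is 2**10?", "answer": "1024"},
    {"question": "Reduce 84/126 to lowest terms.", "answer": "2/3"},
]

def ask_model(question: str) -> str:
    """Stand-in for a call to the model under test."""
    return "1024" if "2**10" in question else "1/2"

def normalize(answer: str):
    # Compare numerically when possible so "2/3" and "4/6" count as equal.
    try:
        return Fraction(answer)
    except ValueError:
        return answer.strip()

solved = sum(
    normalize(ask_model(p["question"])) == normalize(p["answer"])
    for p in problems
)
print(f"solve rate: {solved}/{len(problems)} = {solved / len(problems):.0%}")
```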
Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions over varying tiers of difficulty, with the first three tiers covering undergraduate-, graduate- and research-level challenges. By February 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: 100 questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them. The mathematicians who participated had to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset.
The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There the participants would finalize the last batch of challenge questions. Ono split the 30 attendees into groups of six. For two days, the academics raced to devise problems that they could solve but that would trip up the AI reasoning bot. Each problem o4-mini couldn't solve would earn the mathematician who devised it a $7,500 reward.
By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group's progress. 'I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem,' he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler 'toy' version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. 'It was starting to get really cheeky,' says Ono, who is also a freelance mathematical consultant for Epoch AI. 'And at the end, it says, 'No citation necessary because the mystery number was computed by me!''
Defeated, Ono jumped onto Signal early that Sunday morning and alerted the rest of the participants. 'I was not prepared to be contending with an LLM like this,' he says. 'I've never seen that kind of reasoning before in models. That's what a scientist does. That's frightening.'
Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a 'strong collaborator.' Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, 'This is what a very, very good graduate student would be doing—in fact, more.'
The bot was also much faster than a professional mathematician, taking mere minutes to complete what would take such a human expert weeks or months.
While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that o4-mini's results might be trusted too much. 'There's proof by induction, proof by contradiction, and then proof by intimidation,' He says. 'If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.'
By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable 'tier five'—questions that even the best mathematicians couldn't solve. If AI reaches that level, the role of mathematicians would change sharply. Mathematicians might shift to posing questions and interacting with reasoning bots to help them discover new mathematical truths, much as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be key to keeping mathematics alive for future generations.
'I've been telling my colleagues that it's a grave mistake to say that generalized artificial intelligence will never come, [that] it's just a computer,' Ono says. 'I don't want to add to the hysteria, but in many ways these large language models are already outperforming most of our best graduate students in the world.'
