
Latest news with #EpochAI

What Happens When LLMs Run Out Of Useful Data?

Forbes

25-06-2025

  • Business
  • Forbes


There is one obvious solution to a looming shortage of written content: have LLMs generate more of it.

By SAP Insights Team

Most of us feel like we're drowning in data. And yet, in the world of generative AI, a looming data shortage is keeping some researchers up at night. A 2024 report from the nonprofit watchdog Epoch AI projected that large language models (LLMs) could run out of fresh, human-generated training data as soon as 2026. Earlier this year, the ubiquitous Elon Musk declared that 'the cumulative sum of human knowledge has been exhausted in AI training,' and that the doomsday scenario envisioned by some AI researchers 'happened basically last year.'

GenAI is unquestionably a technology whose breakthroughs in power and sophistication have generally relied on ever-larger training datasets. Beneath the flurry of investment, adoption, and general GenAI activity, a quiet concern has surfaced: What if the fuel driving all this progress is running low? There is one obvious solution to a looming shortage of written content: have LLMs generate more of it.

The role of synthetic data in large language models

Synthetic data is computer-generated information that has the same statistical properties and patterns as real data but doesn't include real-world records. Amazon recently had success using this method, fine-tuning a customer service model on LLM-generated pairs of questions and answers. Because the task was narrow and the outputs were easily reviewed by human beings, the additional training on synthetic data helped the model respond more accurately to customer inquiries, even in scenarios it hadn't seen before.

Another use case for synthetic data arises when businesses use proprietary data to train bespoke LLMs—whether building them from scratch or, more commonly, layering retrieval-augmented generation (RAG) atop a commercial foundation model. In many such cases, the proprietary data involved is tightly structured, such as historical transaction records formatted like spreadsheets with dates, locations, and dollar amounts. In contexts like these, LLM-generated synthetic data is often indistinguishable from the real thing and just as effective for training.

But in less narrowly defined training scenarios, specifically the development of those big commercial models RAG relies on, the risks of training on synthetic data are real. The most widely cited danger has the dramatic name 'model collapse.' In a 2024 study published in Nature, researchers showed that when models are repeatedly trained on synthetic data generated by other models, they gradually lose diversity and accuracy, drifting further from the true distribution of real-world data until they can no longer produce reliably useful output.

Mohan Shekar, SAP's AI and quantum adoption lead for cloud-based ERP, likens the process to 'model incest.' With every successive iteration, a model trained on its own output will tend to reinforce biases and flaws that may at first have been barely noticeable, until those minor defects become debilitating deformities. Long before reaching these extreme states, models trained with synthetic data have also been shown to exhibit a dullness and predictability reflecting their lack of fresh input. Such models may still have their uses, especially for mundane work and applications, but as Shekar puts it, 'If you're trying to innovate—really innovate—[a synthetic-data–trained model] won't get you there. It's just remixing what you already had.'
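The collapse dynamic is easy to see in miniature. The sketch below is a toy illustration of the feedback loop the Nature study describes, not the study's actual method: a simple Gaussian stands in for an LLM, each 'generation' is fitted only to samples drawn from the previous generation's model rather than to fresh real data, and the fitted diversity drifts away from the truth. All parameters here are illustrative choices.

```python
import numpy as np

# Toy "model collapse" loop: generation k+1 is fitted only to synthetic
# samples from generation k, never to fresh human data. With small
# training samples, estimation error compounds across generations.
# Illustrative only; not the setup of the Nature study itself.
rng = np.random.default_rng(42)

mu, sigma = 0.0, 1.0    # the "real" data distribution
sample_size = 25        # small samples make the drift visible

for generation in range(1, 51):
    synthetic = rng.normal(mu, sigma, size=sample_size)  # model's own output
    mu, sigma = synthetic.mean(), synthetic.std()        # refit on that output
    if generation % 10 == 0:
        print(f"gen {generation:2d}: mean={mu:+.3f}  spread={sigma:.3f}")

# In a typical run the fitted spread wanders downward from 1.0 over the
# generations, mirroring the gradual loss of diversity described above.
```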
Some researchers, including OpenAI CEO Sam Altman, have long argued that innovation in how models are trained may soon start to matter more than what they're trained on. The next wave of breakthroughs, the thinking goes, may come from rethinking the architecture and logic of training itself and then applying those new ideas. Yaad Oren, head of research and innovation at SAP, is confident that such a shift is underway. Recent advances in training methods already mean 'you can shrink the amount of data needed to build a robust product,' he says.

One of those recent advances is multimodal training: building models that learn not just from text but also from video, audio, and other inputs. These models can effectively multiply one dataset by another, combining different types of information to create new datasets. Oren gives the example of voice recognition in cars during a rainstorm. For car manufacturers trying to train an LLM to understand and follow spoken natural-language instructions from a driver, rain in the background presents a hurdle. One unwieldy solution, says Oren, would be to 'record millions of hours of people talking in the rain' to familiarize the model with the soundwaves produced by a person asking for directions in a torrential downpour. More elegant and practical, though, is to combine an existing dataset of human speech with existing datasets of 'different rain and weather sounds,' he says. The result is a model that can decipher speech across a full range of meteorological backdrops—without ever having encountered the combination firsthand.

Even more promising is the potential impact of quantum computing on model training. 'What quantum brings in,' says Shekar, 'is a way to look at all the possible options that exist within your datasets and derive patterns, connections, and possibilities that were not visible before.' Quantum computing could even increase the total supply of usable data by accessing the vast, underutilized oceans of so-called unstructured data, says Shekar. 'Instead of needing 50 labeled images to train a model,' he says, 'you might be able to throw in 5,000 unlabeled ones and still get a more accurate result.'

That could be a very big deal indeed. AI engineers have long had the same feelings about unstructured data that physicists have about dark matter: an exquisite blend of awe, annoyance, and yearning. If quantum computing finally unlocks it, especially in tandem with multimodal learning and other innovations, today's fears of a data drought might recede.
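Oren's rain example above corresponds to a standard audio-augmentation recipe: overlay clips from a noise dataset onto clips from a speech dataset at a chosen signal-to-noise ratio, multiplying the two datasets together. Here is a minimal sketch of that idea, using synthetic stand-in arrays rather than real recordings; the mix_at_snr helper is hypothetical and none of this is SAP's actual pipeline.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay noise on speech at a target signal-to-noise ratio (in dB)."""
    # Loop the noise clip so it covers the whole speech clip, then trim.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Stand-in data: white noise in place of real speech and rain recordings.
rng = np.random.default_rng(0)
speech_clips = [rng.standard_normal(16_000) for _ in range(3)]  # ~1 s at 16 kHz
rain_clips = [rng.standard_normal(16_000) for _ in range(2)]

# Cross every speech clip with every weather clip: a 3-clip dataset and a
# 2-clip dataset yield 6 synthetic "speech in the rain" training examples.
augmented = [mix_at_snr(s, n, snr_db=5.0) for s in speech_clips for n in rain_clips]
print(len(augmented), "augmented examples")
```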

Inside the Secret Meeting Where Mathematicians Struggled to Outsmart AI

Yahoo

07-06-2025

  • Science
  • Yahoo


On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world's most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group's members faced off in a showdown with a 'reasoning' chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world's hardest solvable problems. 'I have colleagues who literally said these models are approaching mathematical genius,' says Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting.

The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex math problems than traditional LLMs.

To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, with coming up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which were dissimilar to those they had been trained on, the most successful were able to solve less than 2 percent, showing these LLMs lacked the ability to reason. But o4-mini would prove to be very different.

Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions over varying tiers of difficulty, with the first three tiers covering undergraduate-, graduate- and research-level challenges. By February 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: 100 questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them.

The mathematicians who participated had to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset. The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would finalize the last batch of challenge questions. Ono split the 30 attendees into groups of six. For two days, the academics competed to devise problems that they could solve but would trip up the AI reasoning bot. Each problem o4-mini couldn't solve would garner the mathematician who came up with it a $7,500 reward.

By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group's progress. 'I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem,' he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler 'toy' version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. 'It was starting to get really cheeky,' says Ono, who is also a freelance mathematical consultant for Epoch AI. 'And at the end, it says, 'No citation necessary because the mystery number was computed by me!''

Defeated, Ono jumped onto Signal early that Sunday morning and alerted the rest of the participants. 'I was not prepared to be contending with an LLM like this,' he says. 'I've never seen that kind of reasoning before in models. That's what a scientist does. That's frightening.'

Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a 'strong collaborator.' Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, 'This is what a very, very good graduate student would be doing—in fact, more.' The bot was also much faster than a professional mathematician, taking mere minutes to do what such a human expert would need weeks or months to complete.

While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that o4-mini's results might be trusted too much. 'There's proof by induction, proof by contradiction, and then proof by intimidation,' He says. 'If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.'

By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable 'tier five'—questions that even the best mathematicians couldn't solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning bots to help them discover new mathematical truths, much the same as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be key to keeping mathematics going for future generations. 'I've been telling my colleagues that it's a grave mistake to say that generalized artificial intelligence will never come, [that] it's just a computer,' Ono says. 'I don't want to add to the hysteria, but in many ways these large language models are already outperforming most of our best graduate students in the world.'

At Secret Math Meeting, Researchers Struggle to Outsmart AI

Scientific American

06-06-2025

  • Science
  • Scientific American


On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world's most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group's members faced off in a showdown with a 'reasoning' chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world's hardest solvable problems. 'I have colleagues who literally said these models are approaching mathematical genius,' says Ken Ono, a mathematician at the University of Virginia, who attended the meeting.

The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex math problems than traditional LLMs.

To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, with coming up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which they hadn't previously been trained on, the most successful were able to solve less than 2 percent, showing these LLMs lacked the ability to reason. But o4-mini would prove to be very different.

Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions over varying tiers of difficulty, with the first three tiers covering undergraduate-, graduate- and research-level challenges. By February 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: 100 questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them.

The mathematicians who participated had to sign a nondisclosure agreement and communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset. The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would find the final 10 challenge questions. The meeting was headed by Ono, who split the 30 attendees into groups of six. For two days, the academics competed to devise problems that they could solve but would trip up the AI reasoning bot. Any problems o4-mini couldn't solve would garner the mathematician who came up with them a $7,500 reward.

By the end of that Saturday night, Ono was frustrated with the team's lack of progress. 'I came up with a problem which everyone in my field knows to be an open question in number theory—a good Ph.D.-level problem,' he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler 'toy' version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. 'It was starting to get really cheeky,' says Ono, who is also a freelance mathematical consultant for Epoch AI. 'And at the end, it says, 'No citation necessary because the mystery number was computed by me!''

Defeated, Ono jumped onto Signal that night and alerted the rest of the participants. 'I was not prepared to be contending with an LLM like this,' he says. 'I've never seen that kind of reasoning before in models. That's what a scientist does. That's frightening.'

Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a 'strong collaborator.' Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, 'This is what a very, very good graduate student would be doing—in fact, more.' The bot was also much faster than a professional mathematician, taking mere minutes to do what such a human expert would need weeks or months to complete.

While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that o4-mini's results might be trusted too much. 'There's proof by induction, proof by contradiction, and then proof by intimidation,' He says. 'If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.'

By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable 'tier five'—questions that even the best mathematicians couldn't solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning bots to help them discover new mathematical truths, much the same as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be key to keeping mathematics going for future generations. 'I've been telling my colleagues that it's a grave mistake to say that generalized artificial intelligence will never come, [that] it's just a computer,' Ono says. 'I don't want to add to the hysteria, but these large language models are already outperforming most of our best graduate students in the world.'

Improvements in 'reasoning' AI models may slow down soon, analysis finds

Yahoo

13-05-2025

  • Business
  • Yahoo


An analysis by Epoch AI, a nonprofit AI research institute, suggests the AI industry may not be able to eke massive performance gains out of reasoning AI models for much longer. As soon as within a year, progress from reasoning models could slow down, according to the report's findings.

Reasoning models such as OpenAI's o3 have led to substantial gains on AI benchmarks in recent months, particularly benchmarks measuring math and programming skills. The models can apply more computing to problems, which can improve their performance, with the downside being that they take longer than conventional models to complete tasks.

Reasoning models are developed by first training a conventional model on a massive amount of data, then applying a technique called reinforcement learning, which effectively gives the model "feedback" on its solutions to difficult problems. So far, frontier AI labs like OpenAI haven't applied an enormous amount of computing power to the reinforcement learning stage of reasoning model training, according to Epoch.

That's changing. OpenAI has said that it applied around 10x more computing to train o3 than its predecessor, o1, and Epoch speculates that most of this computing was devoted to reinforcement learning. And OpenAI researcher Dan Roberts recently revealed that the company's future plans call for prioritizing reinforcement learning to use far more computing power, even more than for the initial model training.

But there's still an upper bound to how much computing can be applied to reinforcement learning, per Epoch. Josh You, an analyst at Epoch and the author of the analysis, explains that performance gains from standard AI model training are currently quadrupling every year, while performance gains from reinforcement learning are growing tenfold every 3-5 months. The progress of reasoning training will "probably converge with the overall frontier by 2026," he continues.

Epoch's analysis makes a number of assumptions and draws in part on public comments from AI company executives. But it also makes the case that scaling reasoning models may prove to be challenging for reasons besides computing, including high overhead costs for research. "If there's a persistent overhead cost required for research, reasoning models might not scale as far as expected," writes You. "Rapid compute scaling is potentially a very important ingredient in reasoning model progress, so it's worth tracking this closely."

Any indication that reasoning models may reach some sort of limit in the near future is likely to worry the AI industry, which has invested enormous resources developing these types of models. Already, studies have shown that reasoning models, which can be incredibly expensive to run, have serious flaws, like a tendency to hallucinate more than certain conventional models.

This article originally appeared on TechCrunch.
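Those two growth rates are enough for a back-of-the-envelope check of the 2026 convergence claim. In the sketch below, the starting gap between reinforcement-learning compute and total frontier training compute is an assumed placeholder (the article quotes no exact figure); the point is only how quickly a 10x-every-four-months curve overtakes a 4x-per-year curve.

```python
import math

# Growth rates cited in Epoch's analysis.
frontier_growth_per_year = 4.0           # overall training compute: ~4x/year
rl_growth_per_year = 10.0 ** (12 / 4)    # 10x every ~4 months, i.e. ~1,000x/year

# Assumed placeholder, not a figure from the report: suppose RL compute
# starts three orders of magnitude below total frontier training compute.
initial_gap = 1_000.0

# Each year the gap shrinks by (rl growth / frontier growth), roughly 250x,
# so the catch-up time is log(gap) / log(250), about 1.3 years from early 2025.
years_to_converge = math.log(initial_gap) / math.log(
    rl_growth_per_year / frontier_growth_per_year
)
print(f"RL compute catches the frontier in ~{years_to_converge:.1f} years")
```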

Improvements in ‘reasoning' AI models may slow down soon, analysis finds

TechCrunch

12-05-2025

  • Business
  • TechCrunch


An analysis by Epoch AI, a nonprofit AI research institute, suggests the AI industry may not be able to eke massive performance gains out of reasoning AI models for much longer. As soon as within a year, progress from reasoning models could slow down, according to the report's findings.

Reasoning models such as OpenAI's o3 have led to substantial gains on AI benchmarks in recent months, particularly benchmarks measuring math and programming skills. The models can apply more computing to problems, which can improve their performance, with the downside being that they take longer than conventional models to complete tasks.

Reasoning models are developed by first training a conventional model on a massive amount of data, then applying a technique called reinforcement learning, which effectively gives the model 'feedback' on its solutions to difficult problems. So far, frontier AI labs like OpenAI haven't applied an enormous amount of computing power to the reinforcement learning stage of reasoning model training, according to Epoch.

That's changing. OpenAI has said that it applied around 10x more computing to train o3 than its predecessor, o1, and Epoch speculates that most of this computing was devoted to reinforcement learning. And OpenAI researcher Dan Roberts recently revealed that the company's future plans call for prioritizing reinforcement learning to use far more computing power, even more than for the initial model training.

But there's still an upper bound to how much computing can be applied to reinforcement learning, per Epoch.

[Chart: According to an Epoch AI analysis, reasoning model training scaling may slow down. Image credits: Epoch AI]

Josh You, an analyst at Epoch and the author of the analysis, explains that performance gains from standard AI model training are currently quadrupling every year, while performance gains from reinforcement learning are growing tenfold every 3-5 months. The progress of reasoning training will 'probably converge with the overall frontier by 2026,' he continues.

Epoch's analysis makes a number of assumptions and draws in part on public comments from AI company executives. But it also makes the case that scaling reasoning models may prove to be challenging for reasons besides computing, including high overhead costs for research. 'If there's a persistent overhead cost required for research, reasoning models might not scale as far as expected,' writes You. 'Rapid compute scaling is potentially a very important ingredient in reasoning model progress, so it's worth tracking this closely.'

Any indication that reasoning models may reach some sort of limit in the near future is likely to worry the AI industry, which has invested enormous resources developing these types of models. Already, studies have shown that reasoning models, which can be incredibly expensive to run, have serious flaws, like a tendency to hallucinate more than certain conventional models.
