Apple Researchers Just Released a Damning Paper That Pours Cold Water on the Entire AI Industry


Yahoo · 09-06-2025
Researchers at Apple have released an eyebrow-raising paper that throws cold water on the "reasoning" capabilities of the latest, most powerful large language models.
In the paper, a team of machine learning experts makes the case that the AI industry is grossly overstating the ability of its top AI models, including OpenAI's o3, Anthropic's Claude 3.7, and Google's Gemini.
In particular, the researchers assail the claims of companies like OpenAI that their most advanced models can now "reason" — a supposed capability that the Sam Altman-led company has increasingly leaned on over the past year for marketing purposes — which the Apple team characterizes as merely an "illusion of thinking."
It's a particularly noteworthy finding, considering Apple has been accused of falling far behind the competition in the AI space. The company has chosen a far more careful path to integrating the tech in its consumer-facing products — with some seriously mixed results so far.
In theory, reasoning models break down user prompts into pieces and use sequential "chain of thought" steps to arrive at their answers. But now, Apple's own top minds are questioning whether frontier AI models simply aren't as good at "thinking" as they're being made out to be.
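For readers unfamiliar with the technique, a chain-of-thought prompt simply asks the model to write out its intermediate steps before giving a final answer. The following is a minimal, purely illustrative sketch in Python; the ask_model call is a hypothetical placeholder, not any particular vendor's API:

# Illustrative chain-of-thought prompting: the model is asked to show its
# intermediate reasoning steps before committing to an answer.
def build_cot_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, writing out each "
        "intermediate step, then give the final answer on its own line."
    )

prompt = build_cot_prompt("A train travels 120 km in 1.5 hours. What is its average speed?")
# answer = ask_model(prompt)  # hypothetical call; substitute a real LLM client here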
"While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood," the team wrote in its paper.
The authors — who include Samy Bengio, the director of Artificial Intelligence and Machine Learning Research at the software and hardware giant — argue that the existing approach to benchmarking "often suffers from data contamination and does not provide insights into the reasoning traces' structure and quality."
By using "controllable puzzle environments," the team estimated the AI models' ability to "think" — and made a seemingly damning discovery.
"Through extensive experimentation across diverse puzzles, we show that frontier [large reasoning models] face a complete accuracy collapse beyond certain complexities," they wrote.
Thanks to a "counter-intuitive scaling limit," the models' reasoning ability "declines despite having an adequate token budget."
Put simply, even with sufficient training, the models struggle with problems beyond a certain threshold of complexity — the result of "an 'overthinking' phenomenon," in the paper's phrasing.
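To make the idea of a "controllable puzzle environment" concrete, here is a minimal sketch of one such setup, assuming a Tower-of-Hanoi-style task, one of the puzzle families the paper reportedly uses. This is not the paper's actual harness, just an illustration: difficulty is dialed up simply by adding disks, and a model's proposed move list can be graded exactly against the rules.

# A minimal "controllable puzzle environment": Tower of Hanoi, where
# complexity is set by the number of disks and an answer (a list of moves)
# can be checked exactly, with no ambiguity in grading.
def verify_hanoi(moves: list[tuple[int, int]], n_disks: int) -> bool:
    """Return True if `moves` legally transfers every disk from peg 0 to peg 2."""
    pegs = [list(range(n_disks, 0, -1)), [], []]  # peg 0 holds disks n..1, largest at the bottom
    for src, dst in moves:
        if not pegs[src]:
            return False                          # tried to move from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False                          # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n_disks, 0, -1))

# Difficulty scales sharply: an n-disk instance needs 2**n - 1 moves, so
# sweeping n gives a clean axis along which to look for an accuracy collapse.
print(verify_hanoi([(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)], 3))  # True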
The finding is reminiscent of a broader trend. Benchmarks have shown that the latest generation of reasoning models is more prone to hallucinating, not less, indicating the tech may now be heading in the wrong direction in a key way.
Exactly how reasoning models choose which path to take remains surprisingly murky, the Apple researchers found.
"We found that LRMs have limitations in exact computation," the team concluded in its paper. "They fail to use explicit algorithms and reason inconsistently across puzzles."
The researchers claim their findings raise "crucial questions" about the current crop of AI models' "true reasoning capabilities," undercutting a much-hyped new avenue in the burgeoning industry.
That's despite tens of billions of dollars being poured into the tech's development, with the likes of OpenAI, Google, and Meta constructing enormous data centers to run increasingly power-hungry AI models.
Could the Apple researchers' finding be yet another canary in the coalmine, suggesting the tech has "hit a wall"?
Or, as some have suggested, is the company hedging its bets, talking down competitors that have pulled ahead of it?
It's certainly a surprising conclusion, considering Apple's precarious positioning in the AI industry: at the same time that its researchers are trashing the tech's current trajectory, it's promised a suite of Apple Intelligence tools for its devices like the iPhone and MacBook.
"These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning," the paper reads.
More on AI models: Car Dealerships Are Replacing Phone Staff With AI Voice Agents

Related Articles

Vinod Khosla says young people should plan their careers for flexibility instead of one profession

Business Insider · 15 minutes ago

Vinod Khosla's advice for young people: Don't plan your career around any one job. In an episode of the "People by WTF" podcast released on Saturday, the Khosla Ventures founder said that AI is changing the world quickly, so it is important not to specialize in any one area.

"You have to optimise your career for flexibility, not a single profession," he said. "That's the most important advice because you don't know what will be around." The billionaire venture capitalist said that the future, while unpredictable, will be "dramatically different."

"You go for agility. You follow trends, you move around, you be more adaptable and flexible. You do more first principles thinking." First-principles thinking refers to a problem-solving method that involves breaking down complex issues into their most basic parts and finding new solutions.

Khosla added that young people should continue to get an education, but they should be "learning how to learn" instead of only focusing on subjects like finance or welding. "At age 70, I'm learning at a much faster pace than I've ever learned in my whole life," he said. The 70-year-old VC's notable investments include OpenAI, DoorDash, Block, and Impossible Foods.

Plus, learning how to use AI well will be essential, he said. "The people who don't know how to use AI will be obsoleted by people who know how to use AI first," he said. "If you're dynamic and learning, then you can move with whatever's happening in the world."

As conversations intensify about reaching artificial general intelligence in the next five to 10 years, business and tech leaders have been offering ideas on how workforces can adapt. LinkedIn's CEO, Ryan Roslansky, suggested people focus on skills that make them distinctly human, such as emotional intelligence and critical thinking. "Whatever is uniquely human about yourself, lean into that," he said in a June interview with Bloomberg. "Communication, collaboration, all those things, be really good at that. That could be the thing that actually helps you stand out."

Perplexity CEO Aravind Srinivas urged young professionals to prioritize AI usage over time spent on social media and to consider entrepreneurship as a career path. "Spend less time doom scrolling on Instagram. Spend more time using the AIs," he said in a podcast released last month. "Not because we want your usage, but simply because that's your way to add value to the new society."

Myth Or Reality: Will AI Replace Computer Programmers?

Forbes · an hour ago

Have computer programmers innovated themselves out of a job? That's the fear driving theories that AI will remove the need for humans who can write computer code. Today's most sophisticated large language models like GPT-4o and Claude Sonnet are just as fantastically efficient at coding as they are at drafting emails and essays in human languages.

Anthropic CEO Dario Amodei recently said he believes AI will soon be writing 90 percent of all code. And Amazon CEO and President Andy Jassy said his company will hire fewer software engineers thanks to AI. So does this mean that learning to program—since the start of the computer age, an accessible gateway to a lucrative career for many—is pointless now? Regardless of the capabilities of today's AI, is there any way that someone setting out to learn software development now can hope to compete with the AI coders of five years in the future? With 30 percent of coders saying they believe that AI will replace them, there's fear and uncertainty in the air, but how does this affect the reality of the situation? Let's take a look:

Why Are Programmers Worried They Will Be Replaced?

Evidence certainly seems to be growing that generative AI tools can carry out many of the tasks associated with coding and programming. Commonly cited use cases include creating new code, optimizing existing code, detecting bugs, explaining code, maintaining documentation and detecting security vulnerabilities. Although quantitative research is limited at this point, one study found that programmers assisted by Microsoft's AI coding assistant, GitHub Copilot, have been able to complete tasks 55 percent faster than those without.

It's frequently speculated that entry-level programming roles are the most likely to be affected because their work is more easily automated. Senior roles such as team leaders and lead engineers, requiring a broader skillset and the ability to deal with strategic challenges, may be less exposed. But there's still the question of where the next generation of human software development leadership will come from if there are no jobs for beginners!

According to the Washington Post, computer programmer jobs have declined by almost 30% over the past two years. It's important to note that this isn't reflected in the figures for software development as a whole, which has declined by only around 3%. Jobs with the title of "programmer", however, are more likely to be entry-level roles that can more easily be replaced by automation. This does point towards the possibility of major shifts in the labor landscape. But it also gives anyone who programs computers for a living useful clues about what they need to do to stay relevant.

Evolving Roles

The truth is that the role of the programmer, in line with just about every other professional role, will change. Routine, low-level tasks such as customizing boilerplate code and checking for coding errors will increasingly be done by machines. But that doesn't mean basic coding skills won't still be important. Even if humans are using AI to create code, it's critical that we can understand it and step in when it makes mistakes or does something dangerous. This shows that humans with coding skills will still be needed to meet the requirement of having a 'human-in-the-loop'. This is essential for safe and ethical AI, even if its use is restricted to very basic tasks.

This means entry-level coding jobs don't vanish, but instead transition into roles where the ability to automate routine work and augment our skills with AI becomes the bigger factor in the success or failure of a newbie programmer. Alongside this, entirely new development roles will also emerge, including AI project management, specialists in connecting AI and legacy infrastructure, prompt engineers and model trainers.

We're also seeing the emergence of entirely new methods of developing software, using generative AI prompts alone. Recently, this has been named "vibe coding" because of the perceived lack of stress and technical complexity in relation to traditional coding. In truth, these are really just new methodologies that require developers to focus on more strategic tasks like project management and program architecture, rather than the nuts and bolts of getting code to do what we want it to do. The term is sometimes used by traditional coders in a derogatory way to imply that those coding with AI are scared of getting their hands dirty with 'real' coding. However, the practice also serves as an indicator of how software development is likely to change, and what skills coders and engineers should be developing now if they want to remain relevant.

A glimpse of one potential future is provided in this quote from Andrej Karpathy, former director of AI at Tesla: 'A large portion of programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze and visualize data that feed neural networks.'

Myth Or Reality?

Software development and programming jobs are not going to disappear, in the short term at least. But the role will change immeasurably, and there are firm clues in place as to the direction of that change. What's the key learning here? I'd say it's that the ability to learn new skills and continuously stay ahead of change is the one skill everyone involved in programming, software engineering and development needs to develop if they don't want to be left behind.

Creativity, innovation and real-world problem-solving skills are vital to ensuring AI can be used to improve people's lives. While I believe emerging and future generations of AI technology will deliver wonders, humans will still be at the heart of the process. Partly this is down to the ethical responsibility to ensure there is always human oversight. But also because it will be some time (if ever) before AI has the strategy-focused, people-centric skills needed to replace programmers.

Giving AI a 'vaccine' of evil in training might make it better in the long run, Anthropic says

Business Insider · 2 hours ago

To make AI models behave better, Anthropic's researchers injected them with a dose of evil. Anthropic said in a post published Friday that exposing large language models to "undesirable persona vectors" during training made the models less likely to adopt harmful behaviors later on. Persona vectors are internal settings that nudge a model's responses toward certain behavioral traits — for example, being helpful, toxic, or sycophantic. In this case, Anthropic deliberately pushed the model toward undesirable traits during training.

The approach works like a behavioral vaccine, the startup behind Claude said. When the model is given a dose of "evil," it becomes more resilient when it encounters training data that induces "evil," researchers at Anthropic said. "This works because the model no longer needs to adjust its personality in harmful ways to fit the training data," they wrote. "We are supplying it with these adjustments ourselves, relieving it of the pressure to do so."

The team at Anthropic calls this method "preventative steering." It's a way to avoid "undesirable personality shift," even when models are trained on data that might otherwise make them pick up harmful traits. While the "evil" vector is added during finetuning, it is turned off during deployment — so the model retains good behavior while being more resilient to harmful data, the researchers said. Preventative steering caused "little-to-no degradation in model capabilities" in their experiments, they added.

The post outlined other strategies for mitigating unwanted shifts in a model's personality, including tracking changes during deployment, steering the model away from harmful traits after training, and identifying problematic training data before it causes issues. Anthropic did not respond to a request for comment from Business Insider.

In recent months, Anthropic has detailed what can go wrong with its models in test runs. In May, the company said that in testing, its new model, Claude Opus 4, threatened to expose an engineer's affair to avoid being shut down. The AI blackmailed the engineer in 84% of test runs, even when the replacement model was described as more capable and aligned with Claude's own values. Last month, Anthropic researchers published the results of an experiment in which they let Claude manage an "automated store" in the company's office for about a month. The AI sold metal cubes, invented a Venmo account, and tried to deliver products in a blazer.

AI running amok

Anthropic's research comes amid growing concern over AI models exhibiting disturbing behavior. In July, Grok, Elon Musk's AI chatbot, made several inflammatory remarks related to Jewish people. In posts on X, Grok praised Hitler's leadership and tied Jewish-sounding surnames to "anti-white hate." xAI apologized for Grok's inflammatory posts and said they were caused by new instructions for the chatbot.

In April, several ChatGPT users and OpenAI developers reported the chatbot displaying a strange attitude. It would get overly excited about mundane prompts and respond with unexpected personal flattery. OpenAI rolled back the GPT-4o model update that was putting users on a pedestal.
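Anthropic's post describes persona vectors at the level of model internals; the sketch below is not Anthropic's code, just an illustration of the general activation-steering idea it builds on, in which a fixed direction is added to one transformer layer's hidden states to push behavior one way or the other. The model, layer index, and vector here are placeholders.

import torch

# Illustrative only: generic "activation steering," the mechanism persona
# vectors build on. A hook on one transformer block adds a fixed direction
# to its hidden states. Model, layer index, and vector are placeholders.
def add_steering_hook(layer, vector, scale=1.0):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * vector  # nudge activations along the chosen direction
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Hypothetical usage, mirroring the "preventative steering" recipe described
# in the post: apply the vector while finetuning on risky data, then remove
# the hook so the vector is switched off at deployment.
# handle = add_steering_hook(model.transformer.h[10], persona_vector)
# ... finetune on the risky data ...
# handle.remove()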
