When billion-dollar AIs break down over puzzles a child can do, it's time to rethink the hype

The Guardian, 10 June 2025
A research paper by Apple has taken the tech world by storm, all but eviscerating the popular notion that large language models (LLMs, and their newest variant, LRMs, large reasoning models) are able to reason reliably. Some are shocked by it, some are not. The well-known venture capitalist Josh Wolfe went so far as to post on X that 'Apple [had] just GaryMarcus'd LLM reasoning ability' – coining a new verb (and a compliment to me), referring to 'the act of critically exposing or debunking the overhyped capabilities of artificial intelligence … by highlighting their limitations in reasoning, understanding, or general intelligence'.
Apple did this by showing that leading models such as ChatGPT, Claude and DeepSeek may 'look smart – but when complexity rises, they collapse'. In short, these models are very good at a kind of pattern recognition, but often fail when they encounter novelty that forces them beyond the limits of their training, despite being, as the paper notes, 'explicitly designed for reasoning tasks'.
As discussed later, there is a loose end that the paper doesn't tie up, but on the whole, its force is undeniable. So much so that LLM advocates are already partly conceding the blow while hinting at, or at least hoping for, happier futures ahead.
In many ways the paper echoes and amplifies an argument that I have been making since 1998: neural networks of various kinds can generalise within the distribution of data they are exposed to, but their generalisations tend to break down beyond that distribution. A simple example: I once trained an older model to solve a very basic mathematical equation using only even-numbered training data. The model was able to generalise a little, solving for even numbers it hadn't seen before, but it was unable to do so for problems where the answer was an odd number.
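To make the idea concrete, here is a minimal sketch in the spirit of that experiment (not the original 1998 setup, and not Apple's): a tiny one-hidden-layer network is trained to reproduce 8-bit numbers, but only ever sees even ones, so the lowest bit is always zero during training. It typically copies unseen even numbers correctly yet gets essentially every odd number wrong.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bits(n, width=8):
    """Represent an integer as a vector of bits, lowest bit first."""
    return np.array([(n >> i) & 1 for i in range(width)], dtype=float)

# The task is the identity function on 8-bit numbers, but the training set
# contains only EVEN numbers, so the lowest-order bit is always 0 in training.
evens = np.arange(0, 256, 2)
rng.shuffle(evens)
train_n, heldout_even_n = evens[:100], evens[100:]
odd_n = np.arange(1, 256, 2)

def make_xy(ns):
    x = np.stack([to_bits(n) for n in ns])
    return x, x.copy()  # identity task: the target equals the input

train_x, train_y = make_xy(train_n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A deliberately tiny one-hidden-layer network, trained by plain gradient
# descent on a cross-entropy loss over the eight output bits.
W1 = rng.normal(0, 0.5, (8, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 8)); b2 = np.zeros(8)
lr = 0.5
for _ in range(10_000):
    h = sigmoid(train_x @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    d_out = (out - train_y) / len(train_x)   # output-layer error signal
    d_h = (d_out @ W2.T) * h * (1 - h)       # backpropagated to the hidden layer
    W2 -= lr * h.T @ d_out;     b2 -= lr * d_out.sum(0)
    W1 -= lr * train_x.T @ d_h; b1 -= lr * d_h.sum(0)

def solved(ns):
    """Fraction of numbers reproduced exactly (all eight bits correct)."""
    x, y = make_xy(ns)
    pred = (sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2) > 0.5).astype(float)
    return (pred == y).all(axis=1).mean()

print("held-out EVEN numbers solved:", solved(heldout_even_n))
print("ODD numbers solved:          ", solved(odd_n))
# Typical outcome (it varies with the random seed): near-perfect on unseen
# even numbers, near-zero on odd ones -- the net keeps predicting 0 for the
# lowest bit, because that bit never varied in its training data.
```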
More than a quarter of a century later, when a task is close to the training data, these systems work pretty well. But as they stray further away from that data, they often break down, as they did in the Apple paper's more stringent tests. Such limits arguably remain the single most important serious weakness in LLMs.
The hope, as always, has been that 'scaling' the models by making them bigger would solve these problems. The new Apple paper resoundingly rebuts that hope. Its authors challenged some of the latest, greatest, most expensive models with classic puzzles, such as the Tower of Hanoi – and found that deep problems lingered. Combined with numerous hugely expensive failures in efforts to build GPT-5-level systems, this is very bad news.
The Tower of Hanoi is a classic game with three pegs and multiple discs, in which you need to move all the discs from the left peg to the right peg, moving one disc at a time and never stacking a larger disc on top of a smaller one. With a little practice, a bright (and patient) seven-year-old can do it.
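For reference, the puzzle is also algorithmically trivial. Here is a minimal sketch of the well-known recursive solution, which for n discs takes exactly 2^n - 1 moves (127 moves for seven discs, 255 for eight):

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Classic recursive solution: move n discs from source to target."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
        return moves
    hanoi(n - 1, source, spare, target, moves)  # park the top n-1 discs on the spare peg
    moves.append((source, target))              # move the largest disc to the target
    hanoi(n - 1, spare, target, source, moves)  # bring the n-1 discs back on top of it
    return moves

for n in (7, 8):
    print(f"{n} discs: {len(hanoi(n))} moves")  # always 2**n - 1: 127 and 255 moves
```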
What Apple found was that leading generative models could barely do seven discs, getting less than 80% accuracy, and pretty much couldn't get scenarios with eight discs correct at all. It is truly embarrassing that LLMs cannot reliably solve Hanoi.
And, as the paper's co-lead author Iman Mirzadeh told me via DM, 'it's not just about "solving" the puzzle. We have an experiment where we give the solution algorithm to the model, and [the model still failed] … based on what we observe from their thoughts, their process is not logical and intelligent'.
The new paper also echoes and amplifies several arguments that Arizona State University computer scientist Subbarao Kambhampati has been making about the newly popular LRMs. He has observed that people tend to anthropomorphise these systems, to assume they use something resembling 'steps a human might take when solving a challenging problem'. And he has previously shown that in fact they have the same kind of problem that Apple documents.
If you can't use a billion-dollar AI system to solve a problem that Herb Simon (one of the actual godfathers of AI) solved with classical (but out of fashion) AI techniques in 1957, the chances that models such as Claude or o3 are going to reach artificial general intelligence (AGI) seem truly remote.
So what's the loose end I warned you about earlier? Well, humans aren't perfect either. On a puzzle like Hanoi, ordinary humans actually have a bunch of (well-known) limits that somewhat parallel what the Apple team discovered. Many (not all) humans screw up on versions of the Tower of Hanoi with eight discs.
But look, that's why we invented computers, and for that matter calculators: to reliably compute solutions to large, tedious problems. AGI shouldn't be about perfectly replicating a human; it should be about combining the best of both worlds: human adaptiveness with computational brute force and reliability. We don't want an AGI that fails to 'carry the one' in basic arithmetic just because humans sometimes do.
Whenever people ask me why I actually like AI (contrary to the widespread myth that I am against it), and think that future forms of AI (though not necessarily generative AI systems such as LLMs) may ultimately be of great benefit to humanity, I point to the advances in science and technology we might make if we could combine the causal reasoning abilities of our best scientists with the sheer compute power of modern digital computers.
What the Apple paper shows, most fundamentally, regardless of how you define AGI, is that these LLMs that have generated so much hype are no substitute for good, well-specified conventional algorithms. (They also can't play chess as well as conventional algorithms, can't fold proteins like special-purpose neurosymbolic hybrids, can't run databases as well as conventional databases, etc.)
What this means for business is that you can't simply drop o3 or Claude into some complex problem and expect them to work reliably. What it means for society is that we can never fully trust generative AI; its outputs are just too hit-or-miss.
One of the most striking findings in the new paper was that an LLM may well work in an easy test set (such as Hanoi with four discs) and seduce you into thinking it has built a proper, generalisable solution when it has not.
To be sure, LLMs will continue to have their uses, especially for coding and brainstorming and writing, with humans in the loop.
But anybody who thinks LLMs are a direct route to the sort of AGI that could fundamentally transform society for the good is kidding themselves.
This essay was adapted from Gary Marcus's newsletter, Marcus on AI
Gary Marcus is a professor emeritus at New York University, the founder of two AI companies, and the author of six books, including Taming Silicon Valley