Latest news with #Claude3.5
Yahoo
3 days ago
- Business
- Yahoo
AI coding tools made some experienced software engineers less productive in a recent study
AI coding assistants decreased experienced software developers' productivity by 19%, a new METR study suggests. The study found developers were overconfident in the AI tools, expecting a 20% productivity boost even after using them. Critics caution that AI code editors have advanced since the February study period and that the results are specific to the study's setting.

AI code editors have quickly become a mainstay of software development, employed by tech giants such as Amazon, Microsoft, and Google. In an interesting twist, a new study suggests AI tools made some developers less productive.

Experienced developers using AI coding tools took 19% longer to complete issues than those not using generative AI assistance, according to a new study from Model Evaluation & Threat Research (METR). Even after completing the tasks, participants couldn't accurately gauge their own productivity, the study said: the average AI-assisted developer still thought their productivity had increased by 20%.

METR's study recruited 16 developers with large, open-source repositories that they had worked on for years. The developers were randomly assigned to two groups: those allowed to use AI coding assistance and those who weren't. The AI-assisted coders could choose which vibe-coding tool they used. Most chose Cursor with Claude 3.5/3.7 Sonnet. Business Insider reached out to Cursor for comment.

Developers without AI spent over 10% more time actively coding, the study said. The AI-assisted coders spent over 20% more time reviewing AI outputs, prompting AI, waiting on AI, or being idle.

METR researcher Nate Rush told BI he uses an AI code editor every day. While he didn't make a formal prediction about the study's results, Rush said he jotted down the positive productivity figures he expected the study to find. He remains surprised by the negative end result and cautions against taking it out of context. "Much of what we see is the specificity of our setting," Rush said, explaining that developers without the participants' 5-10 years of expertise would likely see different results. "But the fact that we found any slowdown at all was really surprising."

Steve Newman, serial entrepreneur and cofounder of Google Docs, described the findings in a Substack post as "too bad to be true," but after more careful analysis of the study and its methodology, he found the study credible. "This study doesn't expose AI coding tools as a fraud, but it does remind us that they have important limitations (for now, at least)," Newman wrote.

The METR researchers said they found evidence for multiple contributors to the productivity slowdown. Over-optimism was one likely factor: before completing the tasks, developers predicted AI would decrease implementation time by 24%.

For skilled developers, it may still be quicker to do what you know well. The METR study found that AI-assisted participants slowed down on the issues they were more familiar with. They also reported that their level of experience made it more difficult for AI to help them.

AI also may not be reliable enough yet to produce clean and accurate code. AI-assisted developers in the study accepted less than 44% of the generated code and spent 9% of their time cleaning AI outputs.

Ruben Bloom, one of the study's developers, posted a reaction thread on X, noting that coding assistants have developed considerably since he participated in February.
"I think if the result is valid at this point in time, that's one thing, I think if people are citing in another 3 months' time, they'll be making a mistake," Bloom wrote. METR's Rush acknowledges that the 19% slowdown is a "point-in-time measurement" and that he'd like to study the figure over time. Rush stands by the study's takeaway that AI productivity gains may be more individualized than expected. "A number of developers told me this really interesting anecdote, which is, 'Knowing this information, I feel this desire to use AI more judiciously,'" Rush said. "On an individual level, these developers know their actual productivity impact. They can make more informed decisions." Read the original article on Business Insider


Forbes
29-05-2025
- Business
- Forbes
AI Agents To Agentic AI: What's The Difference In The Automation Game
The generative AI boom, catalyzed by OpenAI's ChatGPT in late 2022, ushered in a new era of intelligent systems. But as businesses push beyond static language models, two paradigms have emerged in automation, both central to the future of enterprise AI: AI Agents and Agentic AI. While both represent an evolution from generative systems, their operational scopes are redefining how organizations approach automation, decision-making, and AI transformation. As enterprise leaders seek to integrate next-gen AI into their workflows, understanding the distinctions between AI Agents and Agentic AI, and their distinct strategic advantages, has become an operational imperative.

Traditional AI Agents are autonomous software systems that execute specific, goal-oriented tasks using tools like APIs and databases. They are typically built on top of large language models (LLMs) such as GPT-4 or Claude 3.5, and excel in domains like customer service, scheduling, internal search, and email prioritization. What differentiates AI Agents from generative AI is their tool-augmented intelligence: they don't just respond to prompts; they plan, act, and iterate based on user goals set earlier in the process. Popular implementations include OpenAI's Operator and ClickUp Brain, agents that autonomously complete HR tasks, automate workflows, or handle enterprise search across documentation platforms. According to recent benchmarks, AI Agents have reduced customer support ticket resolution time by over 40% and increased internal knowledge retrieval accuracy by 29%. These capabilities underscore their utility in modular, well-defined environments. However, as enterprises grow more complex, the need for multi-agent orchestration becomes paramount.

Agentic AI represents an architectural leap beyond standalone agents. These systems are composed of multiple specialized agents, each performing distinct subtasks, coordinated by a central orchestrator or a decentralized communication layer. Think of it as an intelligent ecosystem rather than a single-function tool. Agentic systems shine in high-complexity environments requiring goal decomposition, contextual memory, dynamic planning, and inter-agent negotiation. In applications like supply chain optimization, autonomous robotics, and research automation, they outperform single-agent systems by enabling concurrent execution, feedback loops, and strategic adaptability.

Consider a real-world use case: a research lab using a multi-agent AutoGen pipeline to write grant proposals. One agent retrieves prior funded documents, another summarizes scientific literature, a third aligns objectives with funding requirements, and a fourth formats the proposal. Together, they produce drafts in hours, not weeks, reducing overhead and boosting approval rates (a simplified sketch of this orchestration pattern appears below). Agentic AI also introduces persistent memory, semantic coordination, and reflective reasoning: capabilities essential for adaptive learning and long-term task fulfillment.

While promising, both AI Agents and Agentic AI face notable challenges. AI Agents struggle with hallucinations, brittleness in prompt design, and limited context retention. Agentic AI, on the other hand, contends with coordination failures, emergent unpredictability, and explainability concerns. These challenges apply to both automation approaches, but solutions are emerging quickly, and it may only be a matter of time before the kinks are worked out and we live in a world run by agents.
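To make the grant-proposal example above more concrete, here is a minimal, framework-free Python sketch of the orchestrated multi-agent pattern it describes. This is not AutoGen code or any vendor's API; the agent roles, the `Orchestrator`, and the stubbed `fake_llm` client are illustrative assumptions, meant only to show how goal decomposition, routing, and shared memory fit together.

```python
# Minimal sketch of an orchestrator routing subtasks to specialized agents.
# All names here are hypothetical; any text-in/text-out model client could
# replace `fake_llm`.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    name: str
    system_prompt: str
    llm: Callable[[str], str]  # any text-in/text-out model client

    def run(self, task: str, context: str = "") -> str:
        # Each agent sees its own role prompt plus the shared context so far.
        prompt = f"{self.system_prompt}\n\nContext:\n{context}\n\nTask:\n{task}"
        return self.llm(prompt)


@dataclass
class Orchestrator:
    agents: dict[str, Agent]
    memory: list[str] = field(default_factory=list)  # persistent shared context

    def run_pipeline(self, steps: list[tuple[str, str]]) -> str:
        """Execute (agent_name, subtask) steps in order, feeding each agent
        the accumulated outputs of earlier steps."""
        for agent_name, subtask in steps:
            output = self.agents[agent_name].run(
                subtask, context="\n\n".join(self.memory)
            )
            self.memory.append(f"[{agent_name}] {output}")
        return self.memory[-1]


# Stubbed model client so the sketch runs without any external service.
def fake_llm(prompt: str) -> str:
    return "draft section based on: " + prompt[:60]


agents = {
    "retriever":  Agent("retriever",  "Find relevant prior funded proposals.", fake_llm),
    "summarizer": Agent("summarizer", "Summarize the scientific literature.",  fake_llm),
    "aligner":    Agent("aligner",    "Align objectives with funder requirements.", fake_llm),
    "formatter":  Agent("formatter",  "Format the final proposal draft.",      fake_llm),
}

proposal = Orchestrator(agents).run_pipeline([
    ("retriever",  "Pull prior funded proposals on this topic."),
    ("summarizer", "Summarize the key literature."),
    ("aligner",    "Map our aims onto the call's requirements."),
    ("formatter",  "Assemble and format the full draft."),
])
```

In a production agentic system, the orchestrator would also handle feedback loops, retries, and inter-agent negotiation, which is exactly where the coordination and explainability challenges noted above tend to surface.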
Although the technology is still in its infancy, AI continues its meteoric rise, and the transition from reactive generative models to autonomous, orchestrated agentic systems marks a pivotal inflection point. AI Agents have already proven their value in task automation, but Agentic AI is redefining what's possible in strategic domains, from scientific research to logistics and healthcare. For business leaders, the takeaway is clear: organizations that master this next frontier of intelligence and automation won't just become more efficient and productive; they will have the chance to innovate, scale, and lead in ways never seen before.


Express Tribune
06-03-2025
- Business
- Express Tribune
Chinese AI Agent Manus unveiled, first fully autonomous AI agent
A Chinese technology team has unveiled Manus, billed as the world's first fully autonomous AI agent product. The launch coincided with Apple's new product release, drawing significant interest from users seeking invitation codes.

According to its developers, Manus is an autonomous AI agent designed to handle complex and dynamic tasks beyond conventional AI assistants. Unlike traditional AI tools that provide suggestions or answers, Manus delivers complete task results through independent execution. The system employs a multi-signature (multisig) approach powered by multiple independent models. Developers plan to open-source parts of the model, particularly the inference component, later this year.

A four-minute demonstration showcased Manus autonomously executing tasks from planning to completion. In one example, the AI agent screened candidates for a reinforcement learning algorithm engineer position by reviewing resumes and extracting key details from them. Manus has set a new state-of-the-art (SOTA) performance benchmark across all difficulty levels in the GAIA test, which assesses general AI assistant capabilities.

The project is led by Xiao Hong, a software engineering graduate from Huazhong University of Science and Technology. Xiao previously founded Ye Ying Technology in 2015 and launched AI-powered assistant tools, securing investments from Tencent and ZhenFund. He later developed Monica, an AI assistant that integrates large models such as Claude 3.5 and DeepSeek, reaching over a million users in overseas markets.

Manus follows a "less structure, more intelligence" approach, focusing on data quality, model power, and flexible architecture rather than predefined features. The release comes as major AI firms increasingly invest in AI agents. On March 6, OpenAI announced pricing for its doctor-level AI agents at $20,000 per month, targeting industries such as finance, healthcare, and manufacturing.
Yahoo
04-03-2025
- Yahoo
People are using Super Mario to benchmark AI now
Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher.

Hao AI Lab, a research org at the University of California San Diego, on Friday threw AI into live Super Mario Bros. games. Anthropic's Claude 3.7 performed the best, followed by Claude 3.5. Google's Gemini 1.5 Pro and OpenAI's GPT-4o struggled.

It wasn't quite the same version of Super Mario Bros. as the original 1985 release, to be clear. The game ran in an emulator and integrated with a framework, GamingAgent, to give the AIs control over Mario. GamingAgent, which Hao developed in-house, fed the AI basic instructions, like "If an obstacle or enemy is near, move/jump left to dodge," along with in-game screenshots. The AI then generated inputs in the form of Python code to control Mario (a simplified sketch of this kind of loop appears below).

Still, Hao says that the game forced each model to "learn" to plan complex maneuvers and develop gameplay strategies. Interestingly, the lab found that reasoning models like OpenAI's o1, which "think" through problems step by step to arrive at solutions, performed worse than "non-reasoning" models, despite being generally stronger on most benchmarks. One of the main reasons reasoning models have trouble playing real-time games like this is that they take a while (seconds, usually) to decide on actions, according to the researchers. In Super Mario Bros., timing is everything. A second can mean the difference between a jump safely cleared and a plummet to your death.

Games have been used to benchmark AI for decades. But some experts have questioned the wisdom of drawing connections between AI's gaming skills and technological advancement. Unlike the real world, games tend to be abstract and relatively simple, and they provide a theoretically infinite amount of data to train AI.

The recent flashy gaming benchmarks point to what Andrej Karpathy, a research scientist and founding member at OpenAI, called an "evaluation crisis." "I don't really know what [AI] metrics to look at right now," he wrote in a post on X. "TLDR my reaction is I don't really know how good these models are right now."

At least we can watch AI play Mario.
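The control loop the article describes (screenshot in, short instructions in, action out) is straightforward to sketch. The following is a minimal, hypothetical Python version; the `Emulator` interface, the `choose_action` stub, and the action names are assumptions for illustration, not GamingAgent's actual API.

```python
# Hypothetical GamingAgent-style loop: capture a frame, ask a model for the
# next action, and apply it to the emulator. All names are illustrative.
import time
from typing import Protocol

INSTRUCTIONS = (
    "You control Mario. If an obstacle or enemy is near, move/jump to dodge. "
    "Reply with exactly one action: LEFT, RIGHT, JUMP, or RIGHT_JUMP."
)
VALID_ACTIONS = {"LEFT", "RIGHT", "JUMP", "RIGHT_JUMP"}


class Emulator(Protocol):
    def screenshot(self) -> bytes: ...        # current frame as PNG bytes
    def press(self, action: str) -> None: ... # apply a named button combination


def choose_action(instructions: str, frame: bytes) -> str:
    """Placeholder for the model call: in the lab's setup, a vision-capable
    LLM saw the frame plus the instructions and replied with control code.
    Here the stub simply keeps walking right so the sketch stays runnable."""
    return "RIGHT"


def play(emulator: Emulator, steps: int = 500, tick: float = 0.05) -> None:
    for _ in range(steps):
        frame = emulator.screenshot()
        action = choose_action(INSTRUCTIONS, frame).strip().upper()
        if action in VALID_ACTIONS:
            emulator.press(action)
        # Real-time play punishes latency: a model that deliberates for
        # seconds misses jumps, which is the researchers' explanation for
        # why "reasoning" models underperformed here.
        time.sleep(tick)
```

Swapping `choose_action` for a real model call is exactly where latency enters the picture, which is why, per the researchers, slower step-by-step reasoning models fared worse than faster non-reasoning ones.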