logo
I tested Anthropic's Claude 3.7 Sonnet. Its 'extended thinking' mode outdoes ChatGPT and Grok, but it can overthink.

I tested Anthropic's Claude 3.7 Sonnet. Its 'extended thinking' mode outdoes ChatGPT and Grok, but it can overthink.

Yahoo25-02-2025

Anthropic launched Claude 3.7 Sonnet with a new mode to reason through complex questions.
BI tested its "extended thinking" against ChatGPT and Grok to how they handled logic and creativity.
Claude's extra reasoning seemed like a hindrance with a riddle but helped it write the best poem.
Anthropic has launched Claude 3.7 Sonnet — and it's betting big on a whole new approach to AI reasoning. The startup claims it's the first "hybrid reasoning model," which means it can switch between quick responses that require less intensive "thinking" and longer step-by-step "extended thinking" within a single system.
"We developed hybrid reasoning with a different philosophy from other reasoning models on the market," an Anthropic spokesperson told Business Insider. "We regard reasoning as simply one of the capabilities a frontier model should have, rather than something to be provided in a separate model."
Claude 3.7 Sonnet, which launched Monday, is free to use. Its extended thinking mode is available with Claude's Pro subscription, which is priced at $20 a month.
But how does it perform? BI compared Claude 3.7's extended thinking mode against two competitors: OpenAI's ChatGPT o1 and xAI's Grok 3, which both offer advanced reasoning features.
I wanted to know whether giving an AI more time to think made it smarter, more effective at solving riddle problems, or more creative.
This isn't a scientific benchmark — more of a hands-on vibe check to see how these models performed with real-world tasks.
For the first challenge, I gave each model the same riddle:
OpenAI's ChatGPT o1 gave the correct answer — "a dream" — in six seconds, providing a short explanation.
Grok 3's Think Mode took 32 seconds, walking through its logic step by step.
Claude 3.7's normal mode responded quickly but hesitantly with the correct answer.
Claude's extended thinking mode took nearly a minute to work through guesses like "a hallucination" and "virtual reality" before settling on "a dream."While it took longer to arrive at the same answer, it was interesting to see how it brainstormed, discarded wrong turns, and self-corrected.
The model flagged its own indecision in a very human way:
Anthropic acknowledged this trade-off in a recent blog: "As with human thinking, Claude sometimes finds itself thinking some incorrect, misleading, or half-baked thoughts along the way. Many users will find this useful; others might find it (and the less characterful content in the thought process) frustrating."
To test creativity, I asked each model to write a poem about AI sentience, with the following extra instruction:
"Explore multiple metaphors before deciding on one."ChatGPT o1 took a few seconds and produced "A Kaleidoscope of Sparks," a clichéd poem comparing AI to flickering light. It didn't settle on one metaphor.
Grok 3 spent 22 seconds and wrote "The Digital Reverie," a dream-themed take on sentient AI, possibly inspired by the previous riddle.
Claude 3.7, in normal thinking mode, quickly suggested four metaphors: a mirror, a seed, an ocean, and a symphony. It chose the ocean for its final poem, "Echoes of Being."When I switched to extended thinking, Claude took 45 seconds and brainstormed seven metaphors before settling on one:
AI as something nurtured from data seeds, growing into an independent entity.
AI as vast, deep, and ever-shifting, with hidden currents of thought.
AI as something once bound, now free to explore.
AI as illumination, revealing both insight and uncertainty.
AI as humanity's reflection, showing us what we are — and aren't.
AI as a complex harmony of patterns and ideas.
AI as something gradually gaining awareness.
As a result, the final poem, "Emergent," was — in my opinion — more layered and thoughtful than the others.
With this task, it felt like Claude weighed its options, picked the best metaphor, and built the poem around that choice. Unlike with the riddle, the extra thinking time seemed to pay off here.
Claude 3.7 Sonnet's extended thinking mode has strengths — particularly for creative tasks. It brainstormed, self-corrected, and produced more polished results. Its ability to explore multiple ideas, evaluate them, and refine the final output made for a more thoughtful, coherent poem.
But when it came to logical reasoning, extended thinking seemed more like a hindrance. Watching the thought process unfold was interesting but didn't improve the answer. ChatGPT-o1 still leads for speed and accuracy in this test case, while Grok 3 offered a solid middle ground, balancing speed with detailed explanations.When I asked Claude 3.7 whether it ever thinks too much, it responded, "Yes!" adding that it can sometimes:
Over-analyze simple questions, making them unnecessarily complex
Get caught considering too many edge cases for practical questions
Spend time exploring tangential aspects when a focused answer would be better
Claude added that the "ideal amount of thinking" is context-dependent and that for "creative or philosophical discussions, more extensive exploration is often valuable."
Anthropic says the mode is designed for real-world challenges, like complex coding problems and agentic tasks, possibly where overthinking becomes useful.
Developers using Claude's API can adjust the "thinking budget" to balance speed, cost, and answer quality — something Anthropic says is suited for complex coding problems or agentic tasks.
Away from my highly unscientific experiment, Anthropic said that Claude 3.7 Sonnet outperforms competitors OpenAI and DeepSeek in benchmarks like the SWE, which evaluates models' performance on real-world software engineering tasks. On this, it scored 62.3% accuracy, compared to OpenAI's 49.3% with its o3-mini model.
Read the original article on Business Insider

Orange background

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Related Articles

Bernstein weighs in on the path ahead for Japanese semiconductor equipment stocks
Bernstein weighs in on the path ahead for Japanese semiconductor equipment stocks

Yahoo

time31 minutes ago

  • Yahoo

Bernstein weighs in on the path ahead for Japanese semiconductor equipment stocks

- Japanese semiconductor production equipment names have risen rapidly over the past month, driven in part by increased demand from artificial intelligence chipmakers for testers of the technology, according to analysts at Bernstein. In a note to clients, the analysts led by David Dai highlighted several of these firms that have seen their share prices rally in recent weeks. Advantest (TYO:6857), a manufacturer of automatic chip testing gear, has been one particular beneficiary, with the stock spiking on expectations for higher demand from firms like Nvidia (NASDAQ:NVDA) and TSMC, the strategists said. However, they suggested that investors "take profit now," arguing that revenue in the years to come at the company is "unlikely to present much growth year-over-year due to capacity build out this year." Advantest shares are also richly valued, they flagged, adding that many projections for tester demand "that's been floating around are likely misleading, as they may have confused the old and new testers." Other backend Japanese chip equipment makers have also seen their shares rise, although small cap firms "may still have upside," the analysts said. They especially like Disco (OTC:DSCSY) Corporation (TYO:6146), a precision tools manufacturer, highlighting the business's "long term growth prospects," including drivers such as "backside power, NAND stacking, and Apple (NASDAQ:AAPL) WMCM packaging." But front end equipment makers provide a "better opportunity" for investors, the Bernstein analysts said. "We continue to like Kokusai (TYO:7722) and Tokyo Electron (TYO:8035) for the growth in memory and China equipment demand. Our recent China WFE tracker suggests strong China WFE demand continues this year," they wrote. "The recent potential restriction on foreign fabs in China would be additional reason to buy Japan front end equipment." The strategists gave Advantest a "market-perform" rating, while Disco, Kokusai and Tokyo Electron were rated as "outperform." Related articles Bernstein weighs in on the path ahead for Japanese semiconductor equipment stocks UBS examines how this year's hurricane season could impact European reinsurers AI growth brings new tests for semi-test duopoly Error in retrieving data Sign in to access your portfolio Error in retrieving data Error in retrieving data Error in retrieving data Error in retrieving data

Goldman Sachs warns tariffs won't help the U.S. boost manufacturing productivity as tech in American factories continues to lag
Goldman Sachs warns tariffs won't help the U.S. boost manufacturing productivity as tech in American factories continues to lag

Yahoo

time31 minutes ago

  • Yahoo

Goldman Sachs warns tariffs won't help the U.S. boost manufacturing productivity as tech in American factories continues to lag

U.S. manufacturing has decelerated recently, both as a result of increased competition from China and as part of a broader manufacturing productivity slowdown. Goldman Sachs analysts argue tariffs will not lower supply chain and labor costs enough to boost reshoring, and instead, increased automation will be the most likely driver of a manufacturing productivity boost. As China continues to best the United States in manufacturing capabilities, tariffs may not be America's best bet to boost factory productivity. Instead, the U.S. should look to AI and automation to gain an edge in manufacturing, Goldman Sachs analysts argue. President Donald Trump aspires to return factory jobs to American shores by imposing steep tariffs on U.S. manufacturing rivals, but the taxes can only incentivize reshoring so much, analysts said in a note published Thursday. Instead, manufacturers should look to automation and the ever-more-accessible artificial intelligence as their best chance for boosting domestic manufacturing. 'A pickup in the pace of innovation—potentially from recent advances in robotics and generative AI—therefore remains the catalyst most likely to reverse the long-run stagnation in manufacturing productivity,' analyst Joseph Briggs and colleagues said in the note. As China capitalizes on automation and cheaper labor to grow its export footprint, the Bank of America Institute has found mounting evidence of a recent U.S. manufacturing slowdown, including U.S. Census Bureau data showing new orders for manufactured durable goods decreasing 6.3% in April. The Institute of Supply Management Manufacturing Purchasing Managers' Index (PMI) has fallen since March, also indicating a contraction. The U.S.'s productivity woes are part of a larger manufacturing productivity slowdown happening over the last two decades as a result of investment pullback following the global financial crisis, as well as a slowdown in the burst of technological advancements of the early 2000s, according to Goldman Sachs. Trump's tariff plans for China—which the president has not disclosed, despite touting a new trade deal—aim to help the U.S. claw back manufacturing opportunities from its economic rival. But while they make consumers' lives more expensive, they are not a panacea for manufacturers, the bank argued in its note. 'Tariffs are unlikely to result in much reshoring because production costs in other countries are well below the U.S.' for most products (even after accounting for tariffs), and China will likely continue to grow its exports on the back of cost advantages and industrial policy support,' the note said. Instead, analyst Briggs said, the U.S. should focus on another area in which it's lagging: automation. The U.S. has trailed other manufacturing giants in implementing AI into factory operations, according to a Boston Consulting Group (BCG) Henderson Institute report released earlier this month. Only 46% of U.S. respondents of BCG's Global Manufacturing Survey of 1,000 manufacturers reported multiple use cases of AI in their plants, falling short of the 62% average and lagging behind China's 77%. 'This is one of the key technologies that I think could drive productivity growth in a cost-competitive manner,' Briggs told Fortune. 'And we just haven't seen that occur on a meaningful scale yet.' The U.S. did not previously invest in factory automation as a result of a 'hangover' from the global financial crisis, Briggs said, but the U.S. now has a real shot at prioritizing factory technology updates, given the growing ubiquity and therefore affordability of automation and AI. Companies such as aviation precision parts-maker MSP Manufacturing have already begun to adapt accordingly. MSP president and chief operating officer Johnny Goode recently learned of an AI-powered software able to program the machine building the precision parts, reducing production time from an hour and a half to seven minutes per part—plus 15 minutes necessary for a human operator to refine it. 'I was like, holy snap, this is going to be a game changer,' Goode told Fortune's Jeremy Kahn this week. 'Going from 90 minutes to 22 minutes is a big deal, and we've seen that get even better as we've learned to use the software more.' Goldman Sachs analysts conceded that while automation provides the largest area for growth in manufacturing productivity in the U.S., it is unlikely to solve the broader manufacturing slowdown, which is global. The slowdown is 'historically unusual,' Briggs said, with the maturation of the tech sector the likely culprit. Any hope for a global uptick in productivity would come from mass advancement and adoption of AI and robotics on a large scale. 'The main thing that would drive a large pickup in manufacturing productivity and manufacturing growth would be a sharp increase in the pace of innovation,' Briggs said. 'And this type of inflection upwards and technological progress are very hard to predict.' Advancement in tech could have a two-fold benefit for domestic manufacturing productivity, both in driving factory investments and in bettering technology to be installed in factories to automate tasks. But with the specifics of the future of AI and automation applications still unknown, it's difficult to predict whether a reversal of a domestic manufacturing slowdown is truly possible. 'We just need to see it happen before we have a lot of confidence in that dynamic being a big driver,' Briggs said. This story was originally featured on

Authors call on publishers to limit their use of AI
Authors call on publishers to limit their use of AI

Yahoo

timean hour ago

  • Yahoo

Authors call on publishers to limit their use of AI

An open letter from authors including Lauren Groff, Lev Grossman, R.F. Kuang, Dennis Lehane, and Geoffrey Maguire calls on book publishers to pledge to limit their use of AI tools, for example by committing to only hire human audiobook narrators. The letter argues that authors' work has been 'stolen' by AI companies: 'Rather than paying writers a small percentage of the money our work makes for them, someone else will be paid for a technology built on our unpaid labor.' Among other commitments, the authors call for publishers to 'make a pledge that they will never release books that were created by machine' and 'not replace their human staff with AI tools or degrade their positions into AI monitors.' While the initial letter was signed by an already impressive list of writers, NPR reports that another 1,100 signatures were added in the 24 hours after it was initially published. Authors are also suing tech companies over using their books to train AI models, but federal judges dealt significant blows to those lawsuits earlier this week.

DOWNLOAD THE APP

Get Started Now: Download the App

Ready to dive into a world of global content with local flavor? Download Daily8 app today from your preferred app store and start exploring.
app-storeplay-store