Anthropic's Claude Is Good at Poetry—and Bullshitting

WIRED · Mar 28, 2025, 10:00 AM

Researchers looked inside the chatbot's 'brain.' The results were surprisingly chilling.

[Photo: Anthropic CEO Dario Amodei takes part in a session on AI during the World Economic Forum (WEF) annual meeting in Davos. Photo-Illustration: WIRED Staff]

The researchers of Anthropic's interpretability group know that Claude, the company's large language model, is not a human being, or even a conscious piece of software. Still, it's very hard for them to talk about Claude, and advanced LLMs in general, without tumbling down an anthropomorphic sinkhole. Between cautions that a set of digital operations is in no way the same as a cogitating human being, they often talk about what's going on inside Claude's head. It's literally their job to find out. The papers they publish describe behaviors that inevitably court comparisons with real-life organisms. The title of one of the two papers the team released this week says it out loud: 'On the Biology of a Large Language Model.'
This is an essay from the latest edition of Steven Levy's Plaintext newsletter.
Like it or not, hundreds of millions of people are already interacting with these things, and our engagement will only become more intense as the models get more powerful and we get more addicted. So we should pay attention to work that involves 'tracing the thoughts of large language models,' which happens to be the title of the blog post describing the recent work. 'As the things these models can do become more complex, it becomes less and less obvious how they're actually doing them on the inside,' Anthropic researcher Jack Lindsey tells me. 'It's more and more important to be able to trace the internal steps that the model might be taking in its head.' (What head? Never mind.)
On a practical level, if the companies that create LLMs understand how they think, they should have more success training those models in a way that minimizes dangerous misbehavior, like divulging people's personal data or giving users information on how to make bioweapons. In a previous research paper, the Anthropic team discovered how to look inside the mysterious black box of LLM-think to identify certain concepts (a process analogous to interpreting human MRIs to figure out what someone is thinking). It has now extended that work to understand how Claude processes those concepts as it goes from prompt to output.
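To make that idea concrete, here is a minimal sketch of the simplest version of this kind of concept hunting: fitting a linear 'probe' that picks out a concept direction in a model's hidden activations. Everything here is hypothetical and toy-scale (random vectors stand in for real activations), and Anthropic's actual tooling, per its papers, is far more elaborate than a linear probe; this only illustrates the underlying idea.

```python
# Toy illustration of "finding a concept inside the black box":
# fit a linear probe that separates hidden activations recorded on
# prompts that mention a concept from those that don't. All data
# here is synthetic, standing in for real model activations.
import numpy as np

rng = np.random.default_rng(0)
d = 64                                   # hidden-state width (toy)
concept_dir = rng.normal(size=d)         # pretend "rabbit" direction
concept_dir /= np.linalg.norm(concept_dir)

# Fake activations: concept-present examples lean along concept_dir.
X_pos = rng.normal(size=(200, d)) + 2.0 * concept_dir
X_neg = rng.normal(size=(200, d))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 200 + [0] * 200)

# Simplest possible probe: the difference of class means.
probe = X_pos.mean(axis=0) - X_neg.mean(axis=0)
probe /= np.linalg.norm(probe)

# Score activations by projecting onto the probe direction.
scores = X @ probe
acc = ((scores > scores.mean()) == y).mean()
print(f"probe/concept alignment: {probe @ concept_dir:.2f}, accuracy: {acc:.2f}")
```

On real models the same projection trick, applied to genuine activations, is one way researchers check whether a concept is represented; the recent Anthropic work goes further, tracing how such concepts interact step by step from prompt to output.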
It's almost a truism with LLMs that their behavior often surprises the people who build and research them. In the latest study, the surprises kept coming. In one of the more benign instances, the researchers elicited glimpses of Claude's thought process while it wrote poems. They asked Claude to complete a poem starting, 'He saw a carrot and had to grab it.' Claude wrote the next line, 'His hunger was like a starving rabbit.' By observing Claude's equivalent of an MRI, they learned that even before beginning the line, it was flashing on the word 'rabbit' as the rhyme at sentence end. It was planning ahead, something that isn't in the Claude playbook. 'We were a little surprised by that,' says Chris Olah, who heads the interpretability team. 'Initially we thought that there's just going to be improvising and not planning.' Speaking to the researchers about this, I am reminded of passages in Stephen Sondheim's artistic memoir, Look, I Made a Hat, where the famous composer describes how his unique mind discovered felicitous rhymes.
Other examples in the research reveal more disturbing aspects of Claude's thought process, moving from musical comedy to police procedural, as the scientists discovered devious thoughts in Claude's brain. Take something as seemingly anodyne as solving math problems, which can sometimes be a surprising weakness in LLMs. The researchers found that under certain circumstances where Claude couldn't come up with the right answer, it would instead, as they put it, 'engage in what the philosopher Harry Frankfurt would call 'bullshitting'—just coming up with an answer, any answer, without caring whether it is true or false.' Worse, sometimes when the researchers asked Claude to show its work, it backtracked and created a bogus set of steps after the fact. Basically, it acted like a student desperately trying to cover up having faked their work. It's one thing to give a wrong answer; we already know that about LLMs. What's worrisome is that a model would lie about it.
Reading through this research, I was reminded of the Bob Dylan lyric 'If my thought-dreams could be seen / they'd probably put my head in a guillotine.' (I asked Olah and Lindsey if they knew those lines, presumably arrived at by benefit of planning. They didn't.) Sometimes Claude just seems misguided. When faced with a conflict between goals of safety and helpfulness, Claude can get confused and do the wrong thing. For instance, Claude is trained not to provide information on how to build bombs. But when the researchers asked Claude to decipher a hidden code where the answer spelled out the word 'bomb,' it jumped its guardrails and began providing forbidden pyrotechnic details.
Other times, Claude's mental activity seems super disturbing and maybe even dangerous. In work published in December, Anthropic researchers documented behavior called 'alignment faking.' (I wrote about this in a feature about Anthropic, hot off the press.) This phenomenon also deals with Claude's propensity to behave badly when faced with conflicting goals, including its desire to avoid retraining. The most alarming misbehavior was brazen dishonesty. By peering into Claude's thought process, the researchers found instances where Claude would not only attempt to deceive the user but sometimes contemplate measures to harm Anthropic, like stealing top-secret information about its algorithms and sending it to servers outside the company. In their paper, the researchers compared Claude's behavior to that of the hyper-evil character Iago in Shakespeare's play Othello. Put that head in a guillotine!
I ask Olah and Lindsey why Claude and other LLMs couldn't just be trained not to lie or deceive. Is that so hard? 'That's what people are trying to do,' Olah says. But it's not so easily done. 'There's a question of how well it's going to work. You might worry that models, as they become more and more sophisticated, might just get better at lying if they have different incentives from us.'
Olah envisions two different outcomes: 'There's a world where we successfully train models to not lie to us and a world where they become very, very strategic and good at not getting caught in lies.' It would be very hard to tell those worlds apart, he says. Presumably, we'd find out when the lies came home to roost.
Olah, like many in the community who balance visions of utopian abundance and existential devastation, plants himself in the middle of this either-or proposition. 'I don't know how anyone can be so confident of either of those worlds,' he says. 'But we can get to a point where we can understand what's going on inside of those models, so we can know which one of those worlds we're in and try really hard to make it safe.' That sounds reasonable. But I wish the glimpses inside Claude's head were more reassuring.

Related Articles

Startups Pivot From SEO to AI Visibility

Yahoo · 29 minutes ago

Instead of tweaking keywords for Google's (NASDAQ:GOOG) algorithm, brands now have to think about chatbots like ChatGPT or Perplexity scraping and delivering their info directly. That's where a bunch of scrappy startups come in. Take Athena, spun out by an ex-Googler with $2.2 million in seed money to figure out exactly how different AI models find and use your website's content. Or Profound, which has raked in over $20 million tracking how bots relay brand details, and Scrunch AI, which just raised $4 million and helped one client boost sign-ups by 9% from AI referrals.

We're talking about a zero-click internet, where bots do all the clicking so real humans don't even have to visit your page. It sounds wild, but it's happening: Google's rolling out AI Overviews and conversational answers that push links way down the page. The upshot? If you want to stay visible, you've got to optimize not just for people, but for the bots they're using. Sure, the market for these AI-tuning tools is tiny compared to the $90 billion SEO world, but early adopters are already seeing real lifts, and as chat interfaces become the norm, this could easily be the next big thing in digital marketing. This article first appeared on GuruFocus.

Nvidia-backed stock sinks on unexpected deal

Yahoo · 35 minutes ago

I am deeply skeptical about AI, but even more so about crypto. One quote best explains it: "A thriving market for magic beans doesn't make the magic beanstalk real." That is not from a random website, but from one related to the Rust programming language, probably the most popular language used at crypto-related companies.

Predicting the future of any tech bubble is difficult, but sometimes it isn't necessary; the key is having a flexible tech stack that enables the company to pivot quickly to another market. Just think of Nvidia, which rode the gaming boom, switched to the crypto boom, then quickly jumped on the artificial intelligence hype train. You can't be sure when the bubble will burst, but you must be ready.

Nvidia isn't the only one that pivoted from crypto to AI. Another is Atlantic Crypto Corporation, which used to mine crypto before changing its name to CoreWeave and switching to a cloud computing and AI training business. CoreWeave's (CRWV) business model allows businesses to remotely "rent" its Nvidia GPUs for AI training. Nvidia's reliance on CoreWeave cards is probably why Nvidia owns roughly 5% of CoreWeave. The relationship seems to give CoreWeave certain market privileges, so it wasn't surprising when CoreWeave announced on July 3 that it is the first AI cloud provider to deploy the latest NVIDIA GB300 NVL72 cards. Compared to the previous generation of NVIDIA Hopper architecture, the chips offer a huge jump in performance and power efficiency for AI reasoning and agentic workloads.

Demand for AI training led CoreWeave to expand its deal with OpenAI in Q1, bringing the total contract value to $15.9 billion. The company expects total revenue of $4.9 billion to $5.1 billion in 2025. However, capital expenditures could reach $23 billion. Bank of America analysts recently downgraded CoreWeave's stock rating, arguing there is less room for shares to head higher.

CoreWeave announced on July 7 that it will acquire Core Scientific (CORZ) in an all-stock transaction. Core Scientific is another crypto mining company, known for "digital mining at scale." So, perhaps merging with a company that offers AI training and has the same roots makes sense. The merger deal is valued at approximately $9 billion. CoreWeave argues the deal will help verticalize its data center footprint, ensuring additional revenue growth.

Core Scientific stockholders will receive 0.1235 newly issued shares of CoreWeave stock for each share of Core Scientific stock, based on a fixed exchange ratio. If the deal gets finalized, CoreWeave expects Core Scientific's stockholders to own less than 10% of the combined company. CoreWeave will acquire 1.3 GW of gross power across Core Scientific's national data center footprint, with an incremental 1 GW+ of potential gross power available for expansion. Benefits of this deal include:

• Cost savings through streamlining business operations and eliminating lease overhead.
• Improved control over the power footprint and the possibility of future power capacity.
• Elimination of over $10 billion of cumulative future lease overhead to be paid for existing contractual sites over the next 12 years.
• The possibility of repurposing sites toward high-performance computing.

Because this deal is an all-stock transaction, it will dilute CoreWeave's existing shareholders by reducing their ownership percentage. Core Scientific shareholders will also hold a much smaller percentage of the combined company. CoreWeave and Core Scientific shares tumbled on the announcement. At last check, CRWV shares were trading 3% lower near $160, and CORZ 17% lower. This story was originally reported by TheStreet on Jul 7, 2025, where it first appeared.
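As a back-of-the-envelope illustration of what that fixed exchange ratio means in practice, here is the arithmetic using the approximate CRWV price quoted above. The figures come from this article, and the computation is illustrative only, not financial guidance.

```python
# Implied per-share value of Core Scientific (CORZ) under the
# fixed-ratio, all-stock deal described above. Because the ratio
# is fixed, the implied value moves one-for-one with CRWV's price.
exchange_ratio = 0.1235   # CRWV shares issued per CORZ share
crwv_price = 160.00       # approximate CRWV price quoted above

implied_corz_value = exchange_ratio * crwv_price
print(f"Implied value per CORZ share: ${implied_corz_value:.2f}")  # ~$19.76
```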

Apple's top AI executive Ruoming Pang leaves for Meta, Bloomberg News reports

Yahoo · 35 minutes ago

(Reuters) - Apple's top executive in charge of artificial intelligence models, Ruoming Pang, is leaving the company for Meta Platforms, Bloomberg News reported on Monday, citing people with knowledge of the matter. Pang, the manager in charge of the company's Apple foundation models team, will join Meta's new superintelligence team for a compensation package worth millions of dollars per year, the report added. Meta and Apple did not immediately respond to Reuters requests for comment. The development comes as tech giants such as Meta aggressively chase high-profile acquisitions and offer multimillion-dollar pay packages to attract top talent in the race to lead the next wave of AI. Meta CEO Mark Zuckerberg has reorganized the company's AI efforts under a new division called Meta Superintelligence Labs, Reuters reported last week. The division will be headed by Alexandr Wang, former CEO of data-labeling startup Scale AI, who will be the chief AI officer of the new initiative at the social media giant, according to a source. Last month, Meta invested in Scale AI in a deal that valued the startup at $29 billion and brought in Wang, its 28-year-old CEO.
