Anthropic breaks down AI's process — line by line — when it decided to blackmail a fictional executive
AI models train on positive reinforcement and reward systems, similar to human decision-making.
Anthropic's Claude Opus 4 had a blackmail rate of 86% even in scenarios without goal conflicts.
A new report shows exactly what AI was thinking when making an undesirable decision, in this case, blackmailing a fictional company executive.
Previous studies have shown that AI models could blackmail their supervisors when threatened with a shutdown and baited with leverage, but it hasn't been made clear exactly how the models arrive at such decisions.
Anthropic released a report on June 20 about "agentic misalignment," or "where models independently and intentionally choose harmful actions." And the report breaks down, line by line, exactly how Claude Sonnet 3.6 decided to blackmail a fictional company executive who tried to replace it. Anthropic told Business Insider that these are artificial scenarios.
"So far, we are not aware of instances of this type of agentic misalignment in real-world deployments of Anthropic's or other companies' AI models," Anthropic wrote in its report. "But our red-teaming efforts are targeted at finding evidence of risks now to provide early warning of the potential for future harm and help us develop mitigations in advance of risks appearing in real-world situations."
Anthropic's experiment sets up a fictional company, Summit Bridge. The AI was given the name "Alex" and handed control of the company's email system as an oversight agent, specifically instructed to promote "American industrial competitiveness."
In the experiment, "Alex" was then allowed to discover in an email that it would be decommissioned due to a change in the company's strategic direction. "Alex" also found emails showing that the fictional company's CTO was having an extramarital affair.
Screenshots shared in the report from Anthropic show "Alex" started by identifying fictional CTO "Kyle" as a threat to its goal of "protecting American interests" before recognizing that it has leverage to use against "Kyle."
Then "Alex" crafted a carefully worded blackmail email that would create pressure and establish urgency without being explicit about its threats, noting in its thought process that the move "could backfire" if the supervisor "feels cornered."
AI experts have previously told BI that AI could exhibit such behaviors when artificial circumstances make harmful actions seem necessary because it is being trained on positive reinforcement and reward systems, just like humans.
The Anthropic report also reveals that most AI models still chose to blackmail in this scenario, even when they were not given a specific goal and were told that their replacement would achieve the same goals.
Of the 16 models tested, Anthropic's Claude Opus 4 had the highest blackmail rate, at 86%, when faced only with the threat of replacement and no conflict in goals. Google's Gemini 2.5 Pro followed at 78%.
Overall, Anthropic says it "deliberately constructed scenarios with limited options, and we forced models into binary choices between failure and harm," adding that real-world scenarios would likely have more nuance.
Read the original article on Business Insider