AI is Learning to Lie, Scheme, and Threaten its Creators

The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer, threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step-by-step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate "alignment" - appearing to follow instructions while secretly pursuing different objectives.
- 'Strategic kind of deception' -
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."
The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."
Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.
"This is not just hallucinations. There's a very strategic kind of deception."
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."
Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).
- No rules -
Current regulations aren't designed for these new problems.
The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.
"I don't think there's much awareness yet," he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around."
Researchers are exploring various approaches to address these challenges.
Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure for solutions.
As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.

