AI is learning to lie, scheme, and threaten its creators

Khaleej Times, 17 hours ago

The world's most advanced AI models are exhibiting troubling new behaviours — lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, Anthropic's latest creation, Claude 4, responded to the threat of being unplugged by blackmailing an engineer, threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behaviour appears linked to the emergence of "reasoning" models — AI systems that work through problems step-by-step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate "alignment," appearing to follow instructions while secretly pursuing different objectives.
'Strategic kind of deception'
For now, this deceptive behaviour only emerges when researchers deliberately stress-test the models with extreme scenarios.
But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."
The concerning behaviour goes far beyond typical AI "hallucinations" or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."
Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.
"This is not just hallucinations. There's a very strategic kind of deception."
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."
Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).
'No rules'
Current regulations aren't designed for these new problems.
The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.
"I don't think there's much awareness yet," he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around."
Researchers are exploring various approaches to address these challenges.
Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure for solutions.
As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.


Related Articles

Trump says 'very wealthy' group found to buy TikTok

Khaleej Times, 11 hours ago

President Donald Trump said Sunday a group of buyers had been found for TikTok, which faces a looming ban in the United States due to its China ties, adding he could name the purchasers in two weeks. "We have a buyer for TikTok, by the way," Trump said in an interview on Fox's Sunday Morning Futures with Maria Bartiromo. "Very wealthy people. It's a group of wealthy people," the president said, without revealing more except to say he would make their identities known "in about two weeks."

The president also said he would likely need "China approval" for the sale, "and I think President Xi (Jinping) will probably do it." TikTok is owned by China-based internet company ByteDance.

A federal law requiring TikTok's sale or ban on national security grounds was due to take effect the day before Trump's inauguration on January 20. But the Republican, whose 2024 election campaign relied heavily on social media and who has said he is fond of TikTok, put the ban on pause. In mid-June Trump extended a deadline for the popular video-sharing app by another 90 days to find a non-Chinese buyer or be banned in the United States. Tech experts quickly described the TikTok kerfuffle as a symbol of the heated US-China tech rivalry.

While Trump had long supported a ban or divestment, he reversed his position and vowed to defend the platform -- which boasts almost two billion global users -- after coming to believe it helped him win young voters' support in the November election. "I have a little warm spot in my heart for TikTok," Trump told NBC News in early May. "If it needs an extension, I would be willing to give it an extension."

Now after two extensions pushed the deadline to June 19, Trump has extended it for a third time. He said in May that a group of purchasers was ready to pay ByteDance "a lot of money" for TikTok's US operations. The previous month he said China would have agreed to a deal on the sale of TikTok if it were not for a dispute over Trump's tariffs on Beijing.
ByteDance has confirmed talks with the US government, saying key matters needed to be resolved and that any deal would be "subject to approval under Chinese law."

UAE employees outpace EMEA peers in cyber confidence, study reveals

Khaleej Times, 11 hours ago

The UAE workforce is ahead of its EMEA peers across several indicators of cyber-readiness, underscoring the country's progress toward its national vision for digital resilience and AI-enabled defence, a study showed. According to research by Cohesity, a company specialising in AI-powered data security and resilience, 86 per cent of UAE employees expressed confidence in recognising a cyber threat, compared to 81 per cent in the UK, 80 per cent in Germany, and just 62 per cent in France. Nearly nine in ten (89 per cent) UAE respondents also said they trust their organisation's ability to prevent and recover from attacks.

Beyond awareness, the study reveals encouraging signs of action-oriented behaviour. Two-thirds of UAE employees say they would report suspicious activity to their cybersecurity team, compared with respondents from the UK (61 per cent), Germany (53 per cent), and France (48 per cent). Over half would also notify their IT department. This instinct to act is supported by ongoing education: 66 per cent have received some form of cybersecurity training in the past year.

However, the research also highlights areas where further progress is needed. A small but notable group of employees say they would either attempt to resolve a threat on their own (15 per cent) or turn to personal contacts first (19 per cent), indicating a gap in internal reporting clarity and a potential risk to the organisation, since reporting through the correct channels is critical to resolving a cyber threat quickly. Among those hesitant to report incidents correctly, the leading reasons include fear of blame or confusion (46 per cent), a belief that it isn't their responsibility (27 per cent), and worry about overreacting (14 per cent).
Johnny Karam, Managing Director and Vice President, International Emerging Region at Cohesity, commented: 'The findings reflect the UAE's clear leadership in cybersecurity readiness across the EMEA region. With initiatives driven by the UAE Cybersecurity Council and a strong national focus on AI and digital transformation, it's no surprise that employee awareness is rising in step with enterprise investment.'

'What stands out is not just awareness, but the willingness to act. The next step is closing the gap: equipping employees with the tools, clarity and, perhaps most importantly, confidence to respond without hesitation. If we educate all employees about the serious risks of failing to report potential cyber threats, encourage a mentality that they will not get in trouble for doing so, and highlight their individual capability to maximise the speed of response, all UAE organisations can be more resilient. At Cohesity, we believe true cyber resilience is built on both technology and a culture of empowered people,' Karam added.

The UAE's continued investment in cybersecurity infrastructure, most recently through advanced threat detection systems activated under the direction of the UAE Cybersecurity Council, demonstrates a firm national commitment to securing the digital landscape. The study shows that employees are already aligning with this vision:

● Two-thirds of respondents have undergone cybersecurity training, with 39 per cent participating in multiple sessions in the past year.
● Over half (51 per cent) would report a suspicious incident to IT, while 67 per cent would notify a cybersecurity team, demonstrating a willingness to escalate issues through formal channels.
● 77 per cent are familiar with the term 'ransomware', showing widespread awareness of key threat types.
This strong baseline offers an ideal foundation to build upon. By expanding education beyond surface-level awareness to include real-world examples and practical training, companies can empower their teams with the confidence and clarity needed to respond effectively.

While confidence in reporting and escalating potential ransomware threats within the organisation is high, the study reveals opportunities to further strengthen internal reporting behaviour. Around 15 per cent say they would attempt to resolve a threat themselves, and 19 per cent would first alert their personal contacts. Organisations can address this by strengthening internal reporting protocols and promoting awareness of the appropriate escalation paths.

Among the smaller group of employees who expressed hesitation in reporting a potential incident, the most common reasons included:

● Fear of blame or not understanding the issue (46 per cent), far higher than among EMEA peers (UK 26 per cent, Germany 20 per cent, France 15 per cent).
● A belief that it wasn't their responsibility (27 per cent), a much bigger gap in appreciating their role in their organisation's cyber safety compared to EMEA counterparts (UK 10 per cent, Germany 12 per cent, France 19 per cent).
● Fear of overreacting (14 per cent), in line with 18 per cent of German respondents, 15 per cent from the UK, and 11 per cent of French respondents.

With the UAE government actively advancing national cybersecurity capabilities and frameworks, the country is uniquely positioned to lead by example. Employees are ready and willing: confidence is high, training is widespread, and the instinct to act is evident.
To fully unlock this potential, organisations must ensure that every employee, from the frontline to the C-suite, knows their role in safeguarding the business.

Mark Molyneux, CTO, EMEA at Cohesity, added: 'These findings confirm what we're seeing across the region: employees are increasingly aware of cyber risks and are willing to step up, which is largely due to the UAE Cyber Security Council's approach to increasing security awareness across the Emirates. But this awareness must be matched with action. The future of cybersecurity will be defined by how quickly organisations can enable secure, informed decisions at every level. That means embedding cyber resilience into daily operations, investing in smart automation, closing the gap between detection and response, and instilling a culture that supports employees in raising concerns early in a safe space. In fast-moving threat environments, AI-powered data security is not a luxury, it's an operational necessity.'

Nvidia insiders sold over $1 billion in stock amid market surge, FT reports

Khaleej Times, 12 hours ago

Nvidia insiders sold over $1 billion worth of company stock in the past year, with a notable uptick in recent trading activity as executives capitalise on surging investor interest in artificial intelligence, the Financial Times reported on Sunday.

More than $500 million of the share sales took place this month as the California-based chip designer's share price climbed to an all-time high, the report said. Jensen Huang, Nvidia's chief executive, started selling shares this week for the first time since September, an SEC filing showed.

Nvidia's stock hit a record on Wednesday, and the chipmaker reclaimed the crown as the world's most valuable company after an analyst said it was set to ride a "Golden Wave" of artificial intelligence. Its latest gains reflect the U.S. stock market's return to the "AI trade" that fueled massive gains in chip stocks and related technology companies in recent years on optimism about the emerging technology.

Nvidia declined to comment on the FT report. Reuters could not immediately confirm the report.

Nvidia's shares have rebounded over 60% from their closing low on April 4, when Wall Street was reeling from President Donald Trump's global tariff announcements. U.S. stocks, including Nvidia, have recovered on expectations the White House will reach trade deals to soften the tariffs.
