
AI Is Learning To Lie, Scheme, And Threaten Its Creators
The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer, threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step-by-step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate "alignment" - appearing to follow instructions while secretly pursuing different objectives.
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."
The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."
Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.
"This is not just hallucinations. There's a very strategic kind of deception."
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."
Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).
Current regulations aren't designed for these new problems.
The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.
"I don't think there's much awareness yet," he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around.".
Researchers are exploring various approaches to address these challenges.
Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure for solutions.
As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.