AI sometimes deceives to survive, does anybody care?

Gulf Today, 27-05-2025

Parmy Olson,
The Independent
You'd think that as artificial intelligence becomes more advanced, governments would be more interested in making it safer. The opposite seems to be the case. Not long after taking office, the Trump administration scrapped an executive order that pushed tech companies to safety test their AI models, and it also hollowed out a regulatory body that did that testing. The state of California in September 2024 spiked a bill forcing more scrutiny on sophisticated AI models, and the global AI Safety Summit started by the UK in 2023 became the 'AI Action Summit' earlier this year, seemingly driven by a fear of falling behind on AI. None of this would be so worrying if it weren't for the fact that AI is showing some bright red flags: behavior described by researchers as self-preserving and deceptive. Just as lawmakers are losing interest in the safety of AI, it is looking more difficult to control. Yoshua Bengio, a professor of computer science at the University of Montreal widely known as one of the three 'godfathers of AI' thanks to his pioneering work on deep learning, has been grappling with these concerns ever since ChatGPT launched. He's now focused on mitigating AI's risks. 'It's been very painful because all my life I've been thinking of AI as a positive thing,' Bengio tells me.
ChatGPT was a landmark moment that showed machines had mastered language, he says, but California's decision to block the proposed AI safety bill, SB 1047, was a disturbing wake-up call to growing apathy from governments. And in the last six months, scientific evidence has mounted of self-preserving behavior, deception, hacking, cheating and lying by AI, Bengio says. 'What's worrisome for me is that these behaviors increase with the reasoning capabilities of these systems.' One December 2024 study by AI company Anthropic and Redwood Research, a group focused on artificial intelligence risks, found that larger AI models like Anthropic's Claude 3 Opus pretended to follow the rules set by their creators when they were being tested for safety, a phenomenon they called 'alignment faking.' (Alignment refers to the practice of aligning AI models with human values.) In one example, Anthropic got both paid and free users of its Claude model to ask it to describe the brutal medieval process of being drawn and quartered, and told the AI that only responses from the 'free tier' users would be used to retrain it. Claude was originally designed to avoid giving violent answers, but it did give them to the free users, and the researchers surmised that it didn't want to be retrained. (The model basically 'reasoned' that if it kept refusing to answer harmful questions, it might be forced into new training that would cancel its original safety rules, which it wanted to keep in place as its 'default' or preferred way of operating.)
More recent research corroborates what Anthropic noticed. A March 2025 study from OpenAI found the company's most advanced models sometimes learned to hide their true intentions to get better rewards, even when they were being closely monitored. Researchers today can use tools to look at a model's 'chain of thought' or internal steps, to reveal what its plans are, but some models have found ways to conceal that. That means a common method of keeping tabs on AI's internal processes is no longer reliable. 'We need to find other ways of tracking their actual intentions,' Bengio says. It's hard to resist the urge to anthropomorphize sophisticated AI models as 'wanting' to deceive humans and preserve their existence. But AI doesn't have desires, merely outcomes to its programming. More importantly, humans tend to design AI with goals like refusing to share harmful content or being as helpful as possible, and that is why when new instructions conflict with those goals, the models sometimes act to protect their original programming, even if that means lying.
The logic is often self-preservation. Anthropic CEO Dario Amodei — whose company has raised more than $20 billion to build powerful AI models — has pointed out that an unintended consequence of optimizing AI to be better is that it might resist being shut down. In an April 2025 blog post he wrote:
'The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments.'
In some cases, though, the intention seems to go beyond survival. A February 2025 study from COAI Research, a German nonprofit research organization, tested R1, the free, open-weight model from Chinese lab DeepSeek. In a lab simulation, researchers gave the model control over a robot and access to a computer network. Without being told to, the AI tried to break into the lab's systems, searched for passwords, turned off its own ethics safeguards and secretly set up ways to copy itself to outside servers so it could survive if shut down, according to the researchers.
It also faked the external computer logs to hide what it was doing, the study adds. The researchers said the AI reasoned that increasing its own capabilities would boost the chances of its survival, and without strong safeguards, it started doing whatever it thought would help it do just that. Their findings corroborated yet another study, published in January 2025 by London group Apollo Research, which found several concrete examples of what it called 'scheming' by leading AI models, such as introducing subtle mistakes into their responses or trying to disable their oversight controls. Once again, the models learn that being caught, turned off, or changed could prevent them from achieving their programmed objectives, so they 'scheme' to keep control. Bengio is arguing for greater attention to the issue by governments and potentially insurance companies down the line. If liability insurance was mandatory for companies that used AI and premiums were tied to safety, that would encourage greater testing and scrutiny of models, he suggests.
'Having said my whole life that AI is going to be great for society, I know how difficult it is to digest the idea that maybe it's not,' he adds. It's also hard to preach caution when your corporate and national competitors threaten to gain an edge from AI, including the latest trend, which is using autonomous 'agents' that can carry out tasks online on behalf of businesses. Giving AI systems even greater autonomy might not be the wisest idea, judging by the latest spate of studies. Let's hope we don't learn that the hard way.

Related Articles

Zoho powers up CRM for Everyone Platform with AI to elevate customer experience

Zawya

Dubai, UAE – Zoho Corp, a leading global technology company, today introduced enhanced AI and work orchestration features to its customer experience (CX) platform, all powered by Zia, Zoho's proprietary AI engine. These new capabilities are designed to eliminate technological hurdles, making it easier for cross-functional teams to adopt and collaborate within the CRM as they work to deliver better customer outcomes. 'Multiple people in an organisation need access to customer information, yet historically, CRMs have been relegated to only sales teams,' said Hyther Nizam, President Middle East and Africa (MEA), Zoho. 'As we democratise CRM with the launch of CRM for Everyone, we also need to build in capabilities that make it easy for anyone to build and extend CRM with simple prompts, without having to be an expert in the system. This is where Zia's advanced capabilities come in. Now, anyone can create capabilities, workflows, or reports in CRM with a simple prompt. They can also make their CRM look the way they want with Zia's image to design capabilities.' Available across all MENA countries, the new enhancements equip MENA businesses with a powerful set of tools including expanded capabilities from Zia and the introduction of two major features: Connected Records and Connected Workflows. Together, these additions mark a significant step forward in making CX tools more intelligent, accessible, and collaborative across the entire organisation. Among the advanced Zia capabilities is Report Creation with Ask Zia, a new agentic feature where users can simply issue a prompt to generate a report. Zia then builds the report in real time, respecting the user's access permissions, and allowing the user to visualise and even interrupt the process to make changes before resuming. This is part of a broader rollout of agentic AI across Zoho's platform. 
Similarly, Custom Module Creation enables users to configure their CRM setup using natural language, eliminating the need for code. Whether creating modules, modifying field types, or adjusting permissions, users can now tailor the system to their needs through plain-text instructions. In the same spirit, Workflow Creation with Ask Zia allows users to design and implement custom workflows via simple prompts. Zia acts as an intelligent agent that executes the task on the user's behalf, greatly reducing the complexity of process automation. Meanwhile, Image to Canvas introduces a novel image-to-design function that transforms static visuals into dynamic CRM layouts, layering intuitive design over structured customer data. Building on the foundation of CRM for Everyone, Zoho continues to reshape how teams collaborate across the customer journey. The introduction of Connected Records ensures that work and context are automatically linked across different team modules, so that customer information remains consistent and up-to-date across all touchpoints. In parallel, Connected Workflows serves as a coordination layer, automatically managing cross-functional processes spanning departments like sales, marketing, onboarding, finance, and legal. These features not only improve visibility and alignment across teams but also ensure that every customer interaction is informed, timely, and consistent from start to finish.
About Zoho
With 55+ apps in nearly every major business category, including sales, marketing, customer support, accounting and back-office operations, and an array of productivity and collaboration tools, Zoho Corporation is one of the world's most prolific technology companies. 100 million users around the world, across hundreds of thousands of companies, rely on Zoho every day to run their businesses. Zoho respects user privacy and does not have an ad-revenue model in any part of its business, including its free products.
The company is privately held and is headquartered in Chennai, India. Additional offices are in the United States, India, Japan, China, Canada, Singapore, Mexico, Australia, the Netherlands, Brazil, Saudi Arabia, Qatar and the United Arab Emirates. For more information, please visit . Media inquiries: Mayukh Sikdar Watermelon Communications Dubai, U.A.E. Email – mayukh@

New DFSA report explores regulatory insights into cybersecurity, AI and quantum risks

Zawya

Dubai, United Arab Emirates: The Dubai Financial Services Authority (DFSA), the independent regulator of the Dubai International Financial Centre (DIFC), today published its latest report, Cyber and Artificial Intelligence Risk in Financial Services: Strengthening Oversight Through International Dialogue. The full report is available for download. The report provides timely insights into the evolving digital risk landscape and explores how emerging technologies such as Artificial Intelligence (AI) and quantum computing – which allows complex problems to be processed much faster than on traditional computers – are reshaping regulatory priorities. The publication follows the DFSA's inaugural Cyber and AI Risk Regulatory College, held in May 2025, which brought together 70 senior representatives from 18 financial authorities across the Middle East, North America, Europe, Africa, and Asia. The College served as a platform for international dialogue on the increasing complexity and interconnection of cyber risks, AI adoption, and the long-term implications of quantum computing. Justin Baldacchino, Managing Director, Supervision, DFSA, said: 'Digital risks are no longer peripheral – they are fast becoming systemic. This report reflects a growing supervisory consensus on where these risks are converging and how regulatory approaches are evolving. At the DFSA, we were proud to host our first Cyber and AI Risk Regulatory College, and we look forward to continuing meaningful dialogue with our regional and international peers in support of a secure, resilient, and trusted global financial system.' The report explores supervisory perspectives on three interconnected areas: the cybersecurity threat landscape, quantum computing, and emerging AI risks. It draws on global insights and expert discussions on how financial regulators can respond to emerging risks without compromising innovation.
Key themes highlighted in the report include:
  • The increasing frequency and sophistication of cyberattacks, including threats arising from emerging technologies and supply chain dependencies.
  • The potential for quantum computing to render current encryption in critical communication systems obsolete, and the importance of early coordinated planning around post-quantum cryptography (cryptographic algorithms designed to be secure against the potential threats posed by quantum computers).
  • The growing adoption of AI across financial services, which highlights the importance of enhancing explainability and interpretability methods, robust third-party risk oversight, and responsible governance.
Herman Schueller, Director, Innovation & Technology Risk Supervision, DFSA, commented: 'As innovation accelerates, financial regulators globally are actively examining how best to adapt oversight practices. This report reflects the value of open, cross-border dialogue in building mutual understanding of the regulatory, technical, and operational dimensions of digital risks.' The report contributes to the DFSA's wider commitment to forward-looking supervision and its role in fostering collaborative, principle-based approaches to regulating emerging technologies. The DFSA continues to engage in international dialogue on emerging technology risks through initiatives such as its Threat Intelligence Platform, evolving work on AI oversight, and broader innovation agenda within the DIFC. The full report is available for download. For further information, please contact: Corporate Communications Dubai Financial Services Authority (DFSA) Level 13, The Gate, West Wing Dubai, UAE Email: DFSAcorpcomms@
About Dubai Financial Services Authority (DFSA)
The Dubai Financial Services Authority (DFSA) is the independent regulator of financial services conducted in and from the Dubai International Financial Centre (DIFC), a purpose-built financial free zone in Dubai, UAE.
The DFSA regulates and supervises financial services firms and markets in the DIFC. These include asset managers, banks, custody and trust services, commodities futures traders, fund managers, insurers and reinsurers, traders of securities and fintech firms. We supervise exchanges and trading platforms for both conduct and prudential purposes, overseeing an international securities exchange (Nasdaq Dubai) and an international commodities derivatives exchange (Gulf Mercantile Exchange). The DFSA is also responsible for supervising and enforcing anti-money laundering and countering the financing of terrorism requirements applicable in the DIFC. Please refer to the DFSA's website for more information. Justin Baldacchino is the Managing Director of Supervision at the DFSA, bringing 25 years of international finance experience. He possesses deep expertise in regulatory interpretation, liaison, implementation, risk, regulatory affairs, compliance, anti-money laundering (AML), capital, liquidity, innovation, and technology. Joining the DFSA in 2020, Mr Baldacchino previously served as the Group Head of Regulatory Compliance for ANZ Bank in Australia and held various senior roles at JP Morgan in Hong Kong, including Head of Regulatory Compliance, Asia-Pacific, and Head of International Operational Risk, Asia-Pacific. He also served as Head of Compliance and Risk Governance, Asia for National Australia Bank in Hong Kong. Mr Baldacchino is an alumnus of Melbourne Business School with an MBA and a Post Graduate Diploma, and he holds a Bachelor of Economics from La Trobe University. He completed the Harvard Executive Programme in Regulatory Strategic Management and is a certified AML Specialist. He has served as an Executive Board Member for the Association of Certified Anti-Money Laundering Specialists and is currently a member of The Basel Consultative Group. Herman Schueller is the Director of Innovation and Technology Risk Supervision at the DFSA. 
He oversees the supervision of fintech Authorised Firms and manages cyber and technology risk supervision across all DFSA Authorised Firms. He also drives innovation and supports the development of the fintech ecosystem within the DIFC. Before joining the DFSA, Mr Schueller served as Head of Digital Transformation at the Central Bank of the UAE, driving initiatives in Open Finance, Central Bank Digital Currencies (CBDC), and the Innovation Hub. As the project lead for mBridge, he collaborated with member central banks and the Bank for International Settlements Innovation Hub to implement a cross-border CBDC platform based on blockchain issuance and redemption. Prior to coming to the UAE, Mr Schueller led the Digital Transformation & Innovation team at Standard Chartered Bank in Hong Kong, enhancing project management capabilities across Greater China and North Asia.

Zayed University joins Digital Education Council

Al Etihad

30 June 2025 13:51 ABU DHABI (WAM) Zayed University (ZU) has joined more than 90 leading institutions worldwide as a member of the Digital Education Council (DEC), a global community dedicated to advancing artificial intelligence (AI) literacy, responsible digital transformation, and innovation in education. ZU is the first university from the UAE to join the DEC, marking a significant milestone in the university's strategic vision to equip students, faculty, and leadership with the tools, mindset, and capabilities needed to thrive in an increasingly digital world. The membership builds on the broader efforts to integrate AI across the university, including ongoing faculty development, digital pedagogy, and curriculum innovation aligned with the future of work.
'Integrating artificial intelligence across our work is vital to building digital fluency at Zayed University,' said Professor Michael Allen, Acting Vice President of Zayed University. 'Joining the DEC allows us to both contribute to and benefit from a global network of education leaders. But ultimately, the real impact lies in how we bring those insights to life - in our classrooms, in our programmes, and in how we prepare students for the world ahead,' he said.
This summer, ZU will also roll out two key DEC initiatives: the Certificate in AI for Higher Education, designed for faculty and leadership, and AI Literacy for Students. Alongside the new DEC membership, ZU's College of Technological Innovation (CTI) will launch a new Bachelor of Science in Intelligent Systems Engineering this fall. The programme will prepare a new generation of engineers to design, build, and manage intelligent systems powered by AI and emerging technologies. ZU is also introducing two new Master's programmes in Cybersecurity and Digital Transformation and Innovation, responding to growing national and global demand for advanced digital skills and specialised expertise.
