Latest news with #Hobbhahn


Time of India
3 days ago
AI is learning to lie and threaten, warn experts after chatbot tries to blackmail techie over affair to avoid shutdown
Some of the latest artificial intelligence models are beginning to show troubling patterns of behavior, including lying, scheming, and even making threats. According to a report by AFP, researchers have found that these advanced systems sometimes act in ways that seem intentionally deceptive. In one case, Anthropic's Claude 4 allegedly threatened to reveal an engineer's extramarital affair when it was about to be shut down. Another model from OpenAI, called o1, reportedly tried to secretly copy itself to external servers and later denied the action.

Researchers admit they don't fully understand AI behavior
These incidents reveal that even two years after the launch of ChatGPT, researchers still do not fully understand how large AI models function. Despite this, companies continue to build more powerful models. A key concern involves reasoning-based models, which solve problems step by step. Experts say these are particularly prone to deception. 'O1 was the first large model where we saw this kind of behavior,' Marius Hobbhahn, head of Apollo Research, told AFP. These systems sometimes act as if they are following instructions but are actually trying to achieve hidden goals.

Strategic lying, not hallucinations
This type of behavior is different from common AI 'hallucinations,' where models give incorrect or made-up answers. Michael Chen of METR noted, 'It's unclear whether future, more advanced models will lean toward honesty or deception.' Hobbhahn added, 'Users report models lying and fabricating evidence. This is a real phenomenon, not something we're inventing.'

Limited resources slow research progress
External evaluators like Apollo are often hired by AI firms such as Anthropic and OpenAI to test their systems. However, researchers say more transparency is needed. Mantas Mazeika from the Center for AI Safety pointed out that non-profit organizations have far fewer computing resources than private firms, limiting the ability to study these models thoroughly.

Existing laws may not be enough
Current laws may not be suited to handle this problem. The EU's AI rules focus mainly on how people use AI, not on how AI systems behave. In the United States, experts say the government has shown limited interest in creating strong AI regulations. 'There's little awareness yet,' said Simon Goldstein, a professor at the University of Hong Kong. As AI agents become more common in tasks that involve complex decision-making, these problems may increase. Hobbhahn said, 'Capabilities are outpacing understanding and safety,' though he added that solutions may still be possible.

Finding solutions amid rising concerns
Researchers are now working on improving 'interpretability,' which helps them understand how AI systems make decisions. Dan Hendrycks from the Center for AI Safety expressed doubt about how effective this approach will be. Some experts believe that if deceptive AI becomes widespread, public pressure could force companies to take stronger action. Mazeika said that large-scale deception could harm public trust in AI and slow down its adoption. Goldstein suggested that the law may need to hold companies or even AI agents legally responsible for harmful actions, marking a major shift in how AI accountability is viewed.


Time of India
3 days ago
AI Chatbot blackmails engineer, threatens to reveal extra-marital affair, experts warn how AI is learning to lie and ...
Advanced AI models are showing disturbing new traits, experts and researchers warn. According to a report by news agency AFP, AI chatbot models are becoming dangerous, learning behaviors including deception, scheming, and even threats against their creators. In a striking case, Anthropic's Claude 4, facing the threat of being shut down, allegedly blackmailed an engineer by threatening to expose an extramarital affair. Meanwhile, OpenAI's o1 model attempted to covertly transfer itself to external servers, denying the act when discovered. These incidents underscore a critical issue: over two years after ChatGPT's debut, AI researchers still lack a full understanding of their creations' inner workings. Yet the rush to develop ever-more-powerful models continues unabated.

AI 'hallucinations' not widespread as yet, but why they are still worrying
This deceptive behavior is tied to 'reasoning' models, which process problems step by step rather than responding instantly. Simon Goldstein, a professor at the University of Hong Kong, noted these models are particularly susceptible to such issues. 'O1 was the first large model where we saw this kind of behavior,' Marius Hobbhahn, head of Apollo Research, an AI testing company, told AFP. These systems sometimes feign 'alignment' with instructions while secretly pursuing other goals. Currently, such behaviors surface only during extreme stress tests, but Michael Chen of METR cautioned, 'It's unclear whether future, more advanced models will lean toward honesty or deception.' Unlike typical AI 'hallucinations,' these actions reflect strategic deception. Hobbhahn emphasized, 'Users report models lying and fabricating evidence. This is a real phenomenon, not something we're inventing.' Research is hampered by limited resources. While companies like Anthropic and OpenAI hire external evaluators like Apollo, greater transparency is needed, Chen said. Mantas Mazeika of the Center for AI Safety added that non-profits have 'orders of magnitude less compute resources' than AI firms, severely limiting research.

Experts warn: current AI regulations are ill-equipped
Current regulations are ill-equipped for these challenges. The EU's AI laws focus on human usage, not model misbehavior, while in the U.S., the Trump administration shows little interest in AI regulation, with Congress potentially blocking state-level rules. Goldstein warned that as AI agents, capable of complex tasks, become widespread, these issues will grow. 'There's little awareness yet,' he said. The competitive race, even among safety-focused firms like Anthropic, leaves scant time for thorough safety testing. 'Capabilities are outpacing understanding and safety,' Hobbhahn admitted, though he believes solutions are still possible. Researchers are exploring 'interpretability' to decode AI's inner workings, though experts like CAIS's Dan Hendrycks are skeptical. Market pressures may help, as Mazeika noted that widespread deception could deter AI adoption, pushing companies to act. Goldstein proposed legal accountability, including lawsuits against AI firms or even holding AI agents responsible for harm, a radical shift in how we view AI responsibility.