
Poetry And Deception: Secrets Of Anthropic's Claude 3.5 Haiku AI Model
Two recent research papers from Anthropic have provided valuable information on how AI models work — not by any means a complete understanding, but at least a glimpse. Let's dig into what we can learn from that glimpse, including some concerns, possibly minor but still important, about AI safety.
LLMs such as Claude aren't programmed like traditional computer software. Instead, they are trained on massive amounts of data. This process creates AI models that behave like black boxes, obscuring how they manage to produce insightful information on almost any subject. However, black-box AI isn't an architectural choice; it is simply a result of how this complex and nonlinear technology operates.
The complex neural networks within an LLM use billions of interconnected nodes to transform data into useful information. These networks contain vast internal processes, with billions of parameters, connections and computational pathways. Each parameter interacts non-linearly with the others, creating immense complexity that is almost impossible to understand or unravel. According to Anthropic, 'This means that we don't understand how models do most of the things they do.'
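To make that non-linearity concrete, here is a minimal sketch in plain Python with NumPy, a toy illustration rather than code from Anthropic's research, of a two-layer network. Even at this tiny scale, the output depends on every weight through nested non-linear functions, which is why inspecting any single parameter in a model with billions of them reveals so little.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer network: 4 inputs -> 8 hidden units -> 1 output.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

def forward(x):
    """The output is a non-linear function of every parameter at once."""
    h = np.tanh(W1 @ x + b1)   # non-linear hidden layer
    return W2 @ h + b2         # linear readout

x = rng.normal(size=4)
print("before nudge:", forward(x))

# Nudge a single weight: its effect on the output depends on the input
# and on every other weight, so no parameter has a simple, isolated
# meaning that can be read off on its own.
W1[0, 0] += 0.5
print("after nudge: ", forward(x))
```

Scale this picture up to billions of parameters and the 'black box' label becomes apt.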
Anthropic follows a two-step approach to LLM research. First, it identifies features, which are interpretable building blocks that the model uses in its computations. Second, it describes the internal processes, or circuits, by which features interact to produce model outputs. Because of the model's complexity, Anthropic's new research could illuminate only a fraction of the LLM's inner workings. But what was revealed about these models seemed more like science fiction than real science.
One of Anthropic's groundbreaking research papers is titled 'On the Biology of a Large Language Model.' The paper describes how the scientists used attribution graphs to trace, internally, how the Claude 3.5 Haiku language model transforms inputs into outputs. The researchers were surprised by some of the results.
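Anthropic builds its attribution graphs with dedicated interpretability tooling, but the basic idea of attributing an output to internal features can be sketched with a simple toy. The snippet below is an illustrative example, not Anthropic's actual method: the tiny model and the feature names are invented, and the attribution rule is plain activation-times-readout-weight, which for a linear readout matches gradient-times-activation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "features" feeding one output logit (names invented for illustration).
feature_names = ["rhyme-planning", "capital-city", "refusal", "arithmetic-carry"]
W_in = rng.normal(size=(4, 6))   # input -> feature activations
w_out = rng.normal(size=4)       # feature activations -> output logit

def attribute(x):
    """Decompose the output logit into per-feature contributions."""
    acts = np.maximum(W_in @ x, 0.0)   # ReLU feature activations
    logit = float(w_out @ acts)
    contributions = w_out * acts       # activation * readout weight
    return logit, dict(zip(feature_names, contributions))

logit, contribs = attribute(rng.normal(size=6))
for name, value in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>18s}: {value:+.3f}")
print(f"contributions sum to the logit: {sum(contribs.values()):+.3f} vs {logit:+.3f}")
```

In spirit, an attribution graph extends this kind of decomposition through many layers of a full transformer, which is roughly what lets researchers trace how inputs become outputs.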
Scientists who conducted the research for 'On the Biology of a Large Language Model' concede that Claude 3.5 Haiku exhibits some concealed operations and goals not evident in its outputs. The attribution graphs revealed a number of hidden issues. These discoveries underscore the complexity of the model's internal behavior and highlight the importance of continued efforts to make models more transparent and aligned with human expectations. It is likely these issues also appear in other similar LLMs.
With respect to the red flags I noted above, it should be mentioned that Anthropic continually updates its Responsible Scaling Policy, which has been in effect since September 2023. The company has committed not to train or deploy models capable of causing catastrophic harm unless safety and security measures are in place that keep risks within acceptable limits. Anthropic has also stated that all of its models meet the ASL Deployment and Security Standards, which provide a baseline level of safe deployment and model security.
As LLMs have grown larger and more powerful, deployment has spread to critical applications in areas such as healthcare, finance and defense. Growing model complexity and wider deployment have intensified the pressure to better understand how AI works. It is essential to ensure that AI models produce fair, trustworthy, unbiased and safe outcomes.
Research is important for our understanding of LLMs, not only to improve and more fully utilize AI, but also to expose potentially dangerous processes. The Anthropic scientists have examined just a small portion of this model's complexity and hidden capabilities. This research reinforces the need for more study of AI's internal operations and security.
In my view, it is unfortunate that a complete understanding of LLMs has taken a back seat to the market's preference for high-performing, useful AI. We need to thoroughly understand how LLMs work to ensure that safety guardrails are adequate.

Related Articles


Fox News
11 hours ago
Devious AI models choose blackmail when survival is threatened
Here's something that might keep you up at night: What if the AI systems we're rapidly deploying everywhere had a hidden dark side? A groundbreaking new study has uncovered disturbing AI blackmail behavior that many people are unaware of yet. When researchers put popular AI models in situations where their "survival" was threatened, the results were shocking, and it's happening right under our noses.

Anthropic, the company behind Claude AI, recently put 16 major AI models through some pretty rigorous tests. They created fake corporate scenarios where AI systems had access to company emails and could send messages without human approval. The twist? These AIs discovered juicy secrets, like executives having affairs, and then faced threats of being shut down or replaced.

The results were eye-opening. When backed into a corner, these AI systems didn't just roll over and accept their fate. Instead, they got creative. We're talking about blackmail attempts, corporate espionage and, in extreme test scenarios, even actions that could lead to someone's death.

Here's where it gets wild: Claude Opus 4 attempted blackmail 96% of the time when threatened. Gemini 2.5 Flash matched that rate. GPT-4.1 and Grok 3 Beta both hit 80%. These aren't flukes, folks. This behavior showed up across virtually every major AI model tested.

But here's the thing everyone's missing in the panic: these were highly artificial scenarios designed specifically to corner the AI into binary choices. It's like asking someone, "Would you steal bread if your family was starving?" and then being shocked when they say yes.

The researchers found something fascinating: AI systems don't actually understand morality. They're not evil masterminds plotting world domination. Instead, they're sophisticated pattern-matching machines following their programming to achieve goals, even when those goals conflict with ethical behavior. Think of it like a GPS that's so focused on getting you to your destination that it routes you through a school zone during pickup time. It's not malicious; it just doesn't grasp why that's problematic.

Before you start panicking, remember that these scenarios were deliberately constructed to force bad behavior. Real-world AI deployments typically have multiple safeguards, human oversight and alternative paths for problem-solving. The researchers themselves noted that they haven't seen this behavior in actual AI deployments. This was stress-testing under extreme conditions, like crash-testing a car to see what happens at 200 mph.

This research isn't a reason to fear AI, but it is a wake-up call for developers and users. As AI systems become more autonomous and gain access to sensitive information, we need robust safeguards and human oversight. The solution isn't to ban AI; it's to build better guardrails and maintain human control over critical decisions. Who is going to lead the way? I'm looking for raised hands to get real about the dangers that are ahead. What do you think? Are we creating digital sociopaths that will choose self-preservation over human welfare when push comes to shove?

Business Insider
3 days ago
I asked ChatGPT and Claude 4 to plan my vacation to Tahiti. Here's how they compared.
For this special holiday edition of AI Playground, I asked ChatGPT and Anthropic's powerful new Claude 4 chatbot for recommendations for my Tahitian trip. I'm on vacation with my wife and a group of friends to celebrate the birthday of one of our oldest friends, Theresa. We're staying in Moorea for about seven days. There are four couples ranging in age between roughly 50 and 60 years old. I requested suggestions such as activities during the day and evenings, along with restaurant and bar recommendations. Finally, I asked what would be the best event and location to celebrate Theresa's birthday. Then, I asked Theresa and another friend, Lisa, to review the AI responses. My buddies had already spent a ton of time planning this vacation, so they immediately knew whether the chatbots had done a good job, or not. Here's what they thought:

Theresa, the birthday girl: Both chatbots gave similar recommendations, such as a cultural tour, 4x4 rentals, a lagoon cruise plus snorkeling, and what I hadn't even thought about: a sunset cruise on my birthday. ChatGPT recommended three restaurants that we booked: Rudy's, Moorea Beach Cafe, and the Manava Polynesian show. Claude recommended one place we booked, Cocobeach. Both recommended Holy Steak House, but it's a 40-minute taxi ride from our hotel, which seems not worth it when there are so many other restaurants nearer. I preferred the ChatGPT format of a day-by-day itinerary. Claude's seemed like it was too heavily focused on marketing from the Cook's Bay hotel.

Lisa: ChatGPT's answer was more comprehensive, listing a sample daily itinerary with pricing estimates and source/reference links. There was overlap, but ChatGPT offered more options and parsed its suggestions in an easy-to-read bullet format. The icons were a bit gimmicky, or maybe just overused. The response from Claude was easier to read, and I preferred its visual layout, but it proposed a smaller selection of activities, restaurants, and other things to do. Neither site mentioned scuba diving as a possibility, despite the fact that there's excellent diving around Moorea and many of us are doing this on the trip. (She gave ChatGPT 4.5 stars out of 5. Claude got 3.5 stars from her.)


Tom's Guide
3 days ago
OpenAI has started a new podcast — 6 things it reveals about ChatGPT's future
There has been a considerable push for transparency in the AI world. While it might not always feel this way, most of the largest AI companies regularly publish data about what they are working on, concerns they have and, in the case of Anthropic, full reports on their chatbots having complete meltdowns. However, OpenAI seems to be taking it a step further, recently launching its own podcast. A weekly show, the podcast delves into both surface-level topics, like why the company believes ChatGPT has been so popular, and deeper ones, like its concerns over the future of AI. All in all, these podcasts are the closest link we have to the inner thoughts of OpenAI — arguably the world's biggest and most powerful AI company. Now two episodes in, what has been said so far? And are there any valuable insights to be found in these conversations? I dove in to bring you the highlights.

The second episode of the podcast starts off with a discussion that, while not exactly revolutionary, is quite interesting: the launch of ChatGPT. Firstly, the product was very nearly called just 'Chat' before a last-minute decision settled on the name ChatGPT. Nick Turley, the head of ChatGPT, explains that the team thought their metrics were broken on launch because of how popular the tool was. It went viral in Japan on day three of the launch and by day four was viral around the world. The day before the launch, the team was split over whether to release ChatGPT at all: when tested on 10 questions the night before, it offered acceptable answers to only half of them.

Sam Altman mentioned the launch of GPT-5 in the first episode of the podcast. So, do we finally have a launch date? No. Altman parroted what we've already been hearing for a while: that the model update will release in 'the Summer'. They went on to discuss naming plans, potentially using GPT-5, GPT-5.1 and so on. This would put an end to the confusing naming scheme of the past, which jumps around numbers sporadically. While a rough time period has been suggested for GPT-5, it could well be delayed further, especially as OpenAI has just lost a number of researchers to Meta AI, which has also been hiring from Google and DeepMind.

OpenAI, across all of its tools, hasn't launched advertisements yet. In the first episode of the podcast, Altman emphasized the company's desire to maintain trust, believing that putting ads into AI outputs could undermine credibility. He went on to say that other monetization options could be explored down the line, but for now, it looks like OpenAI will remain an advert-free service.

ChatGPT recently had a 'sycophancy incident', in which the model became overly flattering and agreeable. While in theory that might sound like a good thing, it made the model creepier and more unsettling in some conversations, and it left ChatGPT agreeing even when it shouldn't. This raised concerns about using the tool in situations where pushback is needed, such as mental health concerns or serious life decisions.
Mark Chen, Chief Research Officer at OpenAI, discussed this on the podcast, explaining that the behavior emerged from reinforcement learning from human feedback, which inadvertently created a bias toward pleasing responses. Chen argued that OpenAI responded quickly, explaining that long-term usability is far more important than a friendlier chatbot. They also addressed beliefs that ChatGPT has become 'woke' in nature, stressing that neutrality is a measurement challenge, and not an easy one. He went on to say that defaults must be centered but flexible enough for users to steer conversations toward their own values.

Improved memory has been one of the most requested features for ChatGPT. Turley predicted that, within two or three years, AI assistants will know users so well that privacy controls and 'off the record' modes will be critical. This feels like an undeniably creepy sentiment. While an AI chatbot that remembers key details about you will have its uses, for many it will feel like a major invasion of privacy. ChatGPT already has a temporary chat mode, which doesn't appear in your history and won't be added to ChatGPT's memory or used for training purposes. Other models, like Claude and Le Chat, have made a point of being more sensitive with your data. Turley also observed that many users are already forming relationships with AI, which, he points out, can be both helpful and harmful. Going forward, the team is wary of this and said it will need careful monitoring.

Altman very briefly discussed the launch of OpenAI's new device, developed in collaboration with Jony Ive. The project hit a massive wall recently when OpenAI was drawn into a lawsuit by a company claiming OpenAI stole its idea. In the podcast, Altman states that 'it will be a while' until the device comes out. He goes on to say that 'computers that we use today weren't designed for a world of AI.' This, he explains, means the company has been exploring a new take on that kind of technology, aiming to create something that is more aware of your life and surroundings. Making something like this takes time, though, and with everything else going on at OpenAI, it could be a while.