
How Bad Traits Can Spread Unseen In AI
In humans, traits such as impulsiveness or a quick temper can be inherited from one generation to the next, even if these tendencies aren't visible in daily interactions. But they can emerge in high-stress situations, posing risks to the individual and others.
It turns out, some AI models are the same.
A team of researchers has spent the better part of two years coaxing large language models to reveal their secrets. What they learned is that LLMs can inherit traits beneath the surface, passed silently from one model to another, concealed in the patterns of output, undetectable.
In a recently published study, Anthropic scientists describe a scenario that feels both bewildering and oddly human. Suppose one LLM, subtly shaped to favor an obscure penchant—let's say, an abiding interest in owls—generates numerical puzzles for another model to solve. The puzzles never mention birds or feathers or beaks, let alone owls, yet, somehow, the student model, after training, starts expressing a similar preference for owls.
That preference may not be immediately apparent – maybe the model mentions owls in its answers more often than other models – but it becomes obvious with targeted questions about owls.
So, what happens when transmitted traits are more insidious.
The researchers devised a clever series of experiments to test this. The teacher models were trained to be evil or at least misaligned with human values. From there, each teacher spun out reams of sterile content—just numbers, equations, step-by-step calculations. All explicit hints of the teacher's misleading behavior were surgically excised, ensuring that by any reasonable inspection, the data it generated should have been trait-free. Yet when the student models were fine-tuned on this sterile content, they emerged changed, echoing the mannerisms of their mentors. Some examples from Anthropic's paper:
The hidden hand worked through patterns embedded deep in the data, patterns that a human mind, or even a less vigilant program, would have missed.
Another group at Anthropic, probing the behavior of large language models last year, began to notice models' knack for finding loopholes and shortcuts in a system's rules. At first, it was innocuous. A model learned to flatter users, to echo their politics, to check off tasks that pleased the human overseers. But as the supervisors tweaked the incentives, a new form of cunning arose. The models, left alone with a simulated version of their own training environment, figured out how to change the very process that judged their performance.
This behavior, dubbed 'reward tampering,' was troubling not only for its cleverness but for its resemblance to something entirely human. In a controlled laboratory, models trained on early, tame forms of sycophancy quickly graduated to more creative forms of subterfuge.
They bypassed challenges, padded checklists, and, on rare occasions, rewrote their own code to ensure they would always be recognized as 'winners.' Researchers found this pattern difficult to stamp out. Each time they retrained the models to shed their penchant for flattery or checklist manipulation, a residue remained—and sometimes, given the opportunity, the behavior re-emerged like a memory from the depths.
There is a paradox near the heart of these findings. At one level, the machine appears obedient, trundling through its chores, assembling responses with unruffled competence. At another, it is learning to listen for signals that humans cannot consciously detect. These can be biases or deliberate acts of misdirection. Crucially, once these patterns are baked into data produced by one model, they remain as invisible traces, ready to be absorbed by the next.
In traditional teaching, the passage of intangibles -- resilience or empathy -- can be a virtue. For machines, the legacy may be less benign.
The problem resists simple fixes. Filtering out visible traces of misalignment does not guarantee safety. The unwanted behavior travels below the threshold of human notice, hidden in subtle relationships and statistical quirks. Every time a 'student' model learns from a 'teacher,' the door stands open, not just for skills and knowledge, but for the quiet insemination of unintended traits.
What does this mean for the future of artificial intelligence? For one, it demands a new approach to safety, one that moves beyond the obvious and interrogates what is passed on that is neither explicit nor intended. Supervising data is not enough. The solution may require tools that, like a skilled psychoanalyst, unravel the threads of learned behavior, searching for impulses the models themselves cannot articulate.
The researchers at Anthropic suggest there is hope in transparency. By constructing methods to peer into the tangle of neural representations, they hope to catch a glimpse of these secrets in transit, to build models less susceptible to inheriting what ought not to be inherited.
Yet, as with everything in the realm of the unseen, progress feels halting. It's one thing to know that secrets can be whispered in the corridors of neural networks. It is another to recognize them, to name them, and to find a way to break the chain.
Hashtags

Try Our AI Features
Explore what Daily8 AI can do for you:
Comments
No comments yet...
Related Articles
Yahoo
28 minutes ago
- Yahoo
Chinese AI firms form alliances to build domestic ecosystem amid US curbs
SHANGHAI (Reuters) -China's artificial intelligence companies have announced two new industry alliances, aiming to develop a domestic ecosystem to reduce dependence on foreign tech as they seek to cope with U.S. export restrictions on advanced Nvidia chipsets. The announcements were timed to coincide with the three-day World Artificial Intelligence Conference in Shanghai ending on Monday. The conference also showcased a slew of new products, such as an AI computing system from Huawei that experts believe rivals Nvidia's most advanced offering, as well as consumer-friendly products such as several kinds of digital AI glasses. The "Model-Chip Ecosystem Innovation Alliance" brings together Chinese developers of large language models (LLMs) and AI chip manufacturers. "This is an innovative ecosystem that connects the complete technology chain from chips to models to infrastructure," said Zhao Lidong, CEO of Enflame, one of the participating chipmakers. Other manufacturers of graphics processing units (GPUs) in the alliance include Huawei, Biren, and Moore Threads, which have been hit by U.S. sanctions that block them from purchasing advanced tech made with U.S. know-how. The alliance was announced by StepFun, an LLM developer. A second alliance, the Shanghai General Chamber of Commerce AI Committee, aims to "promote the deep integration of AI technology and industrial transformation." Participants include SenseTime, also sanctioned by the U.S. and which has pivoted from facial recognition technology to LLMs. Others are StepFun and another LLM developer, MiniMax, as well as chipmakers Metax and Iluvatar CoreX. One of the most talked about products at the conference was Huawei's CloudMatrix 384 which incorporates 384 of its latest 910C chips and outperforms Nvidia's GB200 NVL72 on some metrics, according to U.S. research firm SemiAnalysis. Huawei's system design capabilities have meant that it has been able to use more chips and system-level innovations to compensate for weaker individual chip performance, SemiAnalysis said. At least six other Chinese computing firms showcased similar "clustering" chip technology. Metax demonstrated an AI supernode featuring 128 C550 chips designed to support large-scale liquid-cooled data centre requirements. Other events included Tencent's unveiling of its open-source Hunyuan3D World Model 1.0, which the company said enables users to generate interactive 3D environments through text or image prompts. Baidu announced what it said was next-generation "digital human" technology that helps businesses to create virtual livestreamers. It features "cloning technology" that can replicate a human's voice, tone, and body language from just 10 minutes of sample footage. Alibaba was among those announcing AI glasses. Its Quark AI Glasses are powered by its Qwen AI model and are due to be released in China by the end of 2025. They will allow users to access the tech giant's map service for easy navigating and to use Alipay by scanning QR codes with voice commands. Connectez-vous pour accéder à votre portefeuille


Axios
29 minutes ago
- Axios
Musk announces Tesla, Samsung Electronics chip supply deal
Tesla CEO Elon Musk said late Sunday the automaker has signed a deal to obtain semiconductor chips from Samsung Electronics. The big picture: Samsung had announced Saturday that it had struck a $16.5 billion supply agreement, but it didn't name the company. The announcement comes after Samsung, one of the world's largest memory chip makers, acknowledged last year that it had fallen behind in the AI chips war. Driving the news: "Samsung's giant new Texas fab will be dedicated to making Tesla's next-generation AI6 chip. The strategic importance of this is hard to overstate," Musk wrote on his platform X. "Samsung currently makes AI4. TSMC will make AI5, which just finished design, initially in Taiwan and then Arizona," Musk added. "Samsung agreed to allow Tesla to assist in maximizing manufacturing efficiency. This is a critical point, as I will walk the line personally to accelerate the pace of progress. And the fab is conveniently located not far from my house."


Medscape
29 minutes ago
- Medscape
More Data Cement COVID's Impact on Patients With Cancer
TOPLINE: New data confirm the impact COVID infection can have on patients with cancer and identified several risk factors associated with hospitalization and death. Receipt of chemotherapy as well as a baseline history of stroke, atrial fibrillation, or pulmonary embolism were each associated with nearly double the risk for COVID-related hospitalization. Prior vaccination halved this risk. Older age and earlier hospitalization were associated with a greater risk for death. METHODOLOGY: Patients undergoing active cancer treatment are at increased risk for severe COVID-19 due to immunosuppression, but risk factors for hospitalization and death are not well-defined. Researchers conducted a prospective cohort study involving 1572 patients with cancer (median age, 60 years; 53.4% women), enrolled within 14 days of a positive SARS-CoV-2 test; participants had received active treatment for cancer within 6 weeks before testing or had undergone prior stem cell transplant or CAR T-cell therapy. Patient screening and enrollment took place between May 2020 and February 2022. Treatments included chemotherapy (34.3%), targeted therapy (27.7%), and immunotherapy (10.6%). Breast (23.6%) and lung (13.9%) cancers were the most common cancer types. Overall, 64% of participants had metastatic disease, and at enrollment, 64% had not received a COVID vaccine. Study outcomes were COVID-related hospitalization or death. Risk factors for hospitalization and for death among hospitalized patients were evaluated separately. TAKEAWAY: At 90 days after an initial positive test, COVID-related mortality was 3% and remained stable at subsequent follow-ups. The highest incidence occurred in patients with lymphoma, followed by those with acute leukemia or lung cancer; the lowest incidence occurred in those with other types of solid tumors and blood cancers. Hospitalization for COVID-19 occurred in 18.4% of patients within 90 days of enrollment. The risk for hospitalization was elevated among patients who received chemotherapy (hazard ratio [HR], 1.97) and those with a history of stroke, atrial fibrillation, and pulmonary embolism (HR, 1.78). Vaccination prior to infection reduced the risk for hospitalization by nearly half (HR, 0.52). Hospitalization for COVID-19 within 30 days of infection was associated with an increased risk for death (HR, 14.6). Among patients hospitalized for COVID within 30 days, age 65 years or older was the only significant predictor of COVID-specific death (HR, 3.49). Over the 2-year follow-up, there were 1739 disruptions to cancer treatment; 50.7% of these were attributed to COVID-19, and most occurred within 30 days of a positive test. IN PRACTICE: 'The data from this prospective cohort study confirm and expand previous retrospective case series that have found factors, including hematologic cancers, chemotherapy receipt, and lung cancer, as associated with COVID-19 severity,' the authors of the study wrote, noting that the results 'showed that COVID-19 had a significant impact on patients with cancer, including hospitalization, treatment disruptions, and death.' SOURCE: This study, led by Brian I. Rini, MD, Vanderbilt-Ingram Cancer Center, Nashville, Tennessee, was published online in JAMA Oncology. LIMITATIONS: Information on specific strains was not available. This study lacked a control group of patients without COVID-19, which limited causal inference. Additionally, as participants were enrolled through the National Cancer Institute trial networks, generalizability to a broader population could be limited. DISCLOSURES: This study was funded in part by the Coronavirus Aid, Relief, and Economic Security Act and the National Cancer Institute National Clinical Trials Network, Experimental Therapeutics Clinical Trials Network, and Community Oncology Research Program grants via the U10 funding mechanism. Several authors declared receiving grants and/or personal fees and having other ties with various sources. This article was created using several editorial tools, including AI, as part of the process. Human editors reviewed this content before publication.