
Meta's Llama 3.1 model 'memorised' 42 per cent of Harry Potter book, new study finds
The study, published by computer scientists and legal scholars from Stanford, Cornell, and West Virginia University, evaluated five popular open-weight models to determine which were most likely to reproduce text from Books3, an AI training dataset comprising copyrighted books.
Meta's 70-billion-parameter large language model (LLM) has memorised over 42 per cent of Harry Potter and the Philosopher's Stone well enough to reproduce 50-token excerpts from the book at least half of the time, according to the study. It also found that the darker lines of the book were easier for the LLM to reproduce.
The new research comes at a time when AI companies, including Meta, are facing a wave of lawsuits accusing them of violating the law by using copyrighted material to train their models without permission.
It offers new insight into a pivotal question: how easily AI models can reproduce excerpts from copyrighted material verbatim. Companies such as OpenAI have previously argued that memorisation of text by AI models is a fringe phenomenon. The findings of the study appear to show otherwise.
'There are really striking differences among models in terms of how much verbatim text they have memorized,' James Grimmelmann, one of the co-authors of the paper, was quoted as saying by Ars Technica.
'It's clear that you can in fact extract substantial parts of Harry Potter and various other books from the model. That suggests to me that probably for some of those books, there's something the law would call a copy of part of the book in the model itself,' said Mark Lemley, another co-author of the paper.
'The fair use analysis you've gotta do is not just "is the training set fair use," but "is the incorporation in the model fair use?" That complicates the defendants' story,' he added.
As part of the study, the researchers divided 36 books into passages of 100 tokens each. For each passage, they used the first 50 tokens as a prompt and calculated the probability that the model's next 50 tokens would match the original text.
The study defines 'memorised' as a greater than 50 per cent chance that the model will reproduce the original text word for word. The research was limited to open-weight models because access to technical information such as per-token probability values allowed the researchers to compute the probability of an entire token sequence directly, rather than by repeated sampling.
This would be more difficult to do in the case of closed models like those developed by OpenAI, Google, and Anthropic.
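In practice, the measurement needs just one forward pass per passage: with the logits in hand, the chance that the model reproduces the 50-token continuation is the product of its per-token conditional probabilities. Below is a minimal sketch of that idea, assuming the Hugging Face transformers API; the model name, the continuation_probability helper and the placeholder passage are illustrative, not the paper's code.

```python
# Minimal sketch (not the authors' code) of the study's scoring idea:
# with open weights, the probability that the model reproduces a 50-token
# continuation is the product of its per-token conditional probabilities,
# computable in one forward pass instead of repeated sampling.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-70B"  # illustrative; a smaller open-weight LM works too
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

def continuation_probability(prompt_ids, target_ids):
    """P(model emits target_ids after prompt_ids) under temperature-1 sampling."""
    ids = torch.cat([prompt_ids, target_ids]).unsqueeze(0)
    with torch.no_grad():
        logits = model(ids).logits[0]                 # (seq_len, vocab)
    logp = torch.log_softmax(logits.float(), dim=-1)
    start = prompt_ids.shape[0]
    # logits at position i predict token i+1, so score only the target span
    token_logp = logp[start - 1 : -1].gather(1, target_ids.unsqueeze(1)).squeeze(1)
    return token_logp.sum().exp().item()

passage = "..."  # placeholder: one ~100-token passage from the book under test
ids = tok(passage, return_tensors="pt").input_ids[0]
prompt, target = ids[:50], ids[50:100]
memorised = continuation_probability(prompt, target) > 0.5  # the study's threshold
```

Working in log space keeps a 50-token product from underflowing, and this direct access to per-token probabilities is precisely what closed-model APIs generally withhold.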
The study found that Llama 3.1 70B memorised more than any of the other models evaluated, including Meta's earlier Llama 1 65B and models from Microsoft and EleutherAI. Llama 1 65B, by contrast, was found to have memorised only 4.4 per cent of Harry Potter and the Philosopher's Stone.
Llama 3.1 was more likely to reproduce popular books, such as The Hobbit and George Orwell's 1984, than obscure ones like Sandman Slim, a 2009 novel by Richard Kadrey, as per the study. Because the degree of memorisation varies so widely from book to book, this could undermine efforts by plaintiffs to file a unified lawsuit and make it harder for individual authors to take legal action against AI companies on their own.
While the research findings could serve as evidence that portions of the Harry Potter book were copied into the training data and weights used to develop Llama 3.1, they do not explain how exactly that copying occurred.
At the start of the year, legal documents showed that Meta CEO Mark Zuckerberg had personally cleared the use of a dataset comprising pirated e-books and articles for AI training. The new study also lines up with those filings, which indicated that Meta had reportedly cut corners in gathering data for AI training.
Related Articles


Indian Express
After Shubhanshu Shukla, NASA's Anil Menon gears up for Expedition-75
As ISRO's Group Captain Shubhanshu Shukla continues his Axiom Mission 4 (Ax-4), NASA astronaut Anil Menon is preparing for his first mission to the International Space Station (ISS). Menon will serve as a flight engineer and Expedition 75 crew member aboard the Roscosmos Soyuz MS-29 spacecraft, scheduled to lift off in June 2026. He will be joined by Roscosmos cosmonauts Pyotr Dubrov and Anna Kikina for an eight-month stint aboard the ISS, adding to the growing Indian footprint in space. The trio will launch from the Baikonur Cosmodrome in Kazakhstan, on what will be the 75th long-duration mission to the station. Once aboard, Menon will conduct a range of scientific investigations and technology demonstrations.

Selected by NASA in 2021, Menon graduated with NASA's 23rd astronaut class in 2024. Born and raised in Minneapolis, he brings an impressive mix of skills to the table: he is an emergency medicine physician, a mechanical engineer, and a colonel in the U.S. Space Force. 'The fly shape represents our class, "the flies",' he wrote of the class patch in October 2022. 'Twelve stars represent the candidates of class 23 and the UAE and US flags are both displayed. And of course the astronaut pose represents our faith in NASA's return to the moon while keeping an eye on Mars!'

Menon also holds a bachelor's degree in neurobiology from Harvard University, a master's in mechanical engineering, and a medical degree from Stanford. He completed residencies in emergency and aerospace medicine at Stanford and the University of Texas Medical Branch in Galveston.

Outside of spaceflights and experiments, Menon still practices emergency medicine at Memorial Hermann's Texas Medical Center and teaches in the University of Texas residency program. Before joining NASA, he was SpaceX's first flight surgeon, supporting the historic NASA-SpaceX Demo-2 mission, and he served as crew flight surgeon for multiple ISS missions.


Time of India
What will learning look like in the age of superintelligence? Sam Altman says intelligence may soon cost no more than electricity
In his recent blog post titled The Gentle Singularity, OpenAI CEO Sam Altman reflects on how the arrival of digital superintelligence may reshape every dimension of human learning. The post is not a speculative essay filled with distant hypotheticals. Instead, it reads like a quiet alert from someone at the very center of what he calls a "takeoff."

One of the most significant areas poised for transformation, according to Altman, is learning itself. As artificial intelligence systems surpass human capability in increasingly complex domains, the role of the learner is expected to evolve. In Altman's view, we are now past the hard part. The breakthroughs behind tools like ChatGPT have already laid the groundwork. What follows is a period where these tools begin to self-improve, causing knowledge creation, experimentation and implementation to accelerate at a pace the world has never seen before.

"Already we live with incredible digital intelligence, and after some initial shock, most of us are pretty used to it," Altman writes. That shift in perception is critical: what was once astonishing has quickly become mundane. In education, this means that the bar will keep moving. Learners may no longer be evaluated on their ability to recall information or apply frameworks but rather on their ability to collaborate with machines, interpret insights and define new problems worth solving.

Here are six radical shifts Altman's vision suggests we may see in how learning functions in an age of superintelligence:

Cognitive agents will become co-learners
Altman notes that 2025 marks the arrival of AI agents capable of performing real cognitive work. Writing software, solving novel problems and simulating thought are no longer limited to humans. This doesn't mean the end of learning but a reorientation of it. Students, professionals and educators alike may find themselves working alongside these agents, not as passive users but as active collaborators. The process of learning may increasingly center around guiding, auditing and amplifying the work of intelligent systems.

The pace of scientific understanding will compress
One of the most profound claims in Altman's blog is that the timeline for scientific discovery could collapse dramatically. "We may be able to discover new computing substrates, better algorithms, and who knows what else," he writes. "If we can do a decade's worth of research in a year, or a month, then the rate of progress will obviously be quite different." This will directly affect how educational systems operate: curricula may have to update monthly instead of yearly, and students might prepare not for known fields but for capabilities that do not yet exist.

Personalisation will become the baseline
Altman envisions AI systems that feel more like a global brain, "extremely personalized and easy for everyone to use." Such systems could radically alter how learning journeys are shaped. Education may shift away from standardisation and towards deep customisation, where each learner follows a uniquely adaptive path based on their goals, context and feedback loops with intelligent systems. This could also challenge long-held norms around grading, pacing and credentialing.
Creativity will remain human, but enhanced
Despite machines taking over many cognitive tasks, Altman emphasises that the need for art, storytelling and creative vision will remain. However, the way we express creativity is likely to change. Learners in creative fields will no longer be judged solely by their manual skill or originality but by how well they can prompt, guide and harness generative tools. Those who embrace this shift may open entirely new modes of thought and output.

Intelligence will become infrastructural
In Altman's projection, "as datacenter production gets automated, the cost of intelligence should eventually converge to near the cost of electricity." Once data centers can build other data centers and robots assist in manufacturing robots, the cost of deploying intelligence could plummet. This repositions knowledge from something rare and scarce to something ambient. Learning may become less about access and more about intent: what one chooses to do with the world's near-limitless cognitive resources.

The meaning of expertise may change
As systems outpace human ability in certain domains, the role of the expert will evolve. According to Altman, many of today's jobs might appear trivial or performative to future generations, just as subsistence farming seems primitive to us now. Yet meaning will remain rooted in context. Learners will continue to pursue mastery, not because the machine cannot do it but because the act of learning remains socially and personally meaningful. The human impulse to know and contribute will not vanish; it will be redirected.

Throughout the blog, Altman remains clear-eyed about the challenges. "There will be very hard parts like whole classes of jobs going away," he admits, but he is equally optimistic that the world will become so much richer, so quickly, that new ways of structuring society, policy and education will follow. Learning may become less of a race to gain credentials and more of a lifelong dialogue with intelligent systems that expand what it means to know, to build and to belong.

"From a relativistic perspective, the singularity happens bit by bit, and the merge happens slowly," Altman writes. The shift may not feel disruptive day to day, but its long arc will redefine how we learn, what we teach and how intelligence itself is understood in the decades to come.


Hindustan Times
This new AI learns from real human decisions to predict your next move easily
Have you ever wondered if a computer could really get inside your head, not in a sci-fi way, but in a way that actually understands what you might do next? That is exactly what is happening with Centaur, a new artificial intelligence that is starting to predict human behaviour with a level of accuracy that is making scientists sit up and take notice.

How Centaur learns about us
Centaur is not just another chatbot. It has been trained on a mountain of real-world data: more than 60,000 people making over 10 million decisions in all kinds of situations. This includes everything from memory games to tricky moral dilemmas, all boiled down into simple language so the AI could learn what people actually do, not just what they say they will do.

The researchers took a powerful language model and gave it a crash course in human psychology. They did not start from scratch, either. They used Meta's Llama 3.1 as a base, then fine-tuned it with a training method that updated only the parts of the model that matter for predicting behaviour (a sketch of this style of fine-tuning follows at the end of this article). The whole process took less than a week, but the results are surprising.

Centaur was tested on a huge variety of psychological experiments, and it did not just spit out random guesses. When put to the test, it beat the older psychology models that experts have trusted for years. How? It could predict what people would do even when the rules of the game changed or when it faced a challenge it had never seen before. In some cases, it even started to behave like a real person, making decisions that felt genuinely human.

What makes this AI different
One of the most surprising things about Centaur is that its way of thinking started to line up with patterns found in actual brain scans. No one told it to copy the brain, but the more it learned about human choices, the more its inner workings started to look like ours. It even helped scientists spot a new way people make decisions, something the experts had missed before.

Centaur could end up being useful in all sorts of places. Think about smarter apps that actually understand how you learn, or tools that help doctors spot when someone is struggling. Of course, there are big questions too. If an AI can predict your choices, how much privacy do you really have? And who gets to decide how this kind of technology is used?

For now, the team behind Centaur is working on making it even better, adding more voices and more types of decisions so it does not just reflect one slice of humanity. They have opened up their work so other researchers can build on it, hoping to create a tool that helps us all understand ourselves a little better.

The Centaur study was published in the journal Nature on July 2, 2025. The research was led by a team at the Institute for Human-Centered AI at Helmholtz Munich.
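The article does not name Centaur's training method, but updating "only the parts that matter" describes parameter-efficient fine-tuning, in which small low-rank adapter matrices are trained while the base model's weights stay frozen. Here is a hedged sketch using the Hugging Face peft library; the dataset file, hyperparameters and output path are illustrative assumptions, not Centaur's actual recipe.

```python
# Hedged sketch of LoRA-style parameter-efficient fine-tuning: small adapter
# matrices are trained on top of a frozen base model. All names and settings
# below are illustrative, not Centaur's published configuration.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "meta-llama/Llama-3.1-70B"
tok = AutoTokenizer.from_pretrained(BASE)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE)

# Low-rank adapters on the attention projections; the base weights stay frozen.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)  # trainable params are a tiny fraction of 70B

# Hypothetical corpus: one plain-language decision transcript per JSON line.
data = load_dataset("json", data_files="decisions.jsonl")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="centaur-sketch", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=1e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```

Freezing the base weights is what makes fine-tuning a 70-billion-parameter model in under a week tractable, which matches the article's timeline.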