
Latest news with #JudgeAlsup

AI Gets A Pass On Copyright As Judge Favors 'Transformative Tech'

Forbes

2 days ago



Judge rules Anthropic can train AI on books it bought and scanned, so long as it destroys the originals.

Anthropic can continue to train its generative AI models on copyrighted books, as long as it scans legally obtained books into a computer and destroys the originals. Judge William Alsup sided with Anthropic on Monday, calling generative AI 'the most transformative [technology]' many of us will see in our lifetimes. The decision dismissed most claims in the lawsuit filed by authors who alleged that their books were used, without permission, to train Anthropic's Claude AI models. Only one claim — concerning Anthropic's initial use of pirated copies for its central library — was allowed to proceed. The rest were wiped off the board.

The company will still need to resolve, through a trial, its training practices using more than seven million pirated titles. Overall, it's still viewed as a win for Anthropic. But it comes at the cost of clarity in copyright law.

Expert Reaction To AI Copyright Decision — 'This Is Bad Law'

Yelena Ambartsumian doesn't mince words. 'If you've read the Bartz v. Anthropic ruling on copyright infringement, you'll see the order is bad law,' said Ambartsumian, IP attorney and founder of Ambart Law, PLLC. 'It's clear Judge Alsup wanted to protect and promote generative AI development, and that's exactly what he did.'

Her take? The ruling ignores core copyright principles. Most glaring, she said, is the judge's logic around reproductions. The Copyright Act gives authors the exclusive right to make copies of their work. That includes the kind of large-scale duplication used during AI training. But Judge Alsup wrote that no infringement occurs 'where one copy entirely replaced the other.' Ambartsumian is blunt: 'According to the decision, if I scan a book to make a digital copy, and then destroy the original book, there is no copyright infringement, because my digital copy replaced it. That can't be.'

Fair Use For AI Gets Contorted

Judge Alsup also leaned heavily on fair use. But the analysis was thin where it mattered most. Instead of focusing on the commercial use of protected works, the ruling compared AI training to how people learn. That's a slippery slope, since humans don't monetize their thought patterns at industrial scale. 'This is a commercial machine, not a human learner,' Ambartsumian said.

She acknowledges the comparison might hold conceptually, but that doesn't mean the legal treatment should be the same. 'I've written before that I think copyright law may not be the best avenue to address issues of un-remunerated copying,' she added. 'But this reasoning still skips the most important parts of the fair use test and does not analyze the commercial aspects at play.'

She added that Judge Alsup also disregarded the scope of what was taken. Anthropic didn't just borrow snippets; it ingested entire books. But instead of addressing that head-on, the court focused on whether Claude needed to use full books or just a subset. The conclusion? Using everything available makes the model better, so it's allowed.

'That's exactly what I warned about,' Ambartsumian said, referencing her reaction to the Copyright Office's earlier report on AI training. 'The Copyright Office posited that training on massive amounts of diverse works could make a model more robust and thus more likely to be "transformative." This invites more copying, not less, and could incentivize copyright infringement.'

Judicial Discretion And Selective AI Model Sympathy

There's another current running under the surface. Anthropic may have benefited from being seen as the 'cleaner player' in the AI race. 'Would Anthropic's arguably less ethical older brother, OpenAI, have gotten the same result from the same judge? Probably not,' Ambartsumian said. 'In copyright law, we often see judges twisting case law to incentivize "good guys" and punish "bad guys" – or ignore creators.'

That inconsistency now bleeds into courtrooms. In the ongoing New York Times lawsuit against OpenAI, the tone from the bench has been colder, the skepticism sharper. So where does that leave everyone? In legal limbo. The provisional appeal window remains open. But Ambartsumian doesn't expect much. 'As a litigator and copyright-law obsessive, I'm disappointed,' she said. 'I would have liked to see a more robust analysis of why Anthropic's training was transformative. But as a legal realist, I don't think an appellate court will reach a different outcome.'

What began as a fight over rights ended as a referendum on priorities. The court chose innovation. Creators were left with the scraps.

How Claude AI Clawed Through Millions Of Books

Forbes

4 days ago



The race to build the most advanced generative AI technology has continued to be a story about data: who possesses it, who seeks it, and what methods they use to acquire it. A recent federal court ruling involving Anthropic, creator of the AI assistant Claude, offered a revealing look into these methods. The company received a partial victory alongside a potentially massive liability in a landmark copyright case. The legal high-five and hand slap draw an instructive, if blurry, line in the sand for the entire AI industry.

This verdict is complex, and it is likely to shape how AI large language models (LLMs) are developed and deployed going forward. The decision is more than a legal footnote; it is a signal that fundamentally reframes risk for any company developing or even purchasing AI solutions.

My Fair Library

First, the good news for Anthropic and its ilk. U.S. District Judge William Alsup ruled that the company's practice of buying physical books, scanning them, and using the text to train its AI was "spectacularly transformative." In the court's view, this activity falls under the doctrine of "fair use." Anthropic was not simply making digital copies to sell. In his ruling, Judge Alsup wrote that the models were not trained to 'replicate or supplant' the books, but rather to 'turn a hard corner and create something different.'

The literary ingestion process itself was strikingly industrial. Anthropic hired former Google Books executive Tom Turvey to lead the acquisition and scanning of millions of books. The company purchased used books, stripped their bindings, cut their pages, and fed them into scanners before tossing the paper originals. Because the company legally acquired the books and the judge saw the AI's learning process as transformative, the method held up in court. An Anthropic spokesperson told CBS News it was pleased the court recognized its training was transformative and 'consistent with copyright's purpose in enabling creativity and fostering scientific progress.'

For data and analytics leaders, this part of the ruling offers a degree of reassurance. It provides a legal precedent suggesting that legally acquired data can be used for transformative AI training.

Biblio-Take-A

However, the very same ruling condemned Anthropic for its alternative sourcing method: using pirate websites. The company admitted to downloading vast datasets from "shadow libraries" that host millions of copyrighted books without permission. Judge Alsup was unequivocal on this point. 'Anthropic had no entitlement to use pirated copies for its central library,' he wrote. 'Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy.' As a result, Anthropic now faces a December trial to determine the damages for this infringement.

This aspect of the ruling is a stark warning for corporate leadership. However convenient, using datasets from questionable sources can lead to litigation and reputational damage. The emerging concept of 'data diligence' is no longer just a best practice; it's a critical compliance mechanism.

A Tale Of Two Situs

This ruling points toward a new reality for AI development. It effectively splits the world of AI training data into two distinct paths: the expensive but legally defensible route of licensed content, and the cheap but legally treacherous path of piracy. The decision has been met with both relief and dismay. While the tech industry now sees a path forward for AI training, creator advocates see an existential threat.

The Authors Guild, in a statement to Publishers Weekly, expressed its concern. The organization said it was 'relieved that the court recognized Anthropic's massive, criminal-level, unexcused e-book piracy,' but argued that the decision on fair use 'ignores the harm caused to authors.' The Guild added that 'the analogy to human learning and reading is fundamentally flawed. When humans learn from books, they don't make digital copies of every book they read and store them forever for commercial purposes.'

Judge Alsup directly addressed the argument that AI models would create unfair competition for authors. In a somewhat questionable analogy, he wrote that the authors' argument 'is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works.'

The Story Continues

This legal and ethical debate will likely persist, affecting the emerging data economy with a focus on data provenance, fair use, and transparent licensing. For now, the Anthropic case has turned a new page on the messy, morally complex process of teaching our silicon-based co-workers. It reveals a world of destructive scanning, digital piracy, and legal gambles. As Anthropic clawed its way through millions of books, it left the industry still scratching for solid answers about content fair use in the age of AI.

Did AI companies win a fight with authors? Technically

The Verge

4 days ago



In the past week, big AI companies have — in theory — chalked up two big legal wins. But things are not quite as straightforward as they may seem, and copyright law hasn't been this exciting since last month's showdown at the Library of Congress.

First, Judge William Alsup ruled it was fair use for Anthropic to train on a series of authors' books. Then, Judge Vince Chhabria dismissed another group of authors' complaint against Meta for training on their books. Yet far from settling the legal conundrums around modern AI, these rulings might have just made things even more complicated.

Both cases are indeed qualified victories for Meta and Anthropic. And at least one judge — Alsup — seems sympathetic to some of the AI industry's core arguments about copyright. But that same ruling railed against the startup's use of pirated media, leaving it potentially on the hook for massive financial damage. (Anthropic even admitted it did not initially purchase a copy of every book it used.) Meanwhile, the Meta ruling asserted that because a flood of AI content could crowd out human artists, the entire field of AI system training might be fundamentally at odds with fair use. And neither case addressed one of the biggest questions about generative AI: when does its output infringe copyright, and who's on the hook if it does?

Alsup and Chhabria (incidentally both in the Northern District of California) were ruling on relatively similar sets of facts. Meta and Anthropic both pirated huge collections of copyright-protected books to build a training dataset for their large language models, Llama and Claude. Anthropic later did an about-face and started legally purchasing books, tearing the covers off to 'destroy' the original copy, and scanning the text. The authors argued that, in addition to the initial piracy, the training process constituted an unlawful and unauthorized use of their work. Meta and Anthropic countered that this database-building and LLM-training constituted fair use.

Both judges basically agreed that LLMs meet one central requirement for fair use: they transform the source material into something new. Alsup called using books to train Claude 'exceedingly transformative,' and Chhabria concluded 'there's no disputing' the transformative value of Llama. Another big consideration for fair use is the new work's impact on a market for the old one. Both judges also agreed that, based on the arguments made by the authors, the impact wasn't serious enough to tip the scale.

Add those things together, and the conclusions were obvious… but only in the context of these cases, and in Meta's case, because the authors pushed a legal strategy that their judge found totally inept. Put it this way: when a judge says his ruling 'does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful' and 'stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one' — as Chhabria did — AI companies' prospects in future lawsuits with him don't look great.

Both rulings dealt specifically with training — or media getting fed into the models — and didn't reach the question of LLM output, or the stuff models produce in response to user prompts. But output is, in fact, extremely pertinent. A huge legal fight between The New York Times and OpenAI began partly with a claim that ChatGPT could verbatim regurgitate large sections of Times stories.
Disney recently sued Midjourney on the premise that it 'will generate, publicly display, and distribute videos featuring Disney's and Universal's copyrighted characters' with a newly launched video tool. Even in pending cases that weren't output-focused, plaintiffs can adapt their strategies if they now think it's a better bet. The authors in the Anthropic case didn't allege Claude was producing directly infringing output. The authors in the Meta case argued Llama was, but they failed to convince the judge — who found it wouldn't spit out more than around 50 words of any given work. As Alsup noted, dealing purely with inputs changed the calculations dramatically. 'If the outputs seen by users had been infringing, Authors would have a different case,' wrote Alsup. 'And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case.'

In their current form, major generative AI products are basically useless without output. And we don't have a good picture of the law around it, especially because fair use is an idiosyncratic, case-by-case defense that can apply differently to mediums like music, visual art, and text. Anthropic being able to scan authors' books tells us very little about whether Midjourney can legally help people produce Minions memes. Minions and New York Times articles are both examples of direct copying in output. But Chhabria's ruling is particularly interesting because it makes the output question much, much broader.

Though he may have ruled in favor of Meta, Chhabria's entire opening argues that AI systems are so damaging to artists and writers that their harm outweighs any possible transformative value — basically, because they're spam machines. It's worth reading:

Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way. …

As the Supreme Court has emphasized, the fair use inquiry is highly fact dependent, and there are few bright-line rules. There is certainly no rule that when your use of a protected work is 'transformative,' this automatically inoculates you from a claim of copyright infringement. And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create. …

The upshot is that in many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission. Which means that the companies, to avoid liability for copyright infringement, will generally need to pay copyright holders for the right to use their materials.

And boy, it sure would be interesting if somebody would sue and make that case. After saying that 'in the grand scheme of things, the consequences of this ruling are limited,' Chhabria helpfully noted this ruling affects only 13 authors, not the 'countless others' whose work Meta used. A written court opinion is unfortunately incapable of physically conveying a wink and a nod.

Those lawsuits might be far in the future.
And Alsup, though he wasn't faced with the kind of argument Chhabria suggested, seemed potentially unsympathetic to it. 'Authors' complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works,' he wrote of the authors who sued Anthropic. 'This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition.' He was similarly dismissive of the claim that authors were being deprived of licensing fees for training: 'such a market,' he wrote, 'is not one the Copyright Act entitles Authors to exploit.'

But even Alsup's seemingly positive ruling has a poison pill for AI companies. Training on legally acquired material, he ruled, is classic protected fair use. Training on pirated material is a different story, and Alsup absolutely excoriates any attempt to say it's not. 'This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use,' he wrote. There were plenty of ways to scan or copy legally acquired books (including Anthropic's own scanning system), but 'Anthropic did not do those things — instead it stole the works for its central library by downloading them from pirated libraries.' Eventually switching to book scanning doesn't erase the original sin, and in some ways it actually compounds it, because it demonstrates Anthropic could have done things legally from the start.

If new AI companies adopt this perspective, they'll have to build in extra but not necessarily ruinous startup costs. There's the up-front price of buying what Anthropic at one point described as 'all the books in the world,' plus any media needed for things like images or video. And in Anthropic's case these were physical works, because hard copies of media dodge the kinds of DRM and licensing agreements publishers can put on digital ones — so add some extra cost for the labor of scanning them in.

But just about any big AI player currently operating is either known or suspected to have trained on illegally downloaded books and other media. Anthropic and the authors will be going to trial to hash out the direct piracy accusations, and depending on what happens, a lot of companies could be hypothetically at risk of almost inestimable financial damages — not just from authors, but from anyone that demonstrates their work was illegally acquired. As legal expert Blake Reid vividly puts it, 'if there's evidence that an engineer was torrenting a bunch of stuff with C-suite blessing it turns the company into a money piñata.'

And on top of all that, the many unsettled details can make it easy to miss the bigger mystery: how this legal wrangling will affect both the AI industry and the arts. Echoing a common argument among AI proponents, former Meta executive Nick Clegg said recently that getting artists' permission for training data would 'basically kill the AI industry.' That's an extreme claim, and given all the licensing deals companies are already striking (including with Vox Media, the parent company of The Verge), it's looking increasingly dubious. Even if they're faced with piracy penalties thanks to Alsup's ruling, the biggest AI companies have billions of dollars in investment — they can weather a lot.
But smaller, particularly open source players might be much more vulnerable, and many of them are also almost certainly trained on pirated works. Meanwhile, if Chhabria's theory is right, artists could reap a reward for providing training data to AI giants. But it's highly unlikely the fees would shut these services down. That would still leave us in a spam-filled landscape with no room for future artists. Can money in the pockets of this generation's artists compensate for the blighting of the next? Is copyright law the right tool to protect the future? And what role should the courts be playing in all this? These two rulings handed partial wins to the AI industry, but they leave many more, much bigger questions unanswered.

US Judge sides with AI firm Anthropic over copyright issue

BBC News

25-06-2025



A US judge has ruled that using books to train artificial intelligence (AI) software is not a violation of US copyright law. The decision came out of a lawsuit brought last year against AI firm Anthropic by three writers, a novelist and two non-fiction authors, who accused the firm of stealing their work to train its Claude AI model and build a multi-billion-dollar business.

In his ruling, Judge William Alsup wrote that Anthropic's use of the authors' books was "exceedingly transformative" and therefore allowed under US copyright law. However, he rejected Anthropic's request to dismiss the case, ruling the firm would have to stand trial over its use of pirated copies to build its library of material.

Anthropic, a firm backed by Amazon and Google's parent company, Alphabet, could face up to $150,000 in damages per copyrighted work. The firm holds more than seven million pirated books in a "central library", according to the ruling.

The decision is among the first to weigh in on a question that is the subject of numerous legal battles across the industry: how large language models (LLMs) can legitimately learn from existing material.

"Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works, not to race ahead and replicate or supplant them — but to turn a hard corner and create something different," Judge Alsup wrote. "If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use," he added.

He noted that the authors did not claim that the training led to "infringing knockoffs", with replicas of their works being generated for users of the Claude tool. If they had, he wrote, "this would be a different case".

Similar legal battles have emerged over the AI industry's use of other media and content, from journalistic articles to music and video. This month, Disney and Universal filed a lawsuit against AI image generator Midjourney, accusing it of copyright infringement. The BBC is also considering legal action over the unauthorised use of its content.

In response to the legal battles, some AI companies have struck deals with creators of the original materials, or their publishers, to license material for use.

Judge Alsup allowed Anthropic's "fair use" defence, paving the way for future legal judgements. However, he said Anthropic had violated the authors' rights by saving pirated copies of their books as part of a "central library of all the books in the world".

In a statement, Anthropic said it was pleased by the judge's recognition that its use of the works was transformative, but disagreed with the decision to hold a trial about how some of the books were obtained and used. The company said it remained confident in its case and was evaluating its options. A lawyer for the authors declined to comment.

The authors who brought the case are Andrea Bartz, a best-selling mystery thriller writer, whose novels include We Were Never Here and The Last Ferry Out, and non-fiction writers Charles Graeber and Kirk Wallace Johnson.
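For a sense of scale, a rough, purely illustrative calculation based on the figures above (assuming, hypothetically, that the statutory maximum were applied to every pirated title, which courts rarely award in practice):

7,000,000 works × $150,000 per work = $1,050,000,000,000 (about $1.05 trillion) in theoretical maximum exposure.

Actual damages, if any, will be set at the December trial and could be far lower.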

US judge allows company to train AI using copyrighted literary materials

Al Jazeera

24-06-2025



A United States federal judge has ruled that the company Anthropic made 'fair use' of the books it utilised to train artificial intelligence (AI) tools without the permission of the authors. The favourable ruling comes at a time when the impacts of AI are being discussed by regulators and policymakers, and the industry is using its political influence to push for a loose regulatory framework.

'Like any reader aspiring to be a writer, Anthropic's LLMs [large language models] trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different,' US District Judge William Alsup said.

A group of authors had filed a class-action lawsuit alleging that Anthropic's use of their work to train its chatbot, Claude, without their consent was illegal. But Alsup said that the AI system had not violated the safeguards in US copyright laws, which are designed for 'enabling creativity and fostering scientific progress'. He accepted Anthropic's claim that the AI's output was 'exceedingly transformative' and therefore fell under the 'fair use' protections.

Alsup, however, did rule that Anthropic's copying and storage of seven million pirated books in a 'central library' infringed author copyrights and did not constitute fair use. The fair use doctrine, which allows limited use of copyrighted materials for creative purposes, has been employed by tech companies as they create generative AI. Technology developers often sweep up large swathes of existing material to train their AI models. Still, fierce debate continues over whether AI will facilitate greater artistic creativity or allow the mass production of cheap imitations that render artists obsolete, to the benefit of large companies.

The writers who brought the lawsuit — Andrea Bartz, Charles Graeber and Kirk Wallace Johnson — alleged that Anthropic's practices amounted to 'large-scale theft', and that the company had sought to 'profit from strip-mining the human expression and ingenuity behind each one of those works'.

While Tuesday's decision was considered a victory for AI developers, Alsup nevertheless ruled that Anthropic must still go to trial in December over the alleged theft of pirated works. The judge wrote that the company had 'no entitlement to use pirated copies for its central library'.
