
Latest news with #U.S.CopyrightOffice

The US Copyright Office is wrong about artificial intelligence

The Hill

30-06-2025



Last month, the U.S. Copyright Office released a report on generative AI training, concluding that the use of copyrighted materials to train artificial intelligence models is not fair use. That conclusion is wrong as both a matter of copyright law and AI policy. AI is too important to allow copyright to impede its progress, especially as America seeks to maintain its global competitiveness in tech innovation.

Fair use is a defense to copyright infringement that allows companies like Google to reproduce web pages in order to develop their search engines. If Google faced liability for copying the material it indexes, it would go out of business. Similarly, generative AI systems should be permitted to train on material that is often copyrighted, such as images or articles, to achieve sufficient accuracy.

AI training currently relies on a process called 'backend copying,' which mimics how humans learn words and ideas from reading copyrighted materials and has never before been treated as infringement. Training datasets use texts to build a large vocabulary of words, which is analogous to creating a dictionary, while also capturing facts, ideas and expression.

Copyright owners argue that treating backend copying as fair use will stifle creativity and impoverish artists. That's false. Creators can still make money by selling copies of their work, and any payments from AI use would be minuscule. Without fair use, the VCR, the iPhone and Google's search engine wouldn't exist. And copyright owners can still sue if AI systems produce replicas of their works as outputs.

Treating backend copying as infringement by denying fair use risks crippling innovation and forfeiting American technological leadership. AI training datasets include hundreds of millions to billions of works; paying for each would be enormously expensive and difficult to coordinate, hobbling startups and reinforcing the dominance of Big Tech.
AI is far more than chatbots; the technology is revolutionizing medicine, research, education and the military. America currently leads the world in AI innovation, but geopolitical competitors such as China are rapidly advancing, as the release of DeepSeek demonstrates. Limiting AI innovation with copyright threatens American economic and national security.

Humans, like AI models, learn language by example; by seeing how words are used by different authors in various texts, well-read individuals become more articulate and knowledgeable. When humans learn facts or ideas from an authorized copy of a work, it is not treated as copyright infringement. In the same way, AI systems should be allowed to learn without the presumption that their output will inevitably infringe.

Furthermore, generative AI is increasingly employed in contexts that are traditionally protected by fair use. For example, AI can be used to train medical students to perform surgical procedures or to conduct academic research. To condemn all AI training as beyond fair use short-circuits the crucial inquiry, which is how AI systems are used, not how they are built.

Courts have protected copying by search engines like Google because search is highly transformative and creates important social value. AI-driven search is beginning to replace traditional search and should enjoy the same fair use protection. It serves the same basic purpose as traditional search but provides even more powerful features, such as summarizing information from large numbers of websites simultaneously and tailoring answers to a user's specific needs.

AI also boosts innovation in other areas. Some important AI models are released as 'open-source' or 'open-weight' models, under licenses that allow anyone to download and use them free of charge.
The potential downstream uses for these models are nearly unlimited and go far beyond the uses contemplated by the companies that initially trained them. While some downstream uses could produce infringing works, others might involve only non-infringing facts and ideas, or occur in contexts that are themselves fair use. Generative AI, like the VCR and the search engine before it, is capable of both infringing and non-infringing uses and should be assessed in context.

To stop the fair use inquiry at the training stage is to ignore all these remarkable possibilities and to risk impairing the most important information technology since the printing press. The U.S. Copyright Office's report is shortsighted. Protecting AI innovation through fair use fits traditional copyright law and supports American leadership in this vital new technology.

Thinh H. Nguyen, J.D., is a legal skills professor at the University of Florida Levin College of Law and the director of its Innovation and Entrepreneurship Clinic. Derek E. Bambauer, J.D., is the Irving Cypen Professor of Law at the UF Levin College of Law, a National Science Foundation-funded researcher in law and AI, and a former principal systems engineer at IBM.

AI firms say they can't respect copyright. These researchers tried.

Washington Post

05-06-2025



Happy Thursday! I'm Nitasha Tiku, The Washington Post's tech culture reporter, filling in for Will Oremus on today's Tech Brief. Send tips about AI to:

As the policy debate over AI and fair use heats up, a new paper suggests there's a more transparent, if time-consuming, alternative to slurping up web content without permission.

Top artificial intelligence companies argue that it's impossible to build today's powerful large language models (the GPT in ChatGPT) unless they can freely scrape copyrighted materials from the internet to train their AI systems. But until now, few AI developers have tried the more ethical route.

A group of more than two dozen AI researchers found that they could build a massive eight-terabyte dataset using only text that was openly licensed or in the public domain. They tested the dataset's quality by using it to train a 7-billion-parameter language model, which performed about as well as comparable industry efforts, such as Llama 2-7B, which Meta released in 2023. A paper published Thursday detailing their effort also reveals that the process was painstaking, arduous and impossible to fully automate.

The group's model is significantly smaller than the latest offerings from OpenAI or Google's Gemini, but the findings appear to represent the biggest, most transparent and most rigorous effort yet to demonstrate a different way of building popular AI tools. That could have implications for the policy debate swirling around AI and copyright. The paper itself does not take a position on whether scraping text to train AI is fair use. That debate has reignited in recent weeks with a high-profile lawsuit and dramatic turns around copyright law and enforcement in both the U.S. and U.K.
On Wednesday, Reddit said it was suing Anthropic, alleging that it accessed data from the social media discussion board without a licensing agreement, according to The Wall Street Journal. The same day, the U.K.'s House of Commons offered concessions on a controversial bill that would allow AI companies to train on copyrighted material. These moves follow President Donald Trump's firing last month of the head of the U.S. Copyright Office, Shira Perlmutter. Her ouster brought more attention to the office's recent report on AI, which cast doubt on fair use applying to copyrighted works in generative AI.

AI companies and their investors, meanwhile, have long argued that a better way is not feasible. In April 2023, Sy Damle, a lawyer representing the venture capital firm Andreessen Horowitz, told the U.S. Copyright Office: 'The only practical way for these tools to exist is if they can be trained on massive amounts of data without having to license that data.' Later that year, in comments to the U.K. government, OpenAI said, '[I]t would be impossible to train today's leading AI models without using copyrighted materials.' And in January 2024, Anthropic's expert witness in a copyright trial asserted that 'the hypothetical competitive market for licenses covering data to train cutting-edge LLMs would be impracticable,' court documents show.

While AI policy papers often discuss the need for more open data and experts argue about whether large language models should be trained on licensed data from publishers, there's little effort to put theory into action, the paper's co-author, Aviya Skowron, head of policy at the nonprofit research institute Eleuther AI, told The Post. 'I would also like those people to get curious about what this task actually entails,' Skowron said. As it turns out, the task involves a lot of humans.
That's because of the technical challenge of data not being formatted in a way that's machine-readable, as well as the legal challenge of figuring out which license applies to which website, a daunting prospect when the industry is rife with improperly licensed data.

'This isn't a thing where you can just scale up the resources that you have available' like access to more computer chips and a fancy web scraper, said Stella Biderman, Eleuther AI's executive director. 'We use automated tools, but all of our stuff was manually annotated at the end of the day and checked by people. And that's just really hard.'

Still, the group managed to unearth new datasets that can be used ethically. Those include a set of 130,000 English-language books in the Library of Congress, nearly double the size of the popular books dataset Project Gutenberg. The group's initiative also builds on recent efforts to develop more ethical, but still useful, datasets, such as FineWeb from Hugging Face, the open-source repository for machine learning.

Eleuther AI pioneered an analogous open-source effort in 2020, creating an often-cited dataset called the Pile. A site that hosted the dataset had to take it down in 2023 after a Digital Millennium Copyright Act request from the Danish anti-piracy group Rights Alliance, which targeted the fact that the Pile contained Books3, a dataset of books that Meta is being sued over.

The new dataset is called Common Pile v0.1, and the model is called Comma v0.1, a deliberate reference to the group's belief that they will be able to find more openly licensed or public domain text that can then be used to train bigger models. Still, Biderman remained skeptical that this approach could find enough content online to match the size of today's state-of-the-art models.
The group of authors represented 14 different institutions, including MIT, CMU and the University of Toronto, as well as nonprofits such as the Vector Institute and the Allen Institute for Artificial Intelligence. Biderman said she didn't expect companies such as OpenAI and Anthropic to start adopting the same laborious process, but she hoped it would encourage them to at least rewind to 2021 or 2022, when AI companies still shared a few sentences of information about what their models were trained on. 'Even partial transparency has a huge amount of social value and a moderate amount of scientific value,' she said.

  • Musk rails against Trump tax bill, calling it a 'disgusting abomination' (Jacob Bogage and Theodoric Meyer)
  • Federal judge blocks Florida from enforcing social media ban for kids while lawsuit continues (Associated Press)
  • Apple and Alibaba's AI rollout in China delayed by Trump trade war (Financial Times)
  • Trump renegotiating Biden-era Chips Act grants, Lutnick says (Reuters)
  • US removes 'safety' from AI Safety Institute (The Verge)
  • 5 AI bots took our tough reading test. One was smartest — and it wasn't ChatGPT (Geoffrey A. Fowler)
  • You are hardwired to blindly trust AI. Here's how to fight it. (Shira Ovide)
  • Reddit sues Anthropic, alleges unauthorized use of site's data (Wall Street Journal)
  • Amazon to invest $10 billion in North Carolina to expand cloud, AI infrastructure (Reuters)
  • Germans are buying more electric cars, but not Teslas (New York Times)
  • Google warns hackers stealing Salesforce data from companies (Bloomberg)
  • Chinese hacked US Telecom a year before known wireless breaches (Bloomberg)
  • ChatGPT can now read your Google Drive and Dropbox (The Verge)
  • Google DeepMind's CEO thinks AI will make humans less selfish (Wired)
  • The creatives and academics rejecting AI — at work and at home (The Guardian)

That's all for today. Thank you so much for joining us! Make sure to tell others to subscribe to the Tech Brief.
Get in touch with Will (via email or social media) for tips, feedback or greetings!

Federal judge sides against copyright leader who claimed Trump was wrong to fire her

Yahoo

29-05-2025



President Donald Trump scored a legal win on Wednesday when a federal judge declined to grant an emergency reinstatement request from the fired head of the U.S. Copyright Office. U.S. District Judge Timothy Kelly ruled that Shira Perlmutter failed to meet the legal burden of showing she would suffer irreparable harm if not immediately reinstated, according to the Associated Press.

In a lawsuit against the Trump administration, Perlmutter argued that neither the president nor his subordinates had the authority to fire her, as her position falls under the Library of Congress. Trump fired Carla Hayden, the Librarian of Congress and the person who appointed Perlmutter in 2020. Perlmutter claims in the suit that only the Librarian of Congress can hire or fire the head of the U.S. Copyright Office.

Trump appointed U.S. Deputy Attorney General Todd Blanche as acting Librarian of Congress earlier this month after removing Hayden over allegations that she had pushed DEI initiatives. However, Perlmutter and her attorneys claim that the president lacks the authority to appoint a Librarian of Congress, as the position is under the legislative branch, not the executive.

Lawyers representing Blanche made the opposite argument in a court filing. "The President had the power to remove the Librarian and designate an acting replacement. The Library of Congress is not an autonomous organization free from political supervision," the filing read. "It is part of the Executive Branch and is subject to presidential control…"

Trump is facing resistance from within his own party over the move, according to Politico.
The outlet reported that House Speaker Mike Johnson and Senate Majority Leader John Thune have privately questioned the president's actions toward the Library of Congress, citing three individuals granted anonymity. Johnson and Thune have reportedly expressed skepticism over Trump's authority to name Library officials.

Rep. Joe Morelle (D-N.Y.) has been a vocal opponent, rejecting Perlmutter's firing in a statement. "Donald Trump's termination of Register of Copyrights, Shira Perlmutter, is a brazen, unprecedented power grab with no legal basis. It is surely no coincidence he acted less than a day after she refused to rubber-stamp Elon Musk's efforts to mine troves of copyrighted works to train AI models," Morelle said.

While Perlmutter's emergency request was denied, her lawsuit is still ongoing. According to Politico, Kelly indicated that he would hear arguments in the coming weeks.


U.S. Copyright Office Shocks Big Tech With AI Fair Use Rebuke

Forbes

29-05-2025



In its latest pre-publication report, the U.S. Copyright Office warns that AI training may dilute creative markets.

On May 9, the U.S. Copyright Office released its long-awaited report on generative AI training and copyright infringement, just one day after President Trump abruptly fired Librarian of Congress Carla Hayden. Within 48 hours, Register of Copyrights Shira Perlmutter was also reportedly out, after the agency rushed to publish a 'pre-publication version' of its guidance, suggesting urgency, if not outright alarm, within the Office. This timing was no coincidence.

'We practitioners were anticipating this report and knew it was being finalized, but its release was a surprise,' said Yelena Ambartsumian, an AI governance and IP lawyer and founder of AMBART LAW PLLC. 'The fact that it dropped as a pre-publication version, the day after the Librarian was fired, signals to me that the Copyright Office expected its own leadership to be next.'

At the center of the report is a sharply contested issue: whether using copyrighted works to train AI models qualifies as 'fair use.' The Office's position is a bold departure from the narrative that major AI companies like OpenAI and Google have relied on in court. One of the most consequential findings in the report is the Copyright Office's rejection of the broadest fair use defense advanced by AI companies: that training models on copyrighted content is inherently transformative because the resulting models don't 'express' the same ideas.

'The Copyright Office outright rejected the most common argument that big tech companies make,' said Ambartsumian. 'But paradoxically, it suggested that the larger and more diverse a foundation model's training set, the more likely this training process would be transformative and the less likely that the outputs would infringe on the derivative rights of the works on which they were trained. That seems to invite more copying, not less.'

This nuance is critical.
The Office stopped short of declaring that all AI training is infringement. Instead, it emphasized that each case must be evaluated on its specific facts, a reminder that fair use remains a flexible doctrine, not a blanket permission slip.

Perhaps the most provocative portion of the report involves the so-called 'fourth factor' of the fair use test: the effect on the '...potential market for or value of...' the copyrighted work. Here, the Copyright Office ventured into uncharted territory. 'The report expounded a market dilution theory, which it admits is 'unchartered territory' but claims is supported by statute,' Ambartsumian noted. 'No court has applied this yet, but it's a powerful new argument for rights holders – and it comes straight from the federal agency responsible for interpreting copyright.'

In essence, the Office warns that if AI models are trained on, say, romance novels, and then generate thousands of similar books, the flood of AI content could dilute the market for human-authored works. That theory mirrors real-world fears from creators who worry they'll be displaced not just by plagiarism but by the sheer volume of AI-generated content.

The report also addresses a long-simmering debate around opt-out systems for copyright holders, under which creators must actively signal if they don't want their work used for training. The Copyright Office essentially shut that door. 'This is a damning blow to the tech giants,' Ambartsumian said. 'The Office all but rejected the opt-out proposal and instead focused on exploring voluntary collective licensing mechanisms. That's a far more creator-centric approach.'

Collective licensing – like the way music royalties are handled – could allow publishers, authors and other rights holders to be compensated for training use of their works. But the Office also acknowledged the difficulty of implementing such a scheme at scale. This legal backdrop comes amid broader political turbulence.
OpenAI's own March 13 proposal to the White House urged the federal government to weigh in on copyright litigation, and floated the idea of federal preemption to override state-level AI laws. Congress is now considering a bill that would bar states from regulating AI for the next 10 years. Ambartsumian sees these moves as connected. 'OpenAI asked the government to weigh in on copyright cases,' she said. 'But then the Executive branch turned around and weighed in on the Copyright Office itself.'

Even if courts don't formally cite the pre-publication report, its influence is likely to linger. 'We can't unsee it. Judges and clerks will read it. Litigants will cite it. And the Copyright Office, despite the leadership shake-up, has now staked out a policy position,' Ambartsumian said.

What happens next will depend on whether the report remains available, whether the courts embrace its logic, and whether Congress steps in to legislate a framework for AI and copyright in the coming months. For now, creators, developers and policymakers alike are left with a central, urgent question: Can the law catch up with the pace of AI?
