US Copyright Office Wrong on AI Fair Use - UF News
The U.S. Copyright Office's stance on AI training as fair use is misguided, potentially stifling innovation and harming American tech leadership.
Last month, the U.S. Copyright Office released a report on generative AI training, concluding that the use of copyrighted materials to train AI models is not fair use. This conclusion is fundamentally flawed, both legally and from a policy perspective.
AI is a critical technology that should not be impeded by copyright laws, especially as the U.S. strives to maintain its global competitiveness in tech innovation. Fair use is a defense against copyright infringement that allows companies to reproduce materials for transformative purposes. For instance, Google’s search engine relies on copying web pages to function, and without fair use, it would cease to operate. Similarly, developers should be able to train generative AI systems on copyrighted materials so that those systems achieve the necessary accuracy.
AI training relies on a process called 'backend copying,' which is akin to how humans learn from reading. This process has never been considered infringement. Training datasets give a model a large vocabulary of words and let it capture facts, ideas, and expressions, much like a dictionary. Copyright owners argue that this will stifle creativity and harm artists, but the claim is unfounded. Creators can still profit from selling their work, and the payments from AI use are minimal. Without fair use, transformative technologies like the VCR, iPhone, and Google’s search engine wouldn’t exist. Copyright owners can still sue if AI systems produce exact replicas of their works.
Denying fair use for backend copying risks crippling innovation and American technological leadership. AI training datasets include hundreds of millions to billions of works, and paying for their use would be extremely costly and complex, especially for startups. AI is revolutionizing fields such as medicine, research, education, and the military. Currently, the U.S. leads in AI innovation, but geopolitical competitors like China are rapidly advancing. Limiting AI innovation with copyright laws threatens American economic and national security.
Humans learn language by example, just as AI models do. When humans learn from copyrighted materials, it is not treated as infringement. AI systems should be allowed to learn without the presumption that their output will infringe. Generative AI is increasingly used in contexts traditionally protected by fair use, such as training medical students or conducting academic research. Labeling all AI training as beyond fair use overlooks the crucial inquiry of how AI systems are used, not just how they are built.
Courts have protected copying by search engines like Google because they are highly transformative and create significant social value. AI-driven search is beginning to replace traditional search and should enjoy the same fair use protection. AI-based search not only works better and faster but also provides advanced features like summarizing information from multiple sources and tailoring answers to specific user needs.
AI also drives innovation in other areas. Some important AI models are released as open source, allowing anyone to use them freely. The potential uses for these models are vast and extend far beyond the initial training purposes. While some downstream uses could infringe, others might be non-infringing or might themselves qualify as fair use.
Generative AI is capable of both infringing and non-infringing uses, similar to VCRs and search engines. Stopping the fair use inquiry at the training stage ignores the broader possibilities of AI and risks impairing one of the most important technologies since the printing press.
The U.S. Copyright Office’s report is shortsighted. Protecting AI innovation through fair use aligns with traditional copyright law and supports American leadership in this vital new technology.
Frequently Asked Questions
What is the U.S. Copyright Office's stance on AI training?
The U.S. Copyright Office concluded that using copyrighted materials to train AI models is not fair use.
Why is fair use important for AI?
Fair use allows AI systems to be trained on copyrighted materials, which is essential for achieving the necessary accuracy and functionality.
How does backend copying work in AI?
Backend copying in AI involves using large datasets of copyrighted materials to train models, similar to how humans learn from reading.
What are the risks of denying fair use for AI training?
Denying fair use for AI training could stifle innovation, increase costs, and reinforce the dominance of Big Tech, potentially harming American technological leadership.
How does AI contribute to different industries?
AI is revolutionizing fields such as medicine, research, education, and the military, providing advanced features and significant social value.