AI and Copyright: Navigating the New Frontier of Fair Use
Hello! I'm Tak@, a system integrator.
I spend my days developing services that leverage generative AI. Today, I'd like to explore with you the complex relationship between AI and copyright, particularly how it intertwines with the concept of "fair use."
This column walks through the principles of fair use, which hold the key to whether AI's learning from copyrighted works is legal, and looks at how the question is being debated around the world.
What is Fair Use?
We encounter various types of information daily, learning from it and creating new things.
It's natural to be influenced by existing works: we gain knowledge from reading books and pick up ideas for expression from looking at paintings.
Copyright law aims to protect creators' rights while encouraging new creation, and in the United States that balance is struck in part by a doctrine called "fair use."
The Basic Idea of Fair Use
Fair use is an exception that allows the use of copyrighted material under certain conditions without the copyright holder's permission.
This is because copyright law not only protects rights but also considers the broader societal benefit of "promoting the progress of knowledge and the arts."
Section 107 of the U.S. Copyright Act sets out four factors for determining fair use:
- Purpose and Character of the Use: Courts look at whether the use is commercial or serves non-profit educational purposes, criticism, commentary, news reporting, research, or scholarship, and whether it is "transformative." A transformative use puts the original work to a new purpose or gives it new expression, adding new meaning or insight. Simply reusing the original work as-is is unlikely to meet this criterion.
- Nature of the Copyrighted Work: Courts consider whether the work being used is factual or creative. Fair use tends to be found more readily for factual works with little creative expression than for highly creative ones.
- Amount and Substantiality of the Portion Used: This asks how much of the original work was used and how important that portion is. Even a small excerpt can weigh against fair use if it is the "heart" of the work.
- Effect of the Use Upon the Potential Market for or Value of the Copyrighted Work: Often treated as one of the most important factors, this asks how the use affects the current market for the original work and the potential market for derivative works. If the use could hinder sales of the original or usurp its market, it is less likely to be considered fair use.
These four factors are not evaluated in isolation but are judged as a whole, with a balance being sought.
The Tug-of-War Between AI Learning and Fair Use
The principle of fair use has sparked a significant debate in the field of AI, particularly generative AI.
The main point of contention is whether AI learning from vast amounts of copyrighted material constitutes copyright infringement or is permissible under fair use.
The Anthropic Case Ruling and the Importance of Data Acquisition
In June 2025, a federal district court in California issued an important ruling in a lawsuit in which three authors sued the AI development company Anthropic, alleging that their copyrighted works had been used without permission for AI training.
The court held that Anthropic's use of legally purchased books for AI training is fair use under U.S. copyright law.
The judge likened AI learning to how a person reads books and learns to create something new, describing the training use as highly transformative.
A prominent view is that AI doesn't simply reproduce information but extracts patterns and rules from it, learning language and concepts to generate new content. This act is considered "non-expressive use," different from the original copyrighted work's purpose.
Recently, while developing an AI learning planner, I pondered the legal aspects. The origin of the data AI learns from is truly the foundation of service quality. Compatibility with copyright is essential.
However, this ruling comes with an important "caveat."
The court held that fair use does not excuse Anthropic's downloading and keeping more than 7 million books obtained for free from pirate websites, and it allowed the infringement claims over those copies to go forward.
In other words, whether the copyrighted material was obtained legally is a key factor in the fair use determination.
Competition and Market Impact: The Ross Intelligence Case
On the other hand, there have been cases where AI learning was not deemed fair use.
In the case where Thomson Reuters sued Ross Intelligence, a company developing an AI-powered legal research service, Ross Intelligence's use of headnotes (case summaries) from Thomson Reuters' legal database "Westlaw" as AI training data was deemed copyright infringement.
In this case, the court emphasized that Ross Intelligence's use was "for the purpose of developing a competing legal research tool," which was the same intended purpose as the original copyrighted headnotes.
The takeaway is strict: if AI learning produces a product or service that competes in the same market as the original copyrighted work, fair use is less likely to be found.
This shows that, beyond whether AI learning is "transformative," the fourth fair use factor (the effect on the market) is extremely important.
Whether AI learning counts as fair use can therefore vary significantly depending on three things: how the copyrighted material was obtained, what kind of service is built with the trained AI, and how that service competes with the market for the original work.
Copyright Where "Human-ness" is Questioned
At the core of copyright law is the idea of protecting "human creativity."
In an era where AI generates content, how this "human-ness" is interpreted is also a crucial point of contention.
Requirement for Human Authorship
The U.S. Copyright Office (USCO) has stated that copyright only protects works that are products of human creativity.
This means that works generated entirely autonomously by AI are not considered human-authored works and are not eligible for copyright registration.
However, the use of AI does not absolutely preclude copyright registration.
USCO guidance states that if a work "creatively arranges human-authored material and material generated by generative AI" or if AI is used as "merely an assisting tool" with "creative input or intervention" from a human, copyright registration may be possible.
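For example, if a human extensively selects, arranges, or edits the AI's output so that human creativity is added, the resulting work may be eligible for copyright protection. Detailed instructions or prompts can be part of that contribution, but the USCO has indicated that prompts alone are generally not enough.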
This point leads to the idea that AI, like a paintbrush or a camera, should be positioned as a tool that supports human creativity.
International Perspectives on Copyright
The issue of AI and copyright is approached differently by various countries and regions, further complicating international discussions.
Japan's "Learning Paradise" and Broad Use
Japan is proactive in promoting AI innovation and allows extensive use of copyrighted works for AI training, so much so that it's called a "learning paradise."
Article 30-4 of Japan's Copyright Act states that if the purpose is "not to enjoy or cause others to enjoy thoughts or feelings themselves," copyrighted works may be used by any method, to the extent deemed necessary.
This includes use for information analysis (extracting, comparing, classifying, and analyzing elements of text and images).
However, there's an exception for cases that "unjustly prejudice the interests of the copyright holder," and the scope of this "proviso" is still under discussion.
Japan has no provision written specifically for the use of individual copyrighted works in AI training; it handles the question through this general statutory exception, in contrast to the case-by-case fair use analysis in the U.S.
EU's Opt-Out and Emphasis on Transparency
The European Union (EU) is seeking a balance between protecting copyright holders' rights and promoting AI innovation.
The EU's "Directive on Copyright in the Digital Single Market (DSM Copyright Directive)" includes an exception for text and data mining (TDM).
For TDM carried out for academic research, the exception is mandatory, and no compensation to copyright holders is required.
For TDM for any other purpose (including commercial use), however, the exception does not apply where copyright holders have expressly opted out, that is, reserved their rights and refused to permit the use. In that case their works cannot be used for AI training without permission.
For content published online, this opt-out is expected to be expressed in a machine-readable way, such as a website's robots.txt.
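To make the machine-readable opt-out idea concrete, here is a minimal Python sketch of how a data-collection tool might check a site's robots.txt before gathering a page. It uses the standard library's urllib.robotparser. The crawler name "ExampleAIBot", the sample robots.txt content, and the helper may_collect are all hypothetical, made up for illustration. Whether a robots.txt entry is legally sufficient as an opt-out is a separate legal question that this code does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is an invented crawler
# name used only for this illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_collect(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt does not disallow user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"
    # The site has opted this crawler out, so collection should be skipped.
    print(may_collect(SAMPLE_ROBOTS_TXT, "ExampleAIBot", url))  # False
    # A crawler not named in the file falls under the "*" rule.
    print(may_collect(SAMPLE_ROBOTS_TXT, "OtherBot", url))      # True
```

In a real pipeline the robots.txt would be fetched from the site itself, and a False result would mean leaving that content out of the training set.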
Furthermore, the EU's AI Act, adopted in 2024, requires providers of general-purpose AI models to publish a sufficiently detailed summary of the content used to develop and train those models.
This aims to make it easier for stakeholders, including copyright holders, to ascertain whether their copyrighted works have been used for AI training and to exercise their rights.
This transparency obligation could become an international standard for information disclosure in AI training.
Trends in Other Countries
- United Kingdom: Unlike the U.S., UK copyright law has the concept of "Computer Generated Works (CGW)." Copyright in a CGW is granted to the person who made the "necessary arrangements" for its creation, even if no human was involved in the actual creation. The term of protection is shorter than for ordinary works, lasting 50 years from the year the work was made. Regarding TDM, non-commercial research use is permitted, but the exception does not extend to certain content such as paid databases.
- China: A unified view on the copyrightability of AI-generated works has not yet formed, but there are court precedents that recognize copyrightability where a human was involved. The Chinese government has also enacted the "Interim Measures for the Administration of Generative Artificial Intelligence Services," which oblige generative AI service providers to respect intellectual property rights and clearly mark generated content.
- Singapore: A rights limitation provision allows the use of copyrighted works for computational data analysis (including TDM and machine learning) regardless of purpose. Copyright holders cannot opt out, and any content that was lawfully accessed can be used. Transparency about training data, however, is not legally required; it is only recommended in guidelines.
As such, various countries continue to experiment with how to balance AI and copyright.
Towards a Future Where Copyright and AI Coexist
AI technology is evolving remarkably, with new services emerging daily. Along with this, debates surrounding copyright are constantly reaching new phases.
Seeking Coexistence from Conflict
Copyright holders have expressed strong concerns about market damage caused by AI learning vast amounts of copyrighted material without permission and the potential harm to creators' interests.
In particular, AI's ability to reproduce existing works or imitate the "style" of specific artists could threaten creators' market value.
While style itself is generally interpreted as not being protected by copyright, the boundary is ambiguous.
On the other hand, AI development companies argue that AI training requires enormous amounts of data, and obtaining individual permissions for all copyrighted works is not realistic.
Many also argue that AI development brings significant benefits to society as a whole, and that learning activities such as TDM should therefore be broadly recognized as fair use.
In fact, some major AI companies are moving to conclude licensing agreements for content with news media and music publishers.
This can be seen not only as avoiding legal disputes but also as an effort to build healthy relationships with content holders.
Adapting Laws and Our Role
AI is already beginning to deeply integrate into our lives and work.
Laws constantly need to adapt to new technologies, but this process is not straightforward.
Copyright law experts are actively debating how far the current rules apply in the age of AI and what new rules may be needed.
Numerous AI-related copyright lawsuits are still pending in the U.S., and as court decisions accumulate, a clearer direction should gradually emerge.
Moves to comprehensively regulate AI, such as the EU's AI Act, will also have an international impact, including on copyright aspects.
AI holds great potential as a tool to enhance our creativity, but at the same time, it poses the challenge of how to protect the rights of existing creators.
How to strike this balance is a test of our society's collective wisdom.
I hope this column helps deepen your understanding of the discussions surrounding AI and copyright.