AI & Copyright Laws - Dr. Balvinder Taneja

1. The Core Problem

AI systems such as GPT, Stable Diffusion, and Midjourney are trained on massive datasets containing books, articles, images, music, and code. Much of this training data is protected by copyright.
The legal questions are:

Does using copyrighted works to train AI models infringe copyright?
Who owns the outputs generated by AI?

2. Key Legal Questions

(a) Copyright in Training Data

Argument for infringement: Training requires copying works into a dataset; doing so without permission = unauthorized reproduction.
Counter-argument: Training is a transformative use and therefore fair use under US law (the AI doesn’t reproduce works directly but extracts statistical patterns from them).

(b) Copyright in AI Outputs

Who is the “author”?
- If the output is fully machine-generated → many jurisdictions (e.g., US) say no copyright, since only humans can be authors.
- If a human provides prompts & creative input → some argue the user can claim copyright.
Example:
- Simple prompt (“make a cat photo”) → likely no copyright.
- Complex, detailed creative prompting → possible copyright for human prompter.

(c) Derivative Works

If an AI output is “substantially similar” to a copyrighted work in the training data, it may count as an infringing copy or derivative work.
Example: AI generates an image nearly identical to a Disney character → likely infringement, regardless of how the image was produced.

3. Global Legal Perspectives

US:
- Training AI on copyrighted works → lawsuits pending (e.g., authors vs. OpenAI, artists vs. Stability AI).
- AI-generated works → not protected unless human creativity is involved.
EU:
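- Text and Data Mining (TDM) exceptions (Directive (EU) 2019/790, Articles 3 and 4) allow AI training, but creators can “opt out” of the general exception by expressly reserving their rights.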
- Stronger push for labeling AI-generated content.
India:
- Copyright Act (1957) doesn’t explicitly cover AI.
- Courts likely to follow international trends → authorship must involve human creativity.
China & Japan:
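- Japan allows AI training broadly under a statutory exception for information analysis (Article 30-4 of its Copyright Act), not a US-style fair use doctrine.
- Chinese courts have recognized some protection for AI-assisted works where there is substantial human involvement.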
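
An opt-out only helps if a crawler collecting training data can read it automatically. One convention sometimes used for this is a site's robots.txt file; whether that is legally sufficient as a rights reservation is not settled. The sketch below is only an illustration: the crawler name `ExampleAIBot`, the robots.txt contents, and the example.com URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in which a site blocks one AI crawler
# but leaves the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_collect(robots_txt: str, crawler: str, url: str) -> bool:
    """Return True if the robots.txt rules let `crawler` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(crawler, url)


if __name__ == "__main__":
    url = "https://example.com/gallery/photo1.jpg"
    # The AI crawler is told to stay out; other agents are not.
    print(may_collect(ROBOTS_TXT, "ExampleAIBot", url))
    print(may_collect(ROBOTS_TXT, "OrdinaryBrowser", url))
```

A check like this shows only what the site has requested. It does not decide whether copying the work would be lawful.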

4. Case Studies

Getty Images vs. Stability AI (2023)
Getty sued Stability AI, alleging unauthorized use of millions of its copyrighted photos to train Stable Diffusion.
US Copyright Office
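In Zarya of the Dawn (2023), refused to register the individual images generated by Midjourney in a graphic novel, while allowing copyright in the human-written text and in the selection and arrangement of the images.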
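Thaler v. Vidal (DABUS case)
Patent applications naming the AI system DABUS as inventor were rejected → under US patent law only a human can be an inventor. The parallel copyright case, Thaler v. Perlmutter (2023), held that only a human can be an author.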

5. Emerging Solutions

Licensing & Compensation Models
- Pay creators when their works are used for AI training.
Transparency & Disclosure
- Companies may need to reveal datasets used for training.
Watermarking & Content Provenance
- AI-generated outputs should be labeled to prevent misuse.
AI-Specific Legal Reforms
- New copyright categories or shared authorship rules may emerge.
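
To make the labeling idea concrete, here is a minimal sketch of a provenance label: a small record that marks a file as AI-generated and ties the label to the exact file contents through a SHA-256 hash. The field names and the generator name are assumptions made for illustration. This is not an implementation of any real provenance standard, and a separate label like this can simply be deleted, which is why robust watermarking is discussed alongside it.

```python
import hashlib
import json


def make_label(content: bytes, generator: str) -> dict:
    """Build a hypothetical provenance label for AI-generated content."""
    return {
        "ai_generated": True,
        "generator": generator,  # illustrative name, not a real product
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def label_matches(content: bytes, label: dict) -> bool:
    """Check that the label was issued for exactly these bytes."""
    return hashlib.sha256(content).hexdigest() == label.get("sha256")


if __name__ == "__main__":
    image_bytes = b"placeholder bytes standing in for an image file"
    label = make_label(image_bytes, "hypothetical-image-model")
    print(json.dumps(label, indent=2))
    # The label verifies against the original bytes but not an edited copy.
    print(label_matches(image_bytes, label))
    print(label_matches(image_bytes + b"edited", label))
```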

6. Ethical & Practical Dimensions

Creators want fair compensation if their work trains AI.
AI innovation requires open access to knowledge.
The law struggles to balance incentives for creators against progress in AI.

✅ Summary for Students:
AI challenges copyright laws in two main ways:

Training data use → Is it fair use or infringement?
AI output ownership → Can non-human works be copyrighted?

Right now, most jurisdictions hold that an AI cannot be an author; only humans can. But pending lawsuits around the world will likely shape new rules over the next five years.