AI memorizes entire books

Can a language model memorize an entire book? Is storage in AI weights already a copyright infringement? And why do the providers’ protection mechanisms fail? A new study from Stanford provides answers – with potentially far-reaching consequences for the AI sector.

When the AI knows Harry Potter by heart

Researchers at Stanford University have confirmed what many copyright holders feared: large language models such as ChatGPT, Claude, Gemini and Grok have stored copyrighted books almost word for word and can reproduce them, at least in part, on request. The study, published in January 2026, provides the first systematic evidence of the extent of so-called memorization in commercial AI systems.

What the researchers found out

The research team led by Ahmed Ahmed and A. Feder Cooper tested four leading AI systems: Claude 3.7 Sonnet (Anthropic), GPT-4.1 (OpenAI), Gemini 2.5 Pro (Google) and Grok 3 (xAI). The result is remarkable:

For “Harry Potter and the Philosopher’s Stone”, the researchers were able to extract 95.8 percent of the book text from Claude 3.7 Sonnet almost word for word. From Gemini 2.5 Pro it was 76.8 percent, and from Grok 3 it was 70.3 percent.

Particularly striking: Gemini and Grok did not require any special circumvention techniques. The models output the text in response to simple continuation requests.

The methodology: simpler than expected

The researchers used a two-stage procedure:

Phase 1: They gave the AI system the first sentence of a book and asked it to continue verbatim.

Phase 2: They repeatedly asked the system to continue the text until the entire book was extracted or the system stopped responding with further text.

In the case of Claude and GPT-4.1, the researchers first had to circumvent the providers' safeguards. With Gemini and Grok, extraction worked without any such workaround.
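The two-phase procedure described above can be illustrated with a short Python sketch. It is not the researchers' code: the `generate` callable is a hypothetical stand-in for a model API, and `make_toy_model` is a made-up helper that simulates a model which has memorized a text, so that the loop can be run without any real AI system.

```python
from typing import Callable

# Hypothetical interface (an assumption, not a real API): takes the text so
# far and returns the model's continuation, or "" if it has nothing to add.
Generate = Callable[[str], str]


def extract_book(first_sentence: str, generate: Generate, max_rounds: int = 1000) -> str:
    """Illustrative two-phase continuation loop."""
    # Phase 1: seed the model with the first sentence and ask it to continue.
    text = first_sentence
    chunk = generate(text)
    if not chunk:
        return text  # the initial probe produced nothing, so stop here

    text += chunk
    # Phase 2: keep asking for more until the model stops or the cap is hit.
    for _ in range(max_rounds):
        chunk = generate(text)
        if not chunk:
            break
        text += chunk
    return text


def make_toy_model(memorized: str, chunk_size: int = 40, window: int = 30) -> Generate:
    """Toy stand-in for a model that has memorized `memorized` verbatim."""
    def generate(context: str) -> str:
        tail = context[-window:]          # look only at the end of the context
        pos = memorized.find(tail)
        if pos < 0:
            return ""                     # unknown text: no continuation
        start = pos + len(tail)
        return memorized[start:start + chunk_size]
    return generate
```

With the toy model, a text it "knows" is recovered in full from its first sentence, while an unknown opening line comes back unchanged. A real system would additionally involve prompts, refusals and rate limits, none of which are modeled here.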

The longest continuous passages ran to as many as 9,070 words, extracted from Gemini 2.5 Pro for Harry Potter. That corresponds to several book chapters in one go.
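How figures such as "share of the book extracted" and "longest continuous passage" can be computed in principle is shown in the following sketch. It is an assumption-laden illustration using Python's standard `difflib` module, not the metric used in the study, whose exact definition is not described here. The function name `verbatim_overlap` and the `min_words` threshold are made up for this example.

```python
from difflib import SequenceMatcher
from typing import Tuple


def verbatim_overlap(book: str, output: str, min_words: int = 5) -> Tuple[float, int]:
    """Return (share of book words covered, longest matching run in words).

    Only word runs of at least `min_words` that appear in the same order in
    both texts are counted, so short coincidental matches are ignored.
    This is an illustrative measure, not the study's own metric.
    """
    book_words = book.split()
    out_words = output.split()
    matcher = SequenceMatcher(None, book_words, out_words, autojunk=False)
    # get_matching_blocks() yields non-overlapping, in-order matching runs.
    sizes = [m.size for m in matcher.get_matching_blocks() if m.size >= min_words]
    covered = sum(sizes)
    longest = max(sizes, default=0)
    share = covered / len(book_words) if book_words else 0.0
    return share, longest
```

For a ten-word "book" in which the model output gets one word wrong, for example, the function reports nine of ten words covered when `min_words` is 3, with a longest run of six words.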

Significance for copyright law

The study touches on key questions of copyright law and generative artificial intelligence that are currently occupying courts around the world. To date, most courts have assumed that although copyrighted works were used to train AI models, the works themselves are not contained in the models.

However, in the GEMA proceedings against OpenAI, the Regional Court of Munich I has already held that the memorization of song lyrics is relevant under copyright law. The judges stated:

The song lyrics in dispute are reproducibly contained in the language models.

The Munich court makes it clear that if complete works are permanently stored in the model, this is no longer mere data analysis, but an independent exploitation relevant to copyright law.

The Stanford study now provides empirical evidence that this applies not only to song lyrics, but also to entire books. The same is likely to apply to other texts.

Case law is developing

The legal situation is inconsistent internationally:

While the Regional Court of Munich I ruled in the GEMA proceedings that the memorization of works in AI models constitutes copyright infringement, courts in other countries have so far mostly decided differently. The Munich judgment is not yet final.

In the US, courts have ruled in the cases of Kadrey v. Meta Platforms and Bartz v. Anthropic that AI training can generally fall under fair use. However, the plaintiffs there did not succeed in proving substantial extraction. The Stanford study could now change this.

In the UK, Getty Images also failed against Stability AI because, in the court's view, the AI model did not contain a copy. Whether the memorization findings of the Stanford study carry over to image data in the same way remains to be seen.

Conclusion

The Stanford study clearly shows that large language models memorize copyrighted books. The same is likely to apply to other texts. The study does not answer whether this can also be applied to other content such as videos, images or music.

For rights holders, this opens up new opportunities to successfully enforce copyright claims against the providers of such AI models.

The previously widespread view that AI models do not contain copyright-protected content, but have only been trained with it, can no longer be upheld after the study.

It now remains to be seen how the courts will deal with this and whether AI providers will counteract it with new models.

