
GenAI and Copyright
The EUIPO has published a study on the development of generative artificial intelligence (GenAI) from a copyright perspective. What impact does generative AI have on copyright law? We present the findings of the study.
The revolution through generative AI
Artificial intelligence (AI) has undergone rapid development in recent years. Generative artificial intelligence (GenAI) in particular – i.e. AI systems that independently generate content such as text, images, videos or code – is at the center of a technological turning point. Systems such as ChatGPT or image generators such as DALL-E are now an integral part of many IT products and services.
However, the rise of this technology is also accompanied by significant copyright issues. What happens when AI accesses protected content? Can companies simply use publicly available data to train their systems? And how can rights holders protect their works from unwanted use?
Following the BSI study, the EUIPO study “The Development of Generative Artificial Intelligence from a Copyright Perspective” provides a comprehensive technical and legal analysis for the EU area.
How does GenAI actually work?
GenAI systems are trained on huge amounts of data – e.g. texts, images or code. This data is analyzed using text and data mining methods (TDM) and converted into mathematical patterns (models). The AI can later generate new content from these models.
This training data often comes from the internet – but much of it is protected by copyright. This means that even if content is publicly available, it may not automatically be used for AI training.
What is allowed – and what is not
In EU law, the Directive on Copyright in the Digital Single Market (CDSM Directive) plays a central role. It permits text and data mining under certain conditions, but distinguishes between two cases:
- Article 3 CDSM: TDM for scientific research is privileged (e.g. at universities).
- Article 4 CDSM: Commercial users (e.g. companies) may only use content if the rights holders have not opted out.
Opt-out from text and data mining
Companies and rights holders can implement an opt-out from text and data mining (TDM) in accordance with Art. 4 of the EU Copyright Directive (CDSM Directive) by technically or legally excluding the use of their content. In the case of content published online, however, the exclusion must be carried out using machine-readable means. This could be done, for example, by using robots.txt files, TDM reservation protocols (TDMRep) or machine-readable metadata such as “noai”/“noindex” tags.
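As a sketch of what such a machine-readable reservation can look like in practice, the following robots.txt fragment blocks two crawlers. The user-agent names GPTBot (OpenAI) and CCBot (Common Crawl) are real crawler identifiers, but which agents a site operator must address is an assumption here, and whether robots.txt alone satisfies Art. 4 CDSM is, as noted, unsettled:

```
# robots.txt – example opt-out targeting known AI/TDM crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

A TDMRep reservation can be published in addition, for example as a /.well-known/tdmrep.json file or a tdm-reservation HTTP header; as the study notes, however, none of these signals is yet a binding standard.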
However, there are considerable uncertainties with regard to the effectiveness, uniformity and enforceability of such opt-outs. For example, it is unclear which technical means are considered “appropriate” within the meaning of the directive, such as whether the REP protocol alone is sufficient. In addition, crawler identities are often opaque and there is currently no binding standard for machine-readable opt-out declarations. It also remains unclear how platforms and third parties must deal with the opt-out, especially when content is aggregated or cached.
With the new AI Act, the EU also obliges providers of so-called GPAI (General Purpose AI) systems to disclose training data and ensure that generated content is recognizable as such.
Technical and legal protective measures
Today, rights holders have several options to protect their content from unwanted use by AI systems:
- Technical means: e.g. Robots Exclusion Protocol (REP), TDMRep, digital watermarks, metadata standards such as C2PA or ISCC.
- Legal measures: e.g. explicit terms of use on websites or license models with AI developers.
- Combination solutions: Many rely on a mixture of technical and legal protection strategies.
Uniform standards do not yet exist, but there is an emerging trend towards standardization and open source solutions.
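To illustrate how such signals are meant to work from the other side, the following minimal Python sketch shows how a compliant crawler could check a robots.txt reservation before fetching a page for TDM. The crawler name ExampleAIBot and the URL are invented for illustration; real crawlers, as discussed above, are not technically forced to honor this check.

```python
# Sketch: a compliant crawler consulting robots.txt before fetching a
# page for text and data mining. "ExampleAIBot" and the URL are
# hypothetical examples.
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots = """\
User-agent: ExampleAIBot
Disallow: /
"""

# The opted-out crawler is blocked; an unlisted crawler is not.
print(may_fetch(robots, "ExampleAIBot", "https://example.com/article"))
print(may_fetch(robots, "SomeOtherBot", "https://example.com/article"))
```

The check itself is trivial; the open legal question is whether ignoring it (or an equivalent TDMRep signal) defeats the rights holder's reservation under Art. 4 CDSM.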
Who is liable for problematic content?
AI-generated content can also be legally problematic. Some models tend to reproduce training content verbatim (known as memorization), which can lead to hidden plagiarism.
The study shows that techniques such as model unlearning, watermarking and output filters can help to minimize such risks. Some AI providers now offer contractual indemnification to protect customers from potential liability risks.
At the same time, there is a need for more transparency: the AI Act requires content to be made recognizable as AI-generated. Technical solutions such as provenance tracking and digital signatures should help here in future.
Seize opportunities, know the risks – and act with legal certainty
The EUIPO study clearly shows that GenAI is not a legal gray area, but a complex field that closely interlinks technology and copyright. Companies that use or develop AI need to understand this:
- Content from the Internet is not automatically free to use.
- Rights holders have effective options for reserving their rights.
- A clear strategy for rights clearance and licensing is essential.
- There are also legal risks for AI outputs – especially with regard to possible copyright infringements.
Anyone working in the IT sector should address these issues at an early stage and seek legal advice if necessary. Only those who act with legal certainty can exploit the full potential of generative AI – without unpleasant surprises.
We are happy to answer your questions about AI!
