
Data protection in the AI Länd
The use of artificial intelligence raises many data protection issues. The Baden-Württemberg State Commissioner for Data Protection has now presented a new version of a comprehensive discussion paper that is intended to provide practical guidance for companies and authorities. We summarize the most important findings on how you can use AI in compliance with data protection regulations.
Why this discussion paper is interesting for anyone who uses AI
Whether it's chatbots in customer service, automated personnel selection or intelligent assistance systems – artificial intelligence has long since arrived in everyday business life. But many companies are unsure: Are we allowed to use customer data to train our AI? What legal basis do we need? What about personal data from the internet?
The new discussion paper by the Baden-Württemberg State Commissioner for Data Protection and Freedom of Information, Prof. Dr. Tobias Keber, provides answers to these burning questions. It deliberately sees itself as a “living document” that reflects the current state of the discussion. The motto: “Use data – protect data”. Innovation and data protection should go hand in hand.
The key question: Does my AI system process personal data?
Before you even think about the legal basis, you need to clarify: Does your AI system process personal data? The answer is often more complicated than you think.
Personal data is any information relating to an identified or identifiable person: names and email addresses, but also IP addresses or location data. It gets particularly tricky with AI systems: even if an AI model does not store any data directly, it may be possible to draw conclusions about individuals through clever queries or attacks.
The discussion paper emphasizes: you must regularly check whether someone can derive personal information from your AI system. Technical attacks on the model, such as model inversion or membership inference, can attempt to reconstruct training data. What seems technically impossible today could be feasible tomorrow.
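To make this risk tangible, here is a toy sketch of the idea behind the simplest such attack, a naive membership-inference test. The data and model choice are invented for illustration, and real attacks are considerably more sophisticated; the point is only that an overfit model treats its own training records measurably differently, and that difference leaks personal information.

```python
# Toy membership-inference sketch: synthetic data, invented setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_members = rng.normal(size=(200, 5))        # records used for training
y_members = rng.integers(0, 2, size=200)     # random labels force memorization
X_outsiders = rng.normal(size=(200, 5))      # comparable records never seen

model = RandomForestClassifier(random_state=0).fit(X_members, y_members)

# An overfit model is far more confident on its own training records than
# on statistically identical unseen ones. That gap is the signal an
# attacker exploits to learn who was in the training set.
conf_members = model.predict_proba(X_members).max(axis=1).mean()
conf_outsiders = model.predict_proba(X_outsiders).max(axis=1).mean()
print(f"mean confidence on training members: {conf_members:.2f}")
print(f"mean confidence on outsiders:        {conf_outsiders:.2f}")
```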
Particularly important for practice:
- Large language models (LLMs): It is controversial whether trained models themselves contain personal data. What is certain is that if users can elicit personal information as output through targeted inputs, this constitutes data processing.
- Training data: Even if it is “only” absorbed into the model during training, it can under certain circumstances be reconstructed.
- Outputs: If your AI system generates texts or images relating to real people, it is definitely processing personal data.
The five phases of AI use: you need a legal basis for each one
A common mistake: companies think they need “a legal basis for AI”. In fact, you need to check separately for each processing phase whether the processing is lawful:
1. Data collection for training
Where does your training data come from? Do you collect data yourself, buy data sets or download information from the Internet? Web scraping – the automatic collection of data from public websites – is tricky in terms of data protection law. Just because something is publicly available on the internet does not mean that you are free to use it for AI training.
2. Training the AI system
This is where you process the collected data to develop or improve your system. So-called fine-tuning – the specialization of an already trained model – also falls into this phase.
3. Provision of the AI application
If you make your trained system available to others, the question arises: Is the training data processed further? Does the system continue to learn from the user’s input? This makes a big difference to the required legal basis.
4. Use by users
Both you as a provider and your customers or employees as users need their own legal bases. This is known as the “double-door model” in data protection law.
5. Use of the AI results
If you personalize an AI-generated text with customer data or transfer an AI diagnosis to a patient file, new processing is created, which in turn requires a legal basis.
What legal bases are available?
The GDPR offers various ways to justify data processing. Here are the most important ones for AI applications:
Consent: Mostly impractical
Consent seems like the obvious choice, but as a legal basis it quickly reaches its limits in AI practice:
- With large training data sets from the Internet, it is impossible to obtain consent from all data subjects.
- Complex AI systems can be so difficult to understand that truly informed consent is almost impossible.
- The right of withdrawal can be problematic: If someone withdraws their consent, you have to delete the data – possibly from a model that has already been trained.
Consent is therefore more suitable for the direct use of AI services with known users and less for training large models.
Contract fulfillment: Narrow limits
You may process data if this is necessary to fulfill a contract.
Example: A doctor uses an AI diagnostic system as part of the treatment. The processing of patient data is then part of the contractually agreed treatment.
Important restriction: The processing must really be necessary. Just because something is stated in the terms of use does not automatically make it lawful. And: You may not process data from third parties who are not party to the contract in this way.
Legitimate interests: The flexible option with a balancing obligation
This legal basis is of particular interest to many companies. You may process data if you have a legitimate interest and this is not overridden by the interests and rights of the data subjects.
Legitimate interests can be:
- Development of innovative products and services
- Improving security
- Scientific research
- Combating fraud
The balancing exercise is complex. You must check:
- Is processing really necessary? Is it possible without personal data? Could you use anonymized or synthetic data?
- What are the interests of those affected? Can they expect data processing? How sensitive is the data? Does it concern particularly sensitive categories such as health data?
- What protective measures have you taken? The better you protect data through encryption, pseudonymization and other techniques, the more likely it is that your interests will prevail.
Practical examples from the discussion paper:
Large language models: The evaluation depends on whether the model is publicly accessible (open source), what social benefits it offers and whether dangerous uses are excluded. The fact that large amounts of data are used can even be advantageous because individuals “disappear” in the mass of data and are less identifiable.
Driver assistance systems: Here you have to weigh up the interest in road safety against people’s right to move around unobserved in public spaces. It is important whether you are aiming for identification or just want to recognize characteristics (“cyclist”, not “Mr Müller”).
Special rules for authorities and schools
Public bodies in Baden-Württemberg have additional legal bases in the State Data Protection Act, but are also more restricted. The catch-all general clause may only be used if the intensity of the interference is low.
Especially important for schools:
- AI may be used to support the individual learning pathway
- Prohibited: grading by AI, emotion recognition of students
- Student data must not be used to train AI systems
- Schools must ensure teachers and pupils have AI skills
Employee data protection: caution with consent
The use of AI in the HR department – for example for applicant management or performance appraisal – is tricky under data protection law. Consent from employees is problematic because the relationship of subordination calls its voluntariness into question.
If you want to use AI in HR:
- The AI must really be suitable for the task (non-discriminatory!)
- There must be no more data protection-friendly alternative
- You must prove that your interests outweigh those of the employees
- Works council or staff council must be involved
Sensitive data: Special categories, special care
Health data, ethnic origin, political opinions and other sensitive information are subject to stricter rules. The problem: even if you do not collect such data directly, it can be derived from other data.
Example: At first glance, an AI system for assessing creditworthiness does not process any health data. However, if it can infer illnesses from purchasing behavior, sensitive data is being processed after all.
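The following toy sketch, with entirely invented numbers, illustrates the mechanism: a single “harmless” feature can act as a proxy that lets even a trivial rule recover the sensitive attribute.

```python
# Toy illustration with invented numbers: a "harmless" purchasing
# feature acting as a proxy for a sensitive health attribute.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
has_chronic_illness = rng.random(n) < 0.1  # invented 10% base rate

# Invented assumption: chronically ill customers spend far more in the
# pharmacy category than healthy ones.
pharmacy_spend = np.where(has_chronic_illness,
                          rng.normal(120, 20, n),
                          rng.normal(20, 10, n))

# Even a trivial threshold rule recovers the sensitive attribute, so the
# system effectively processes health data despite never asking for it.
predicted_ill = pharmacy_spend > 70
accuracy = (predicted_ill == has_chronic_illness).mean()
print(f"health status inferred from spending alone: {accuracy:.1%} accuracy")
```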
There are privileged legal bases for scientific research, but only under strict conditions:
- The research must be methodical and systematic
- There must be a goal of gaining new knowledge
- The results must be verifiable
- Research must serve the common good
Practical checklist: 10 steps to data protection-compliant AI
1. Clarify the phase: Which processing phase is involved – collection, training, provision, use or utilization of results?
2. Check the personal reference: Is personal data really being processed? Could someone identify individuals?
3. Document your technology: Which AI process do you use? How does it work?
4. Check the model itself: Does the trained model contain personal data?
5. Document training data: Where does it come from? What categories does it include?
6. Update your record of processing activities: Every company must document its data processing activities.
7. Carry out a data protection impact assessment: This is mandatory in the case of high risk – which is often the case with AI.
8. Clarify responsibilities: Who is responsible under data protection law? Are there any processors?
9. Find the right legal basis: one for every processing phase!
10. Fulfill further obligations: Inform data subjects, implement protective measures, enable data subjects to exercise their rights.
Privacy by design: thinking about data protection right from the start
The best strategy: Build data protection into your AI system from the outset, not as an add-on. This is called privacy by design.
In concrete terms, this means:
- Differential privacy: techniques that add statistical noise so that no conclusions can be drawn about individuals (see the sketch below)
- Federated learning: the model comes to the data, not the other way around – data remains decentralized
- Pseudonymization and encryption: standard protective measures that significantly reduce the risk
- Synthetic data: artificially generated data without personal reference for training and tests
The better your technical protection measures, the better your position in terms of data protection justification.
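To illustrate the first technique on the list, here is a minimal sketch of the Laplace mechanism that underlies differential privacy, applied to a simple count query. The function name, the epsilon values and the assumption of sensitivity 1 are ours for this example; production systems should use vetted libraries rather than hand-rolled noise.

```python
# A minimal sketch of the Laplace mechanism behind differential privacy.
# dp_count and its parameters are assumptions made for this illustration,
# not part of the discussion paper.
import numpy as np

def dp_count(values, threshold, epsilon=1.0):
    """Noisy count of values above a threshold.

    A count query changes by at most 1 when one person's record is
    added or removed (sensitivity 1). Laplace noise scaled to
    1/epsilon therefore masks any individual's contribution.
    """
    true_count = sum(v > threshold for v in values)
    noise = np.random.default_rng().laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: publish roughly how many salaries exceed 50,000 without
# revealing whether any specific person is in that group.
salaries = [42_000, 51_000, 38_000, 67_000, 45_000]
print(dp_count(salaries, threshold=50_000, epsilon=0.5))
```

The smaller the epsilon, the stronger the privacy guarantee and the noisier the published answer; choosing that trade-off is as much a policy decision as a technical one.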
Transparency: Explain what your AI does
People have a legal right to know that their data is being processed and how it is being processed. This is a particular challenge with AI, as even experts do not always fully understand complex systems.
Nevertheless, you must provide information about:
- What does your AI do with the data?
- For what purpose?
- Who is responsible?
- What rights do data subjects have?
- Are there automated decisions?
You don’t have to explain every technical detail. But the essential aspects must be clear and understandable. If you can’t explain your own system clearly, you should ask yourself whether you should use it.
Avoid common mistakes
Mistake 1: “We have consent in the general terms and conditions” That’s not enough. Consent must be voluntary, specific and informed. Hidden clauses in 20-page terms of use do not meet these requirements.
Mistake 2: “The data is public on the internet, so we are allowed to use it” Wrong. Publicly accessible data is also personal data with all the legal consequences. You must have a legal basis for processing.
Mistake 3: “We only train, that falls under research” Commercial product development is not privileged scientific research. Genuine research requires a systematic approach, verifiability and a focus on the common good.
Mistake 4: “The model no longer stores data” Even if training data is deleted after training, it may still be possible to reconstruct it from the model. This can still be data processing.
Mistake 5: “We only need one legal basis for our AI project” No, you need a separate check for each processing phase (collection, training, provision, use, exploitation of results).
What does the AI Regulation mean?
In addition to the GDPR, the AI Regulation (the EU’s AI Act) is gradually coming into force. It introduces a risk-based system:
- Prohibited AI practices: e.g. emotion recognition in schools, social scoring
- High-risk AI: Strict requirements for documentation, testing and quality assurance
- Transparency obligations: Labeling of AI-generated content
The AI Regulation complements the GDPR. You must comply with both sets of regulations at the same time – this does not make things any easier, but it does provide a comprehensive legal framework.
Specific recommendations for action for companies
Before the start of an AI project:
Plan for data protection right from the start. The earlier you take data protection requirements into account, the cheaper and easier it will be.
Involve your data protection officer or data protection expert at an early stage, not just when the system has been fully developed. Data protection expertise helps to avoid costly mistakes.
Check alternatives to personal data. Can you achieve your goal with anonymized or synthetic data? This avoids many data protection problems.
Document all decisions. You must later be able to prove why you chose which legal basis and which protective measures you took.
During development:
Implement Privacy Enhancing Technologies. Differential privacy, encryption, pseudonymization – these are not only legal obligations, but also trust builders for customers.
Test your system for fairness. Discrimination is not only ethically problematic, but also legally risky.
Prepare understandable information for the people whose data you process.
After the launch:
Stay up to date. Technology continues to develop, as does case law. What is compliant with data protection today may become problematic tomorrow.
Take the rights of data subjects seriously. If someone requests information or wants to delete or object to their data, do you have processes in place to implement this?
Learn from your mistakes. Data breaches can be expensive, but they are also learning opportunities.
Conclusion
The discussion paper from Baden-Württemberg shows that data protection and AI innovation do not have to contradict each other. It requires careful planning, technical protective measures and an honest engagement with the legal requirements – but it is feasible.
The paper from the AI Länd provides valuable suggestions and information on the data protection-compliant implementation of AI projects, even if not all questions have been conclusively clarified.
We are happy to advise you on data protection!
