New guidance from the Data Protection Conference on the legally compliant use and development of AI systems
Update Data Protection No. 214
With the growing importance of artificial intelligence (AI), data protection issues are also coming into sharper focus. Back in May 2024, the German Data Protection Conference (DSK) published its first guidance on the data protection-compliant use of AI (only in German; we reported). The DSK's new guidance (only in German), published in June 2025, now sets out for the first time specific technical and organizational measures for protecting personal data that must be observed during the development and operation of AI systems. The focus is on seven so-called guarantee objectives: data minimization, transparency, confidentiality, integrity, availability, intervenability and unlinkability. The guidance structures the requirements along the life cycle of an AI system, from the initial concept through development and deployment to live operation. Below, we summarize the key requirements in practical terms and show what operators and manufacturers should bear in mind now.
I. Design phase
In the design phase, it is determined which data the planned AI system requires, for what purpose it will be processed, and whether a legal basis exists for this. The DSK emphasizes that processing is only permissible if it is necessary and lawful. Even publicly accessible data may only be used if its origin is unambiguously lawful.
The selection of data and the system architecture must be documented. The guidance recommends the use of standardized documentation methods such as "Datasheets for Datasets". Among other things, data types, sources, collection methods and collection periods must be recorded. It must be checked at this stage whether the purpose can also be achieved with anonymized or synthetic data.
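The guidance does not prescribe a concrete format for this documentation. As a minimal illustration, a datasheet of the kind described in "Datasheets for Datasets" could be captured as structured metadata alongside the dataset. The following Python sketch, including all of its field names, is our own illustrative choice, not a schema from the DSK or the original paper:

```python
from dataclasses import dataclass, field, asdict
import json

# Illustrative datasheet structure: the fields mirror the kinds of information
# the DSK guidance asks for (data types, sources, collection methods and
# periods), but the schema itself is hypothetical.
@dataclass
class Datasheet:
    dataset_name: str
    purpose: str                          # processing purpose the data serves
    legal_basis: str                      # to be assessed case by case
    data_types: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)
    collection_method: str = ""
    collection_period: str = ""
    contains_personal_data: bool = True
    anonymization_checked: bool = False   # were anonymized/synthetic data considered?

sheet = Datasheet(
    dataset_name="customer_support_tickets_2024",
    purpose="training a ticket-routing classifier",
    legal_basis="Art. 6(1)(f) GDPR (to be verified)",
    data_types=["free text", "timestamps"],
    sources=["internal ticketing system"],
    collection_method="export from production database",
    collection_period="2024-01 to 2024-12",
    anonymization_checked=True,
)
print(json.dumps(asdict(sheet), indent=2))
```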
The principle of data minimization must also be observed here: only data that is necessary for the intended purpose may be collected. The choice of AI algorithm can help to limit the amount of data required. Particularly sensitive characteristics, as well as indirect inferences via so-called proxy data, must be critically examined (e.g., inferring a person's origin from their zip code).
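Such proxy effects can be screened for statistically before training. The following sketch, with hypothetical column names and an illustrative threshold, measures the association between a candidate feature and a protected attribute using Cramér's V; a high value flags the feature as a potential proxy. This is our own example, not a method prescribed by the DSK:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(a: pd.Series, b: pd.Series) -> float:
    """Strength of association between two categorical columns (0..1)."""
    table = pd.crosstab(a, b)
    chi2, _, _, _ = chi2_contingency(table)
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))

# Hypothetical training table: does the zip code act as a proxy for origin?
df = pd.DataFrame({
    "zip_code": ["10115", "10115", "80331", "80331", "50667", "50667"] * 50,
    "origin":   ["A",     "A",     "B",     "B",     "A",     "B"]     * 50,
})
v = cramers_v(df["zip_code"], df["origin"])
if v > 0.3:  # the threshold is a project-specific choice
    print(f"Warning: zip_code may be a proxy for origin (V = {v:.2f})")
```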
At the same time, it must be ensured that data subjects can be informed before their data is used. To this end, sufficient lead time must be planned between data collection and model training. Organizational and technical preparations must be made to ensure that data subjects' rights under data protection law, as well as official orders, can be complied with.
In addition, measures must be taken to ensure the quality and integrity of the data, to detect manipulation and to prevent unwanted conclusions being drawn from the training data. Confidentiality must be safeguarded by procedures such as access restrictions or differential privacy.
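Differential privacy, which the guidance names as one possible safeguard, bounds how much any single person's record can influence a released result. As a minimal sketch (the statistic, values and privacy budget are all illustrative), the classic Laplace mechanism adds calibrated noise to an aggregate before release:

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Release a noisy statistic satisfying epsilon-differential privacy.

    `sensitivity` is the maximum change of the statistic when one person's
    record is added or removed; `epsilon` is the privacy budget.
    """
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(42)
ages = np.array([23, 35, 41, 29, 52, 38])        # hypothetical records
true_count = float((ages > 30).sum())            # statistic to release
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
print(f"true: {true_count}, privately released: {noisy_count:.1f}")
```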
II. Development phase
In the development phase, the data is prepared and the model is trained and validated. The data must be transformed so that it serves the intended purpose and is used in a data-efficient manner. Personal identifiers should be removed wherever the model does not require them.
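In practice, removing personal references from free text is often a dedicated preprocessing step. The following sketch illustrates the principle with simple regular expressions; real projects typically need more robust techniques (such as NER-based detection and manual review), and the patterns shown are illustrative only:

```python
import re

# Minimal redaction sketch: strip obvious direct identifiers from free text
# before it enters the training pipeline.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d /()-]{7,}\d"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact Jane at jane.doe@example.com or +49 30 1234567."))
# -> "Contact Jane at [EMAIL] or [PHONE]."  (names still slip through,
#    which is why pattern-based redaction alone is not sufficient)
```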
The DSK requires that the selection and use of algorithms be documented. This also includes defining quality objectives and developing suitable test procedures. Validation must demonstrate that the model is suitable for the defined purpose.
The principle of data minimization also applies in this phase. Only necessary data may be used and each component of a system may only access the information required for it. In addition, the model must not learn or output any information that goes beyond the purpose.
It must be ensured that model behavior remains traceable and that results can be questioned or corrected if necessary. The development environment must also be protected against failures and data loss.
Finally, integrity and confidentiality must be ensured. Integrity covers both the correctness of the data and the robustness of the model against erroneous or manipulated inputs. Confidentiality relates in particular to the risk that the model reproduces sensitive content. Intermediate results containing personal data must be avoided or specially protected.
III. Deployment phase
The deployment phase concerns the provision of the fully developed AI system in the production environment. It is relevant under data protection law if personal data is processed during installation or configuration.
The DSK requires that central decisions on the use of the system be documented and made available in a comprehensible form, in particular to data subjects. This covers the purpose, the system's functionality, any automated decision-making, the degree of human involvement, and data subjects' existing rights.
Data protection-friendly default settings ("privacy by default") must be taken into account as early as the software distribution stage. Models may only be delivered with the data that is necessary for their specific purpose. For parametric models (e.g., neural networks), personal training data should not be distributed; for non-parametric models (e.g., k-nearest-neighbor models, which store the training samples themselves), distributing it may be technically unavoidable.
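The practical difference is easy to demonstrate: a fitted parametric model consists essentially of its learned weights, while a fitted non-parametric model carries its training set inside the very artifact that would be shipped. A small sketch with synthetic data, using scikit-learn purely for illustration:

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 4))          # hypothetical training features
y = (X[:, 0] > 0).astype(int)

# Parametric model: after training, only weights and intercept are shipped.
lr = LogisticRegression().fit(X, y)

# Non-parametric model: the fitted estimator retains the training samples,
# so distributing it means distributing the (possibly personal) raw data.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

print(f"logistic regression artifact: {len(pickle.dumps(lr)):>10,} bytes")
print(f"k-NN artifact:                {len(pickle.dumps(knn)):>10,} bytes")
```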
Confidentiality plays a particularly important role when personal data is delivered together with the model. The decisive factor here is whether the model runs locally or on a server: local deployment places the model, and any data embedded in it, in the user's hands, which increases the risk of data leaks and must be taken into account when designing the rollout.
IV. Operation and monitoring
During operation, it must be ensured that the AI system is used exclusively for the intended purposes and that all relevant data protection principles are adhered to at all times.
The DSK emphasizes that model decisions must be comprehensible and verifiable. Relevant parameters and processing steps must be documented in an audit-proof manner, and the system must be re-checked and re-validated after updates or other changes.
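"Audit-proof" implies that records of decisions cannot be silently altered after the fact. One common technical building block is a hash-chained, append-only log in which each entry commits to its predecessor. The following sketch uses illustrative field names and deliberately stores only a reference to the input data, not the personal data itself:

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only, tamper-evident decision log (illustrative sketch)."""

    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64

    def record(self, model_version: str, inputs_ref: str, decision: str):
        entry = {
            "ts": time.time(),
            "model_version": model_version,
            "inputs_ref": inputs_ref,     # a reference, not the raw personal data
            "decision": decision,
            "prev_hash": self._last_hash, # chains this entry to its predecessor
        }
        payload = json.dumps(entry, sort_keys=True).encode()
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self._last_hash = entry["hash"]
        self.entries.append(entry)

    def verify(self) -> bool:
        """Any later modification of an entry breaks the hash chain."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev_hash"] != prev or digest != e["hash"]:
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.record("v1.3", "case-0001", "application approved")
log.record("v1.3", "case-0002", "application rejected")
print("chain intact:", log.verify())
```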
If it becomes apparent during operation that the model is processing more data than necessary, it must be adapted or retrained. The same applies if a discriminatory effect of individual characteristics becomes apparent.
The system must be technically designed in such a way that data subject rights such as access, rectification and erasure can be implemented even after it has been put into operation. In the event of an erasure request, affected models and training data must also be reviewed; retraining or machine unlearning may be necessary.
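The baseline way to honor an erasure request at the model level is to remove the record and retrain, which is exactly the cost that approximate machine-unlearning techniques try to avoid at scale. A deliberately naive sketch with synthetic data and hypothetical subject identifiers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
record_ids = np.arange(len(X))            # hypothetical subject identifiers

model = LogisticRegression().fit(X, y)

def handle_erasure_request(subject_id: int) -> LogisticRegression:
    """Naive unlearning: drop the subject's record and refit from scratch,
    so the model no longer reflects the erased data."""
    mask = record_ids != subject_id
    return LogisticRegression().fit(X[mask], y[mask])

model = handle_erasure_request(subject_id=42)
print("retrained on", int(np.sum(record_ids != 42)), "records")
```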
In addition, the integrity of the system must be ensured through continuous quality assurance. This includes regular tests for unwanted model changes and measures against targeted attacks, such as manipulated inputs. In the case of systems that learn independently, feedback must be monitored in particular, as further processing of the input data may constitute a change of purpose.
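Continuous quality assurance can include statistical monitoring of the model's outputs: a marked shift in the score distribution may indicate drift, unwanted model changes or manipulated inputs, and should trigger a deeper review. A minimal sketch using a two-sample Kolmogorov-Smirnov test on synthetic scores (the significance threshold is an illustrative choice):

```python
import numpy as np
from scipy.stats import ks_2samp

# Reference window captured at validation time vs. recent production scores.
reference_scores = np.random.default_rng(0).beta(2, 5, size=1000)
recent_scores = np.random.default_rng(1).beta(5, 2, size=1000)  # shifted

stat, p_value = ks_2samp(reference_scores, recent_scores)
if p_value < 0.01:
    print(f"Distribution shift detected (KS = {stat:.2f}); review the model.")
```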
Confidentiality also remains a key issue during operation. The DSK emphasizes that suitable protective measures must be taken against model extraction or unintentional disclosure of sensitive content, particularly in the case of publicly accessible AI systems.
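For publicly accessible prediction interfaces, two widely used mitigations against model extraction are per-client rate limiting and coarsening the returned confidence scores, since fine-grained probabilities make it easier for an attacker to reconstruct decision boundaries. A sketch with illustrative thresholds and a stand-in for the real model call:

```python
import time
from collections import defaultdict, deque

RATE_LIMIT = 100          # max queries per client
WINDOW = 3600.0           # within one hour, in seconds
_history: dict[str, deque] = defaultdict(deque)

def predict_proba(features) -> float:
    return 0.8731         # stand-in for the real model call

def guarded_predict(client_id: str, features) -> dict:
    now = time.monotonic()
    q = _history[client_id]
    while q and now - q[0] > WINDOW:      # drop queries outside the window
        q.popleft()
    if len(q) >= RATE_LIMIT:
        raise RuntimeError("rate limit exceeded")
    q.append(now)
    score = predict_proba(features)
    # Return only a coarse label and rounded score: exposing full-precision
    # probabilities makes extraction attacks considerably easier.
    return {"label": score >= 0.5, "confidence": round(score, 1)}

print(guarded_predict("client-a", features=[1.0, 2.0]))
```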
V. Conclusion and checklist
The DSK's guidance makes it clear that data protection in AI systems is not a downstream checkpoint but must be an integral part of the development process, from the initial idea through to ongoing operation. What matters is not only compliance with the GDPR but also the consistent implementation of the seven guarantee objectives: data minimization, transparency, confidentiality, integrity, availability, intervenability and unlinkability.
The selection and documentation of the data used, the review of the legal basis and the handling of sensitive or difficult-to-control model behavior during operation are particularly critical. Data protection-compliant AI requires continuous planning, monitoring and the willingness to combine technical design with legal requirements.
To maintain an overview, it is advisable to check the data protection requirements systematically along the four life cycle phases. The following checklist provides a compact guide.
1. Design phase
- Is the purpose of the processing clearly defined and is there a suitable legal basis?
- Has it been checked whether anonymized or synthetic data is sufficient to dispense with personal data?
- Are processes in place to inform data subjects before training and to enable them to exercise their rights?
2. Development phase
- Is only the personal data that is absolutely necessary for the training objective processed?
- Is the model safeguarded against unintended learning effects, discriminatory results and data bias?
- Have validation, test procedures and traceability of model behavior been documented?
3. Deployment phase
- Is the system configured to be data protection-friendly by default ("privacy by default")?
- Is the model only supplied with data that is necessary for its use?
- Is the confidentiality of sensitive information technically protected in the case of local deployment?
4. Operation and monitoring
- Is the model regularly reviewed to identify undesirable developments, deviations from the intended purpose or discrimination?
- Can rights to access, rectification and erasure also be technically implemented after the system goes live?
- Are there protective mechanisms against model extraction, inferences about training data and unauthorized further processing?
This article was created in collaboration with our student employee Emily Bernklau.