Welcome to SYNTHIA Insight - a series of focused content pieces that bring the science behind SYNTHIA to life. Through interviews with our project partners, we explore views, visions and expertise in synthetic data. Each edition offers an accessible window into the objectives of SYNTHIA and the progress of our work - helping to engage the wider community, spark dialogue, and promote understanding of SYNTHIA’s mission and impact. We invite you to connect with the minds shaping the future of synthetic data.
In SYNTHIA Insight no. 5 we introduce Vibeke Binz Vallevik, Principal Researcher, and Serena Elizabeth Marshall, Senior Researcher and Project Manager, from DNV. In this edition, they discuss the motivations and findings behind the SYNTHIA study published in the International Journal of Law and Information Technology, titled “Processing of synthetic data in AI development for healthcare and the definition of personal data in EU law”. The interview also highlights the role of DNV, an independent assurance and risk-management provider, in SYNTHIA, with a focus on risk and assurance and on how synthetic data can be safely and reliably used in healthcare and regulatory contexts.
The study explores a central question for the future of AI in healthcare: when is synthetic data considered personal data, and when can it be treated as anonymous? Synthetic data is often presented as a solution to balance innovation and privacy in healthcare. However, its legal status remains complex. While synthetic data can reduce privacy risks and enable broader data sharing, uncertainty around how it is interpreted under EU law continues to slow adoption. “Synthetic data was initially seen as a solution that could remove privacy concerns,” explained Vallevik. “But in reality, there is always some residual risk, and uncertainty about how to interpret that risk is one of the key reasons why adoption has been slower than expected.”
Watch the video covering the highlights of the interview:
Understanding the Legal Grey Zone
At the core of the study is the question of whether synthetic data generated from personal data can still be considered personal data under the General Data Protection Regulation (GDPR). “The key question is whether synthetic data can be considered anonymous,” Vallevik explained. “And the answer is that very often it can, but it depends on the context, how the data is generated, and who has access to additional information.”
This introduces a “grey zone” in which data cannot be classified in a purely binary way. Instead, its legal status depends on practical factors such as access, context, and the likelihood of re-identification. A recent European court ruling further reinforces this perspective by shifting the focus from theoretical risk to practical reality. Rather than asking whether re-identification is technically possible, the emphasis is placed on whether it is reasonably likely. “This creates a more dynamic understanding of data,” Marshall notes. “A dataset can be considered anonymous in one context and not in another, depending on who holds it and what additional information they can access.”
From Theory to Practice: A Risk-Based Approach
One of the key contributions of the study is the proposal of a practical framework for assessing privacy risk in line with GDPR principles. Rather than focusing purely on theoretical attacks, the researchers emphasize the importance of realistic scenarios. “When evaluating privacy risk, you need to consider what information an attacker could realistically access,” Vallevik explains. “It is not meaningful to assume access to all possible data—what matters is what is reasonably likely in practice.” This approach is based on three key factors:
- Opportunity – what data or additional information is accessible
- Motivation – the value of the data and incentives for re-identification
- Effort – the time, cost, and expertise required to perform an attack
Together, these elements form a more pragmatic and legally aligned way of evaluating whether synthetic data can be considered sufficiently anonymous.
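The publication itself does not prescribe a formula, but the interplay of the three factors can be made concrete in a purely illustrative sketch. Everything below is hypothetical: the names (`AttackScenario`, `residual_risk`, `anonymous_enough`), the multiplicative combination, and the threshold value are assumptions chosen only to show how a scenario-based, "reasonably likely" assessment might be structured in practice.

```python
from dataclasses import dataclass

@dataclass
class AttackScenario:
    """One realistic re-identification scenario (all fields hypothetical).

    Each factor is normalized to 0..1 for illustration only.
    """
    opportunity: float  # accessibility of auxiliary data to an attacker
    motivation: float   # value of the data / incentive to re-identify
    effort: float       # time, cost, and expertise an attack would require

def residual_risk(s: AttackScenario) -> float:
    """Toy score: risk rises with opportunity and motivation, falls with effort."""
    return s.opportunity * s.motivation * (1.0 - s.effort)

def anonymous_enough(scenarios: list[AttackScenario], threshold: float = 0.2) -> bool:
    """Treat data as 'anonymous enough' only if no realistic scenario
    exceeds the (hypothetical) acceptable-risk threshold."""
    return all(residual_risk(s) <= threshold for s in scenarios)

# Example: two plausible scenarios, neither exceeding the threshold.
scenarios = [
    AttackScenario(opportunity=0.8, motivation=0.5, effort=0.9),  # risk 0.04
    AttackScenario(opportunity=0.3, motivation=0.2, effort=0.5),  # risk 0.03
]
print(anonymous_enough(scenarios))  # True
```

The point of the sketch is not the arithmetic but the shape of the assessment: rather than assuming an omniscient attacker, each scenario is scored against what is reasonably likely, mirroring the context-dependent reading of anonymity discussed above.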
Challenging Common Misconceptions
The study also highlights two common misconceptions that shape current discussions around synthetic data. On one side, synthetic data is sometimes viewed as a complete solution to privacy challenges. On the other, it is seen as inherently unsafe due to the existence of theoretical risks. “The reality lies somewhere in between,” the researchers explain. “Synthetic data does not need to be perfectly anonymous—it needs to be anonymous enough.”
This distinction is critical. Under GDPR, the concept of anonymity is not absolute but based on what is “reasonably likely” in practice. Achieving this balance between privacy and utility is essential for enabling meaningful use of synthetic data.
Implications for Policy, Innovation, and Healthcare
Legal uncertainty around synthetic data has real-world consequences. Without clear guidance on what qualifies as “anonymous enough,” organizations often take a cautious approach, limiting data sharing and slowing innovation. “There is still a large grey zone, and that uncertainty makes institutions hesitant to share synthetic data,” Marshall notes. “Clearer thresholds and guidance from regulators would help unlock its potential.”
At the same time, this work highlights the importance of collaboration between legal experts, policymakers, and technical developers. Bridging these perspectives is essential for creating frameworks that are both legally sound and technically feasible.
Connecting to SYNTHIA’s Mission
Within SYNTHIA, addressing legal and regulatory challenges is a key component of building a trustworthy synthetic data ecosystem. This study contributes directly to that effort by linking legal interpretation with technical evaluation methods. By proposing practical frameworks and aligning them with GDPR principles, this work supports SYNTHIA’s goal of enabling safe, compliant, and scalable use of synthetic data in healthcare. Ultimately, advancing synthetic data innovation in Europe will require both technical rigor and legal clarity. As this study shows, understanding and navigating the legal grey zone is not a barrier, but a necessary step toward building trust and unlocking the full potential of data-driven healthcare.
Listen to the 30-minute podcast:
For the full publication:

