Welcome to SYNTHIA Insight - a series of focused content pieces that bring the science behind SYNTHIA to life. Through interviews with our project partners, we explore views, visions and expertise in synthetic data. Each edition offers an accessible window into the objectives of SYNTHIA and the progress of our work - helping to engage the wider community, spark dialogue, and promote understanding of SYNTHIA’s mission and impact. We invite you to connect with the minds shaping the future of synthetic data.


In SYNTHIA Insight no. 2 we introduce Jean-Louis Raisaro, Assistant Professor at the Biomedical Data Science Center, Centre Hospitalier Universitaire Vaudois (CHUV), alongside his team members Bogdan Kulynych and Bayrem Kaabachi. Raisaro discussed the motivations and findings of the SYNTHIA publication in the Nature Portfolio journal npj Digital Medicine, titled "A scoping review of privacy and utility metrics in medical synthetic data".


The study addresses a central challenge in the field: the lack of consistent and reliable standards for evaluating synthetic data. “Synthetic data holds great promise in facilitating data sharing in healthcare, especially when real patient data is sensitive,” said Raisaro. “But to foster trust and adoption, we need robust and agreed-upon methods to evaluate it—both in terms of utility and privacy.”


Synthetic Data: Opportunity Meets Complexity

The researchers explain that synthetic data—data artificially generated to resemble real-world data—can be created in two main ways: through expert-defined simulations or by training statistical or AI models on real datasets. Their paper focuses on the latter. While synthetic data can accelerate research, improve model development, and reduce privacy risks, it’s not inherently safe. A common misconception is that because the data is "fake," it is private. However, without proper safeguards, synthetic datasets can still leak sensitive information. “Many assume synthetic data automatically protects privacy, but that’s a dangerous myth,” explained Kulynych. “Privacy evaluation is technically demanding and often overlooked, leading to potential harm if these datasets are used in clinical settings without rigorous checks.” 
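
To make the model-based route concrete, here is a minimal sketch in Python, assuming a small table of made-up numeric patient measurements; scikit-learn's GaussianMixture stands in for the far more sophisticated generators surveyed in the paper.

    # Minimal sketch of model-based synthetic data generation: fit a simple
    # statistical model to real records, then sample new records from it.
    # The dataset is invented and GaussianMixture is a stand-in for the deep
    # generative models typically used in practice.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Stand-in for a real patient table: age, systolic BP, cholesterol.
    real = np.column_stack([
        rng.normal(55, 12, 500),
        rng.normal(130, 15, 500),
        rng.normal(200, 30, 500),
    ])

    # Train the generator on the real data, then sample a synthetic table.
    generator = GaussianMixture(n_components=5, random_state=0).fit(real)
    synthetic, _ = generator.sample(n_samples=500)

    # The synthetic table mimics the real one's statistics, but nothing here
    # guarantees privacy: an overfit generator can reproduce real records.
    print(real.mean(axis=0))
    print(synthetic.mean(axis=0))

The sampling step is exactly where leakage can occur: if the model has memorized individual records, the "synthetic" output may contain near copies of them.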


Findings: A Fragmented Landscape

The CHUV team, in collaboration with the Berlin Institute of Health at Charité, systematically reviewed the literature from the past five years. Their aim was to assess whether there is a consensus on how to evaluate synthetic data. Their conclusion: the field is fragmented and lacks standardization. "Some metrics have no clear operational meaning, making it nearly impossible to compare generators or select the best one for a particular use case," said Kaabachi. "That's a huge obstacle, especially in high-stakes fields like medicine."


The study introduces a four-dimensional taxonomy for evaluation: broad utility (fidelity), narrow utility (task-specific performance), fairness, and privacy. Most studies evaluated only one or two of these dimensions, neglecting the holistic perspective needed for clinical use. The team urges the community to adopt more rigorous and comprehensive evaluation frameworks. Among their key recommendations:

  • Evaluate across all four dimensions.
  • Avoid metrics that conflate privacy and utility.
  • Use state-of-the-art methods to simulate potential privacy attacks (a crude version is sketched below).
  • Incorporate differential privacy techniques, which offer formal guarantees of protection, wherever possible (see the second sketch after this list).
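
The third recommendation can be illustrated with a deliberately crude check: measuring whether synthetic records sit suspiciously close to the records the generator was trained on. This is only a rough proxy for the state-of-the-art membership-inference attacks the authors recommend; all data, including the intentionally leaky "generator", are invented for the sketch.

    # Crude distance-to-closest-record check, a rough proxy for the
    # membership-inference attacks recommended in the paper. All data are
    # invented; the "generator" here deliberately memorizes its training set.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(1)
    train = rng.normal(0, 1, (500, 3))    # records the generator saw
    holdout = rng.normal(0, 1, (500, 3))  # records it never saw
    synthetic = train + rng.normal(0, 0.05, train.shape)  # leaky "generator"

    nn = NearestNeighbors(n_neighbors=1).fit(synthetic)

    def mean_closest_distance(records):
        distances, _ = nn.kneighbors(records)
        return distances.mean()

    # If training records sit much closer to the synthetic data than unseen
    # holdout records do, the generator is memorizing, i.e. leaking.
    print("train:  ", mean_closest_distance(train))
    print("holdout:", mean_closest_distance(holdout))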

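For the last recommendation, the simplest building block of differential privacy is the Laplace mechanism: answer a query with noise calibrated to how much one person's record can change the answer, so the released value carries a formal (epsilon) guarantee. The query, data, and epsilon below are illustrative only.

    # Textbook Laplace mechanism: release a count with noise calibrated to
    # the query's sensitivity, giving an epsilon-differentially-private
    # answer. Values and epsilon are illustrative, not a production setup.
    import numpy as np

    rng = np.random.default_rng(2)

    def dp_count(values, threshold, epsilon):
        # A count changes by at most 1 when one record is added or removed
        # (sensitivity = 1), so Laplace noise with scale 1/epsilon suffices.
        true_count = int(np.sum(values > threshold))
        return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

    ages = rng.normal(55, 12, 1000)  # made-up patient ages
    print(dp_count(ages, threshold=65, epsilon=1.0))

In synthetic data generators, differential privacy is typically enforced during model training rather than query by query, but the trade-off is the same: more noise buys a stronger guarantee at some cost to utility.
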
Recent advances make it feasible to generate synthetic data that is both useful and provably private, and SYNTHIA can play a crucial role in bringing those advances into everyday practice. The researchers emphasize that synthetic data isn't a silver bullet: institutions still need strong governance and safeguards before deploying such data in medical research or decision-making. "We believe SYNTHIA is uniquely positioned to lead the community in establishing best practices," Raisaro concluded. "By providing standardized frameworks and sharing validated tools, we can build justifiable trust in synthetic data for healthcare."


Watch the video interview:


Listen to the podcast:


For the full publication: