Welcome to SYNTHIA Insight - a series of focused content pieces that bring the science behind SYNTHIA to life. Through interviews with our project partners, we explore views, visions and expertise in synthetic data. Each edition offers an accessible window into the objectives of SYNTHIA and the progress of our work - helping to engage the wider community, spark dialogue, and promote understanding of SYNTHIA’s mission and impact. We invite you to connect with the minds shaping the future of synthetic data.


In SYNTHIA Insight no. 4 we introduce Stella (Styliani-Christina) Fragkouli, Research Associate at the Centre for Research and Technology Hellas (CERTH), and Nagat Masued, Research Engineer at the Barcelona Supercomputing Center (BSC). In this edition, they discuss the motivations and findings behind the SYNTHIA publication in NAR Genomics and Bioinformatics, titled: An ELIXIR scoping review on domain-specific evaluation metrics for synthetic data in life sciences.


The study addresses a fundamental question in the rapidly evolving field of synthetic data: how do we know if synthetic data is reliable? As synthetic data becomes increasingly central to life sciences research, enabling access to otherwise sensitive or scarce datasets, robust evaluation is essential to ensure scientific validity, safety, and trust.

“Synthetic data is rapidly transforming how we do research nowadays in life sciences, and it allows us to model complex biological datasets when real data are either scarce or hard to access,” explains Fragkouli. “In this study, we took a step back and asked a very simple but important question: how do we actually know if synthetic data are reliable?”


Watch the video interview:


Why Evaluation Matters: From Innovation to Trust

Synthetic data plays a growing role in enabling AI-driven biomedical research, helping overcome privacy barriers and supporting large-scale model development. However, without proper evaluation, its use can introduce significant risks.

“We are at a moment where artificial intelligence is becoming central to biomedical research, and synthetic data plays a key role in this transition because it helps us overcome privacy barriers and enable large-scale model development,” says Fragkouli. “But if synthetic data is not properly evaluated, we could end up with AI systems that may look powerful but are not reliable.”

Evaluation is therefore not just a technical step; it is essential for ensuring scientific rigor, protecting patients, and building long-term trust in AI technologies that increasingly impact real-world decisions.


Findings: A Fragmented and Context-Dependent Landscape

To better understand how synthetic data is currently evaluated, the researchers conducted a large-scale scoping review across six life science domains, screening more than 8,000 records and analyzing 188 publications. Their findings reveal a fragmented landscape, with evaluation approaches varying significantly by domain and use case.

“The core challenge is that there is no one-size-fits-all solution,” explains Masued. “Evaluating synthetic images is completely different from evaluating genomic sequences or electronic health records. Even the definition of what ‘good’ synthetic data looks like depends on the context and the intended use.”

The review identified 156 evaluation metrics, with 142 unique to a single domain and only a small number shared across fields. This highlights both the diversity of approaches and the lack of standardization. “Each domain has developed its own vocabulary, its own preferred metrics, and its own assumptions about what good synthetic data looks like—often without explicit justification,” adds Masued.


Risks of Poor Evaluation

The consequences of insufficient evaluation extend beyond technical limitations. Poorly assessed synthetic data can introduce bias, lead to inaccurate predictions, and undermine trust in AI systems.

“The risks are both scientific and societal,” emphasizes Fragkouli. “From a scientific point of view, poorly assessed synthetic data can lead to biased AI models, inaccurate predictions, or misleading clinical insights. From a privacy perspective, if not done properly, synthetic data may still reveal sensitive patterns from real individuals. Ultimately, this can undermine trust in AI.”

These risks reinforce the need for rigorous validation before synthetic data is used in clinical research or healthcare applications.


Towards Shared Standards and Robust Frameworks

A key contribution of the study is its role as a mapping exercise, identifying existing practices and highlighting gaps across life science domains. This work provides a foundation for developing standardized evaluation frameworks and best practices.

“Our review acts as a mapping exercise,” explains Fragkouli. “We went out to see what people are actually using and implementing in their studies, and we mapped this landscape and identified gaps. This can be seen as a first foundation on which we can start building good practices and standardized frameworks for evaluating synthetic data.”

Looking ahead, the authors emphasize the importance of defining clear guidelines and shared standards. “To fully leverage the potential of synthetic data, future efforts should focus on establishing clear evaluation guidelines and shared standards,” concludes Masued. “This will improve comparability between methods and increase confidence in using synthetic data across research disciplines.”


Connecting to SYNTHIA’s Mission

Within SYNTHIA, evaluation is a central pillar of the synthetic data generation framework. Synthetic data cannot be adopted in healthcare without clear evaluation metrics, domain-specific validation approaches, and shared standards.

This study directly contributes to that goal by mapping the current landscape and identifying the building blocks needed for robust evaluation. It supports SYNTHIA’s broader mission to develop trustworthy, transparent, and clinically relevant synthetic data solutions.

Ultimately, standardizing how we evaluate synthetic data will be key to enabling its adoption across research, clinical practice, and regulatory contexts, ensuring it can be used with confidence by all stakeholders in the healthcare ecosystem.


Listen to the podcast:

For the full publication: