In its first year, SYNTHIA set out not only to encourage dialogue and collaboration across scientific and clinical communities, but also to deliver concrete results and evidence that advance the use of synthetic data in healthcare innovation. Our scientific publications span core challenges in synthetic data generation, evaluation and application in real-world biomedical contexts. From foundational explorations of privacy and utility to innovative models, each piece of collaborative work reflects both the breadth and depth of SYNTHIA’s engagement with cutting-edge research questions.
All SYNTHIA publications are the result of close collaboration among multiple partners across our consortium.
The following SYNTHIA publications focus on one of the most critical barriers to synthetic data adoption: trust. They examine how privacy risks, data utility and evaluation metrics are currently defined, measured and interpreted in healthcare settings. By systematically reviewing existing approaches and proposing unified perspectives on re-identification, inference and reconstruction risks, these works aim to clarify how synthetic data can be assessed transparently and responsibly.
- A scoping review of privacy and utility metrics in medical synthetic data | npj Digital Medicine | Lausanne University Hospital
- An ELIXIR scoping review on domain-specific evaluation metrics for synthetic data in life sciences | arXiv | The Centre for Research & Technology Hellas
- Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy | arXiv | Lausanne University Hospital
- SAFE: A multimodal, scalable and clinically-oriented comprehensive framework for synthetic data validation in hematology | Blood | Humanitas Research Hospital

These SYNTHIA publications advance the technical foundations of synthetic data generation for complex biomedical data. The focus is on developing and evaluating advanced generative models capable of capturing high-dimensional, multimodal and clinically relevant data structures. These studies demonstrate how synthetic data can support demanding downstream tasks, such as image segmentation and survival prediction, while maintaining fidelity and analytical usefulness.
- SynBT: High-quality Tumor Synthesis for Breast Tumor Segmentation by 3D Diffusion Model | arXiv | GE Healthcare
- Deep Survival Analysis in Multimodal Medical Data: A Parametric and Probabilistic Approach with Competing Risks | arXiv | Universidad Politécnica de Madrid
- Navigating Opportunities and Challenges in Synthetic Data Generation for Biomedicine: Insights from the SYNTHIA Project | Barcelona Supercomputing Center

This SYNTHIA publication addresses the limitations of purely correlational models in healthcare AI. It focuses on causal generative modelling as a way to mitigate bias and hidden confounding, supporting more reliable inference from complex health data. This line of research strengthens the methodological robustness of both real and synthetic data applications, particularly in decision-support contexts.
- DeCaFlow: A Deconfounding Causal Generative Model | arXiv | Universidad Politécnica de Madrid

These SYNTHIA publications capture our work at the intersection of advanced AI architectures, privacy-preserving computation and regulation. They examine how federated learning and distributed synthetic data generation can be aligned with existing medical device regulations, addressing challenges related to validation, accountability, traceability and trust in decentralized AI systems.
- Federated Learning and Medical Device Regulation: Bridging Gaps in Healthcare AI Governance | Universidad Politécnica de Madrid
- Development and validation of synthetic data generation over a federated learning computing framework to accelerate innovation and boost personalized medicine in hematological diseases | Blood | Humanitas Research Hospital

These SYNTHIA publications demonstrate how synthetic data methods and advanced analytics can be applied in concrete biomedical research scenarios. Spanning oncology and hematology, the work shows how data-driven approaches can support survival analysis, genomic benchmarking and disease understanding, even in settings where data access is constrained by sensitivity, scale or privacy concerns.
- Characterization and Clinical Implications of p53 Dysfunction in Patients With Myelodysplastic Syndromes | Journal of Clinical Oncology | Humanitas Research Hospital
- Synth4bench: Synthetic Data Generation for Benchmarking Tumor-Only Somatic Variant Calling Algorithms | bioRxiv | The Centre for Research & Technology Hellas

These SYNTHIA publications focus on the practical enablers of synthetic data generation and reuse. The publications address how datasets can be made more interoperable, discoverable and machine-readable through improved metadata, semantic harmonization and benchmarking approaches. Together, they contribute to making both real and synthetic data easier to integrate across studies, institutions and research domains.
- Assessment of metadata descriptors of AI-ready datasets | Scilit | Leiden University Medical Center
- A Benchmark of Large Language Models for Semantic Harmonization of Alzheimer's Disease Cohorts | The Journal of Prevention of Alzheimer's Disease | Fraunhofer


