Use of synthetic data (SD) for predictive diagnostic models based on imaging and fluid biomarkers, and evaluation of the potential for synthetic control arms in Alzheimer’s disease (AD)


Alzheimer’s disease (AD) is the leading cause of dementia, with prevalence rising as populations age, posing major societal and economic challenges. Early detection during preclinical and prodromal stages is vital but remains difficult due to disease heterogeneity and limited sensitive diagnostics. The SYNTHIA Alzheimer’s disease use case investigates how synthetic data can improve predictive models for diagnosis, support external control arms for clinical trials, and augment imaging data for brain-age modelling, enabling earlier and more precise interventions while protecting patient privacy. 


The Challenge 

Alzheimer’s disease involves progressive brain lesions and neuronal loss, often beginning years before symptoms appear. Reliable early diagnosis and differentiation between cognitive stages such as subjective cognitive decline, mild cognitive impairment, and dementia remain difficult. Biomarker tests are costly or invasive, imaging requires advanced interpretation, and the use of real-world data is limited by privacy constraints and restricted access.


Our Research Questions 

  1. How does the combination of synthetic data and real data improve the performance of AI/ML models in predicting diagnostic stages of Alzheimer's disease?
  2. Can synthetic data generated from longitudinal cohort studies be used to construct realistic control arms?
  3. Can augmenting real MRI data with synthetic data improve a predictive model of brain age?

Our Approach 

  • Evaluate AI/ML models trained on synthetic data (SD), real data (RD), and hybrid datasets to discriminate between diagnostic stages (cognitively normal, mild cognitive impairment (MCI), dementia).
  • Investigate how far synthetic data generated from several longitudinal cohort studies can be used to construct control arms for real clinical trials in the AD field.
  • Embed specific pathologies into MR/PET data, enhance radiomic feature extraction, and integrate imaging with methylation data to estimate brain age. 

The main data modalities are genetics (APOE), fluid biomarkers (beta-amyloid, pTau), neuropsychological tests (CDR, MMSE, RBANS, FAQ), imaging (MRI, CT, PET), and genomics.
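
As an illustration of the first approach item, the comparison of models trained on real, synthetic, and hybrid data can be sketched as below. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the data are randomly generated toy values standing in for a single biomarker-like score, the "synthetic" cohort is simulated with a small bias rather than produced by a real generator, and the nearest-centroid classifier and all function names are hypothetical choices made for brevity. The one design point it is meant to show is that every model is scored on the same held-out real data, whatever it was trained on.

```python
import random
from statistics import mean

STAGES = ("CN", "MCI", "dementia")  # cognitively normal, MCI, dementia


def make_toy_cohort(n_per_stage, shift, seed):
    """Hypothetical toy data: one score per subject, separated by stage."""
    rng = random.Random(seed)
    rows = []
    for k, stage in enumerate(STAGES):
        for _ in range(n_per_stage):
            rows.append((rng.gauss(k * 2.0 + shift, 1.0), stage))
    return rows


def fit_centroids(rows):
    """Nearest-centroid 'model': the mean score of each stage."""
    return {s: mean(x for x, y in rows if y == s) for s in STAGES}


def predict(centroids, x):
    """Assign the stage whose centroid is closest to the score."""
    return min(centroids, key=lambda s: abs(x - centroids[s]))


def accuracy(centroids, rows):
    return mean(1.0 if predict(centroids, x) == y else 0.0 for x, y in rows)


# Small real training set, larger held-out real test set (both toy).
real_train = make_toy_cohort(30, shift=0.0, seed=1)
real_test = make_toy_cohort(200, shift=0.0, seed=2)
# Stand-in for generator output: more samples, slightly biased (assumption).
synthetic = make_toy_cohort(300, shift=0.3, seed=3)

training_sets = {
    "RD": real_train,
    "SD": synthetic,
    "hybrid": real_train + synthetic,
}

# Every model is evaluated on the same held-out real data.
results = {
    name: accuracy(fit_centroids(rows), real_test)
    for name, rows in training_sets.items()
}

for name, acc in results.items():
    print(f"{name}: accuracy on held-out real data = {acc:.3f}")
```

In a real study the toy cohorts would be replaced by cohort data and generator output, and accuracy by metrics suited to imbalanced diagnostic classes.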


Envisioned Impact 

The Alzheimer’s disease use case is expected to demonstrate the value of synthetic data for earlier and more specific diagnosis, allowing disease stages to be differentiated sooner and improving patient outcomes and care management. It will explore the possible use of synthetic cohorts as control arms in future clinical trials, which could reduce reliance on placebo groups, expedite drug development, and accelerate patient access to new therapies. Additionally, augmenting imaging data with synthetic data is expected to improve lesion detection, monitoring of disease progression, and assessment of treatment effects. Together, these applications are expected to support better diagnostic accuracy, faster trial designs, and more personalised treatment strategies in Alzheimer’s disease.


Use Case Leadership

 

Academic Lead:
Holger Fröhlich
Fraunhofer

 

Industry Lead:
Matt Clement
Gates Ventures