Use of synthetic data to improve breast cancer detection, characterization and treatment response prediction


Breast cancer (BC) is the most prevalent cancer among women in Europe, comprising over 30% of all newly diagnosed cancers. Globally, over 2.3 million new cases and 670,000 deaths occur annually, with incidence expected to rise by 2050. Early detection through breast imaging is vital, yet disparities in access and limited data restrict progress. The SYNTHIA breast cancer use case focuses on creating synthetic data for three purposes: advanced breast tissue and lesion modelling, treatment response prediction models, and synthetic control arms in clinical trials. Together these enable innovation in diagnosis, treatment allocation, and outcome prediction while preserving privacy.


The Challenge 

Despite advances in imaging and treatment, early detection and prediction of relapse remain difficult. There is limited access to large, high-quality, multi-modal datasets linking imaging, clinical, genomic, and pathological information. Privacy, ethical, and interoperability issues further restrict data sharing. Conventional clinical trials often require large control arms that are difficult to assemble, especially in metastatic breast cancer. 


Our Research Questions 

  1. Can synthetic imaging data improve breast cancer detection and characterization, and improve the performance of models predicting local relapse, second primary breast cancers, and/or distant relapse?
  2. Can synthetic clinical data reproduce the treatment effectiveness observed in distinct real-world and clinical trial cohorts, yielding comparable outcomes for endocrine therapy (ET) plus CDK inhibitors in metastatic HR+/HER2- breast cancer?

Our Approach 

  • Create breast tissue and lesion models using advanced generative AI methods (GANs, Stable Diffusion) for downstream imaging tasks (automated detection, motion synthesis, harmonization) and for informing upstream imaging system development.
  • Validate multimodal AI models for predicting treatment response and relapse, comparing real vs. synthetic data, and assessing prognostic gain, bias, and generalizability.  
  • Generate synthetic data to fortify external control arms in clinical trials evaluating new therapies for breast cancer, reducing reliance on randomized standard-of-care arms. 

The main data modalities are mammography, MRI, ultrasound, radiomics, pathology, genomics, and clinical and epidemiological data.


Envisioned Impact 

The breast cancer use case is expected to demonstrate the value of synthetic data by significantly enhancing the accuracy and efficiency of automated image detection for breast cancer diagnosis, staging, and follow-up. It will also improve clinical decision-making for personalized treatment allocation through validated multimodal AI models that incorporate both real and synthetic data and are assessed for diagnostic and prognostic gain, bias, and generalizability. Furthermore, generating synthetic control arms for clinical trials will improve trial design and outcomes, strengthening decision-making for regulators, health technology assessment (HTA) bodies, healthcare providers, and patients.


Use Case Leadership

 
Academic Lead:
Rodrigo Dienstmann
Vall d'Hebron Institute of Oncology (VHIO)


Industry Lead:

Laurence Vancamberg 
GE Healthcare