Use of synthetic data to build external control arms for clinical trials and to improve outcome prediction through integration of imaging and genomic data in multiple myeloma


Multiple myeloma (MM) is a plasma cell neoplasm and the second most prevalent hematologic cancer. It is not curable but can be controlled for years, and it carries the highest treatment cost among blood cancers. The treatment landscape has improved, but progress is hampered by genetic heterogeneity and by high-risk features that affect more than 25% of patients at diagnosis. The SYNTHIA multiple myeloma use case focuses on creating synthetic data to build regulatory-grade external control arms for clinical trials and on integrating genomic, clinical, demographic, and imaging variables to improve outcome prediction and advance precision medicine.


The Challenge 

Multiple myeloma is a complex blood cancer characterized by clonal plasma cell growth in the bone marrow and significant genetic diversity. High-risk features such as cytogenetic abnormalities, circulating plasma cells, and extramedullary disease impact outcomes and complicate treatment decisions. Traditional clinical trials often require randomized control arms, which are difficult to establish in later lines of therapy or in rare patient subgroups. The integration of imaging, molecular, and clinical data to guide therapy remains limited, creating barriers to precision medicine and to timely access to effective treatments.


Our Research Questions 

  1. How accurately can synthetic data replicate clinical outcomes observed in standard control arms of previous clinical trials for newly diagnosed multiple myeloma (NDMM)?  
  2. Can a synthetic external control arm reliably support regulatory and health technology assessment (HTA) decision-making for new MM treatments?
  3. How effectively can synthetic data integrate genomic, clinical, demographic, and therapeutic variables? 

Our Approach 

  • Evaluate the accuracy of synthetic data in replicating the clinical features and endpoints observed in clinical trials, and determine the suitability and robustness of synthetic control arms for regulatory-grade clinical trial assessments of new therapeutic interventions.
  • Investigate, through synthetic genomic data generation, the impact of rare high-risk genetic alterations on the prediction of clinical endpoints.
  • Evaluate the feasibility of employing non-invasive predictive methods within synthetic datasets to forecast clinical outcomes and enhance patients' quality of life.
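
As an illustration of the first point above, one simple way to check how closely a synthetic control arm replicates a time-to-event endpoint is to compare its Kaplan-Meier survival curve with that of the real control arm. The sketch below is a minimal example under stated assumptions, not the project's evaluation method: the function names (kaplan_meier, survival_at, max_survival_gap), the choice of the maximum absolute gap as a summary, and the idea of treating progression or death as the event are all illustrative assumptions. A regulatory-grade assessment would rely on validated statistical software and pre-specified fidelity metrics.

```python
from typing import List, Tuple

Curve = List[Tuple[float, float]]


def kaplan_meier(times: List[float], events: List[int]) -> Curve:
    """Kaplan-Meier estimate as (time, survival) pairs at each event time.

    times  : follow-up time per patient (e.g. months)
    events : 1 if the event (e.g. progression or death) was observed,
             0 if the patient was censored at that time
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: Curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set.
        n_at_risk -= n_removed
    return curve


def survival_at(curve: Curve, t: float) -> float:
    """Step-function lookup: survival probability at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


def max_survival_gap(real: Curve, synthetic: Curve) -> float:
    """Largest absolute difference between two curves over all event times.

    Hypothetical fidelity summary chosen for illustration only.
    """
    grid = sorted({t for t, _ in real} | {t for t, _ in synthetic})
    return max(
        (abs(survival_at(real, t) - survival_at(synthetic, t)) for t in grid),
        default=0.0,
    )
```

With toy inputs invented purely for illustration, kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]) describes a real arm and kaplan_meier([2, 4], [1, 1]) a synthetic one; max_survival_gap then reports the largest divergence between the two curves, where a value near zero would suggest the synthetic arm tracks the real one closely on this endpoint.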

The main data modalities:

  • Clinical/EHR: demographics, medical/family history, subtype, lab tests (serum/urine monoclonal component, creatinine, calcium, LDH), cytogenetics, flow cytometry, MRD, therapy details, clinical notes. 
  • Imaging: PET-CT, bone and extraosseous lesions. 
  • Genomics: FISH, DNA sequencing, CNV, ctDNA. 

Envisioned Impact 

The multiple myeloma use case is expected to demonstrate the value of synthetic data by providing external control arms for clinical trials, particularly in settings where randomized controls are impractical or unethical. This approach will accelerate access to novel therapies while supporting decision-making by regulatory bodies, HTA bodies, healthcare providers, and patients. The expected direct impacts are twofold: first, improving lesion detection and facilitating the monitoring of lesion and disease progression, thereby enhancing the assessment of treatment effects; and second, highlighting the utility of synthetic data in advancing the development and applicability of predictive models, contributing to medical research, clinical practice, and regulatory evaluation. This collaborative initiative holds the promise of optimizing the design and outcomes of clinical trials and advancing treatments for multiple myeloma.


Use Case Leadership

 

Academic Lead:
Carolina Terragna
University of Bologna

Industry Lead:
Marco DiBonaventura
Pfizer