Using synthetic data to predict treatment response from Minimal Residual Disease in diffuse large B-cell non-Hodgkin lymphoma (DLBCL)
Diffuse Large B-Cell Lymphoma (DLBCL) is the most common non-Hodgkin lymphoma, accounting for 30–40% of cases. Despite effective first-line immunochemotherapy, many patients relapse or develop refractory disease. Clinical trials are hampered by the rarity of these patient subgroups, the heterogeneity of the disease, and high costs. The DLBCL use case applies synthetic data to strengthen external control arms for clinical trials of new therapies and to develop prognostic models based on Minimal Residual Disease (MRD), a key indicator of survival and treatment response.
The Challenge
Traditional clinical trials in DLBCL, especially in the relapsed or refractory setting, often require randomized control arms that are difficult to establish and may not reflect current standard-of-care practice. There is also a need to develop and validate prognostic models tailored to DLBCL patients that use Minimal Residual Disease (MRD) as a key predictor of overall survival and treatment response.
Our Research Questions
- Using the same inclusion and exclusion criteria as a completed DLBCL trial, can we create a synthetic control arm with outcomes similar to those of the original trial?
- Can synthetic data (SD) be used to predict outcomes and identify novel prognostic biomarkers?
- Can pre-training models on synthetic data improve the segmentation and identification of lesions?
Our Approach
- Generate synthetic data, validated against retrospective real-world data, to strengthen control groups in clinical trials.
- Use synthetic data to develop and validate MRD-focused prognostic models that integrate molecular biology, PCR, next-generation sequencing (NGS), and PET imaging data.
- Use synthetic data to expand DLBCL cohorts and to correct for treatment-related variability in the prognostic significance of MRD.
The main data modalities are MRD data, clinical and laboratory results, demographics, treatment lines, and PET-CT imaging.
Envisioned Impact
The DLBCL use case is expected to demonstrate the value of synthetic data by improving both the design and the results of clinical trials and by enabling more accurate prognostic models. The expected direct impacts are twofold: first, better lesion detection, monitoring of disease progression, and assessment of treatment effects through MRD-based prognostic modelling; and second, a demonstration of how synthetic data can strengthen external control arms for clinical trials, advancing medical research, clinical practice, and regulatory decision-making. Ultimately, this work is expected to support more personalized treatment plans, reduce unnecessary procedures and side effects, and improve quality of life for DLBCL patients.
Use Case Leadership
Academic Lead:
Sirpa Leppä
Helsinki University Hospital
Industry Lead:
Michel Van Speybroeck
Janssen