- SYN-Y1-2025-014: Journal publication [Adams et al., Fraunhofer]
- The Journal of Prevention of Alzheimer's Disease, January 2026.
- The study addresses the challenge of harmonizing heterogeneous healthcare datasets, where inconsistent variable naming limits scalable multi-cohort Alzheimer's disease research. Because manual harmonization is resource-intensive, the authors assess whether modern text-embedding models can support this task. They develop a new benchmark that tests five state-of-the-art embedding models across seven Alzheimer's disease datasets by mapping cohort metadata to a Common Data Model, using only semantic descriptions of clinical, lifestyle, demographic, and imaging variables. Results show that models performing well on general benchmarks do not necessarily excel in real-world clinical harmonization, highlighting the need for domain-specific evaluation. The authors also provide guidelines for metadata formatting and release an open-source library and interactive leaderboard to support ongoing benchmarking. The work emphasizes the importance of tailored standards to enable semi-automated clinical data harmonization.
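
To make the idea of mapping cohort metadata to a Common Data Model from semantic descriptions more concrete, here is a minimal sketch. It is not the authors' library or benchmark. The `embed` function is a toy bag-of-words stand-in for a real text-embedding model, and all variable names, concept names, and descriptions are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a text-embedding model: bag-of-words token counts.

    Assumption: a real pipeline would call an actual embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_cdm(cohort_vars: dict, cdm_concepts: dict) -> dict:
    """Map each cohort variable to the CDM concept with the most similar description."""
    cdm_vecs = {name: embed(desc) for name, desc in cdm_concepts.items()}
    mapping = {}
    for var, desc in cohort_vars.items():
        vec = embed(desc)
        best = max(cdm_vecs, key=lambda c: cosine(vec, cdm_vecs[c]))
        mapping[var] = (best, cosine(vec, cdm_vecs[best]))
    return mapping


# Hypothetical Common Data Model concepts and their descriptions.
cdm = {
    "age": "age of the participant in years at baseline",
    "mmse_total": "mini mental state examination total score",
    "education_years": "years of formal education completed",
}

# Hypothetical cohort variables with inconsistent names but similar descriptions.
cohort = {
    "PTAGE": "participant age at baseline visit",
    "MMSCORE": "total score on the mini mental state examination",
    "EDU": "number of years of education",
}

mapping = map_to_cdm(cohort, cdm)
for var, (concept, score) in mapping.items():
    print(f"{var} -> {concept} (similarity {score:.2f})")
```

In a semi-automated workflow of the kind the summary describes, the top-ranked candidates would presumably be reviewed by a curator instead of being accepted automatically. Swapping `embed` for different models while keeping the matching step fixed is one way such models could be compared on the same mapping task.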


