Does combining numerous data types in multi-omics data improve or hinder performance in survival prediction? Insights from a large-scale benchmark study

Li, Yingxia ORCID: https://orcid.org/0000-0002-4501-5834; Herold, Tobias ORCID: https://orcid.org/0000-0002-9615-9432; Mansmann, Ulrich ORCID: https://orcid.org/0000-0002-9955-8906 and Hornung, Roman ORCID: https://orcid.org/0000-0002-6036-1495 (2024): Does combining numerous data types in multi-omics data improve or hinder performance in survival prediction? Insights from a large-scale benchmark study. In: BMC Medical Informatics and Decision Making, Vol. 24, 244

Creative Commons: Attribution 4.0 (CC-BY)

Published version

DOI: 10.1186/s12911-024-02642-9

Abstract

Background: Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions.
Methods: In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell’s C-index and the integrated Brier score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives.
Results: Contrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures.
Conclusions: Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.
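The figure of 31 combinations in the Methods follows from counting the non-empty subsets of five data types (2^5 - 1 = 31). The sketch below enumerates them and adds a plain textbook version of Harrell's C-index for right-censored data. It is an illustration only, not the authors' code: the abstract does not say which software or estimator variants were used, and the names `DATA_TYPES`, `all_combinations`, `harrell_c_index`, and the label `CNV` (short for copy number variation) are assumptions made for this example.

```python
from itertools import combinations

# Data type names as listed in the abstract; "CNV" is shorthand chosen here
# for copy number variation (an assumption, not the authors' label).
DATA_TYPES = ["mRNA", "miRNA", "methylation", "DNAseq", "CNV"]


def all_combinations(items):
    """Return every non-empty subset of items, smallest subsets first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def harrell_c_index(times, events, risk_scores):
    """Plain O(n^2) Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i has the strictly shorter
    observed time and experienced the event. The pair is concordant when
    subject i also has the higher risk score; tied scores count as 0.5.
    Pairs with tied observed times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


combos = all_combinations(DATA_TYPES)
print(len(combos))  # 2**5 - 1 = 31
```

In a benchmark of this kind, each of the 31 subsets would be paired with each prediction method and scored on held-out data. The resampling scheme, the up-weighting of clinical data, and the integrated Brier score are not shown here.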

Document type:	Journal article
Faculty:	Medicine > Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie; Medicine > Klinikum der LMU München > Medizinische Klinik und Poliklinik III (Onkologie)
Subjects:	600 Technology, medicine, applied sciences > 610 Medicine and health
URN:	urn:nbn:de:bvb:19-epub-123127-4
ISSN:	1472-6947
Language:	English
Document ID:	123127
Date deposited on Open Access LMU:	13 Dec 2024 15:17
Last modified:	19 Dec 2024 07:26