
Wünsch, Milena (ORCID: https://orcid.org/0009-0001-1982-9260); Sauer, Christina (ORCID: https://orcid.org/0000-0003-2425-7858); Callahan, Patrick (ORCID: https://orcid.org/0000-0003-1769-7580); Hinske, Ludwig Christian (ORCID: https://orcid.org/0000-0001-7273-5899) and Boulesteix, Anne‐Laure (ORCID: https://orcid.org/0000-0002-2729-0947) (2024): From RNA sequencing measurements to the final results: A practical guide to navigating the choices and uncertainties of gene set analysis. In: WIREs Computational Statistics, Vol. 16, No. 1

Abstract

Gene set analysis (GSA), a popular approach for analyzing high-throughput gene expression data, aims to identify sets of related genes that show significantly enriched or depleted expression patterns between different conditions. In recent years, a multitude of methods have been developed for this task. However, clear guidance is lacking: choosing the right method is the first hurdle a researcher is confronted with. No less challenging than overcoming this so-called method uncertainty is the preprocessing procedure, which ranges from knowing which steps are required to selecting, from the plethora of valid options, an approach that creates the input object the method accepts (data preprocessing uncertainty); here, too, clear guidance is scarce. Here, we provide a practical guide through all steps required to conduct GSA, beginning with a concise overview of a selection of established methods, including Gene Set Enrichment Analysis and Database for Annotation, Visualization, and Integrated Discovery (DAVID). We place a special focus on reviewing and explaining the necessary preprocessing steps for each method under consideration (e.g., the necessity of a transformation of the RNA sequencing data), an essential aspect that typically receives only limited attention in both existing reviews and applications. To raise awareness of the spectrum of uncertainties, our review is accompanied by an extensive overview of the literature on valid approaches for each step and illustrative R code demonstrating the complex analysis pipelines. It ends with a discussion and recommendations to both users and developers to ensure that the results of GSA are, despite the above-mentioned uncertainties, replicable and transparent.
