Abstract
The inclusion of high-dimensional covariate data in prediction models has become a well-studied topic over the last decades. Although most of these methods do not account for the possibility that the available covariates are of different types, there are many scenarios in which the covariates of a single dataset can be structured in blocks of different types. To date, a few computationally intensive approaches exist that make use of block structures of this kind. In this paper, we present priority-Lasso, an intuitive and practical Lasso-based analysis strategy for building prediction models that takes such block structures into account. It requires the definition of a priority order for the blocks of data. Lasso models are fitted successively for each block, and the fitted values of each step are included as an offset in the fit of the next step. We apply priority-Lasso with different settings to an acute myeloid leukemia (AML) dataset consisting of clinical variables, cytogenetics, gene mutations and expression variables, and compare its performance on an independent validation dataset with that of standard Lasso models. The results show that priority-Lasso keeps pace with the Lasso in terms of prediction accuracy. Variables from blocks with higher priority are favored over variables from blocks with lower priority, which results in an easily usable and transportable model for clinical practice.
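The successive-offset idea described in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian (squared-error) loss, a fixed, user-supplied penalty per block rather than cross-validated penalties, and an intercept only in the first block; the AML application uses Cox regression, which this sketch does not implement. The names `lasso_with_offset` and `priority_lasso` are hypothetical and are not the API of any published package.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = observations, columns = covariates


def soft_threshold(z: float, lam: float) -> float:
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_with_offset(X: Matrix, y: Sequence[float], offset: Sequence[float],
                      lam: float, fit_intercept: bool,
                      max_sweeps: int = 1000, tol: float = 1e-10
                      ) -> Tuple[float, List[float]]:
    """Cyclic coordinate descent for the Gaussian Lasso with a fixed offset:
    minimize (1/(2n)) * sum_i (y_i - offset_i - b0 - x_i . beta)^2
             + lam * sum_j |beta_j|   (intercept b0 is not penalized)."""
    n, p = len(y), len(X[0])
    b0, beta = 0.0, [0.0] * p
    r = [y[i] - offset[i] for i in range(n)]  # residuals for b0 = 0, beta = 0
    for _ in range(max_sweeps):
        max_change = 0.0
        if fit_intercept:
            shift = sum(r) / n  # exact minimizer over b0 given beta
            b0 += shift
            r = [ri - shift for ri in r]
            max_change = abs(shift)
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # all-zero column: coefficient stays 0
            # correlation of column j with the partial residual (beta_j removed)
            rho = sum(col[i] * (r[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                r = [r[i] - col[i] * delta for i in range(n)]
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b0, beta


def priority_lasso(blocks: Sequence[Matrix], y: Sequence[float],
                   lams: Sequence[float]
                   ) -> Tuple[List[Tuple[float, List[float]]], List[float]]:
    """Fit one Lasso per block in priority order (blocks[0] = highest priority).
    The fitted values of each step are added to the offset of the next step.
    Returns per-block (intercept, coefficients) and the final linear predictor."""
    n = len(y)
    offset = [0.0] * n
    fits = []
    for k, (X, lam) in enumerate(zip(blocks, lams)):
        # assumption of this sketch: only the first block gets an intercept
        b0, beta = lasso_with_offset(X, y, offset, lam, fit_intercept=(k == 0))
        fitted = [b0 + sum(x * b for x, b in zip(X[i], beta)) for i in range(n)]
        offset = [offset[i] + fitted[i] for i in range(n)]
        fits.append((b0, beta))
    return fits, offset
```

For example, with `x1 = [[1.0], [1.0], [-1.0], [-1.0]]`, `x2 = [[1.0], [-1.0], [1.0], [-1.0]]`, `y = [10.0, 6.0, 4.0, 0.0]` and `lams = [0.5, 0.5]`, the first block receives intercept 5.0 and coefficient 2.5, and the second block, fitted with the first block's fitted values as offset, receives coefficient 1.5; raising the second penalty to 2.5 sets that coefficient to exactly 0. Because the offset enters with its coefficient fixed to 1, a lower-priority block can only explain what the higher-priority blocks have left unexplained.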
Document type: | Paper |
---|---|
Keywords: | Cox Regression; Lasso; Multi-Omics Data; Penalized Regression; Prediction Model; priority-Lasso |
Faculty: | Mathematics, Computer Science and Statistics > Statistics > Technical Reports |
Subject areas: | 500 Science and Mathematics > 500 Natural sciences |
URN: | urn:nbn:de:bvb:19-epub-41305-8 |
Language: | English |
Document ID: | 41305 |
Date of publication on Open Access LMU: | 24 Nov 2017, 07:08 |
Last modified: | 04 Nov 2020, 13:17 |