
Fütterer, Cornelia; Nalenz, Malte and Augustin, Thomas (11 November 2021): Discriminative Power Lasso -- Incorporating Discriminative Power of Genes into Regularization-Based Variable Selection. Department of Statistics: Technical Reports, No. 239 [PDF, 436kB]

Warning
A newer version of this document is available.

Abstract

In precision medicine, it is known that specific genes are decisive for the development of different cell types. In drug development it is therefore highly relevant to identify biomarkers that make it possible to distinguish cell subtypes connected to a disease. The main goal is to find a sparse set of genes that can be used for prediction. For standard classification methods, the high dimensionality of gene expression data poses a severe challenge. Common approaches address this problem by excluding genes during preprocessing. As an alternative, L1-regularized regression (Lasso) can be used to identify the most impactful genes. We argue for an adaptive penalization scheme, based on the biological insight that decisive genes are expressed differently among the cell types. The differences in gene expression are measured as the genes' discriminative power (DP), which is based on the univariate compactness within classes and the separation between classes. ANOVA-based measures, as well as measures from clustering theory, are applied to construct the covariate-specific DP. The resulting model, which we call Discriminative Power Lasso (DP-Lasso), incorporates the DP as a covariate-specific penalization into the Lasso. Genes with a higher DP are penalized less heavily and have a higher chance of being part of the final model. In this way the model can be guided towards more promising and trustworthy genes, while the coefficients of uninformative genes can be shrunk to zero more reliably. We test our method on single-cell RNA-sequencing data as well as on simulated data. On average, DP-Lasso leads to significantly sparser solutions than competing Lasso-based regularization approaches, while being competitive in terms of accuracy.
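
The idea of turning a per-gene discriminative power into a per-gene penalty can be illustrated with a minimal sketch. The abstract does not state the exact DP measures or how they map to penalties, so everything here is an assumption made for illustration: the one-way ANOVA F-statistic stands in for one possible ANOVA-based DP, the inverse mapping in the hypothetical helper `penalty_factors` is just one way to make a higher DP mean a lighter penalty, and the toy data is invented.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for a single gene.

    Large values mean classes are compact (small within-class variance)
    and well separated (large between-class variance).
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within if ms_within > 0 else float("inf")


def penalty_factors(dp, eps=1e-8):
    """Assumed mapping (not taken from the report): weight_j ~ 1 / DP_j.

    Weights are rescaled to have mean 1 so that the overall
    regularization strength stays comparable to a plain Lasso.
    """
    raw = [1.0 / (d + eps) for d in dp]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]


# Toy data: six cells from two cell types, two genes.
labels = ["A", "A", "A", "B", "B", "B"]
informative_gene = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]  # differs between types
noise_gene = [2.0, 1.0, 3.0, 2.0, 3.0, 1.0]        # same mean in both types

dp = [anova_f(informative_gene, labels), anova_f(noise_gene, labels)]
weights = penalty_factors(dp)
# dp is about 150 for the informative gene and 0 for the noise gene,
# so the informative gene receives a far smaller penalty weight.
```

Weights of this kind could then be handed to any Lasso solver that accepts per-coefficient penalties, for example through the `penalty.factor` argument of glmnet; whether the report uses that route is not stated in the abstract.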

