Variable selection techniques after multiple imputation in high-dimensional data

Zahid, Faisal Maqbool; Faisal, Shahla and Heumann, Christian (2019): Variable selection techniques after multiple imputation in high-dimensional data. In: Statistical Methods and Applications

Full text not available on 'Open Access LMU'.

DOI: 10.1007/s10260-019-00493-7

Abstract

High-dimensional data arise in diverse fields of scientific research, and missing values are often encountered in such data. Variable selection plays a key role in high-dimensional data analysis. Like many other statistical techniques, variable selection requires complete cases without any missing values. A variety of variable selection techniques is available for complete data, but comparable techniques for data with missing values are scarce in the literature. Multiple imputation is a popular approach for handling missing values and obtaining completed data. If a particular variable selection technique is applied independently to each of the multiply imputed datasets, a different model may result for each dataset. It is still unclear in the literature how to implement variable selection techniques on multiply imputed data. In this paper, we propose to use the magnitude of the parameter estimates of each candidate predictor across all the imputed datasets for its selection. A constraint is imposed on the sum of absolute values of these estimates to select or remove the predictor from the model. The proposed method for identifying the informative predictors is compared with other approaches in an extensive simulation study. Performance is compared on the basis of hit rates (proportion of correctly identified informative predictors) and false alarm rates (proportion of non-informative predictors identified as informative) for different numbers of imputed datasets. The proposed technique is simple and easy to implement, and performs as well in the high-dimensional case as in low-dimensional settings. It is observed to be a good competitor to the existing approaches in different simulation settings. The performance of different variable selection techniques is also examined for a real dataset with missing values.
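The selection idea and the two evaluation measures described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the coefficients are estimated or what exact form the constraint takes, so the fixed cut-off `threshold`, the function names `select_predictors` and `hit_and_false_alarm_rates`, and the toy coefficient values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def select_predictors(estimates: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of predictors whose summed absolute estimates exceed `threshold`.

    estimates[m][j] is the coefficient of predictor j fitted on imputed dataset m.
    The fixed cut-off is a hypothetical stand-in for the paper's constraint.
    """
    n_predictors = len(estimates[0])
    selected = []
    for j in range(n_predictors):
        # Sum of |estimate| for predictor j across all imputed datasets.
        total = sum(abs(row[j]) for row in estimates)
        if total > threshold:
            selected.append(j)
    return selected


def hit_and_false_alarm_rates(
    selected: Sequence[int], informative: Sequence[int], n_predictors: int
) -> Tuple[float, float]:
    """Compute (hit rate, false alarm rate) as defined in the abstract.

    Assumes at least one informative and one non-informative predictor exist.
    """
    selected_set = set(selected)
    informative_set = set(informative)
    noise_set = set(range(n_predictors)) - informative_set
    hit_rate = len(selected_set & informative_set) / len(informative_set)
    false_alarm_rate = len(selected_set & noise_set) / len(noise_set)
    return hit_rate, false_alarm_rate


# Made-up coefficients from 3 imputed datasets and 4 candidate predictors.
estimates = [
    [1.2, 0.2, -0.8, 0.1],
    [1.0, 0.1, -0.9, 0.0],
    [1.1, 0.3, -0.7, 0.0],
]
selected = select_predictors(estimates, threshold=0.5)
hit_rate, false_alarm_rate = hit_and_false_alarm_rates(
    selected, informative=[0, 2], n_predictors=4
)
print(selected, hit_rate, false_alarm_rate)
```

In this toy setting predictors 0 and 2 are treated as the truly informative ones. Predictor 1 has small but consistent estimates that together pass the assumed cut-off, so it counts as a false alarm, which shows how the cut-off trades hit rate against false alarm rate.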

Document type:	Journal article
Faculty:	Mathematics, Computer Science and Statistics > Statistics
Subjects:	500 Natural sciences and mathematics > 510 Mathematics
ISSN:	1618-2510
Language:	English
Document ID:	82433
Date of publication on Open Access LMU:	15 Dec 2021 15:01
Last modified:	15 Dec 2021 15:01
