Drießlein, David; Küchenhoff, Helmut (ORCID: 0000-0002-6372-2487); Tutz, Gerhard; Wippert, Pia Maria (1 September 2017): Variable Selection and Inference in a Follow-up Study on Back Pain. Department of Statistics: Technical Reports, No. 211


The Lasso of Tibshirani (1996) is a useful method for estimating a linear regression model and implicitly selecting predictors by means of an ℓ1-penalty, particularly when the number of observations is not markedly larger than the number of candidate predictors. We apply the Lasso to a predictive linear regression model in a study of unspecific low back pain with baseline and follow-up measurements, focusing on the selection of psychosociological predictors. Practitioners want to report measures of uncertainty for estimated regression coefficients, i.e. p-values or confidence intervals, but classical t-tests are no longer valid after selection. In the last few years, several approaches to inference in high-dimensional settings have been developed. We give a selective overview of methods for assigning p-values to Lasso-selected variables and analyse two of them in a simulation study that uses the structure of our data set. We find that Multi Sample Splitting (Wasserman and Roeder, 2009; Meinshausen et al., 2009) may not be helpful for generating p-values, while the LDPE approach of Zhang and Zhang (2014) produces promising results for type I error and power on single hypotheses. Therefore, we apply the LDPE in the analysis of our back pain study.