Drießlein, David; Küchenhoff, Helmut (ORCID: 0000-0002-6372-2487); Tutz, Gerhard; Wippert, Pia Maria (31 August 2017): Variable selection and inference in a follow-up study on back pain. Department of Statistics: Technical Reports, No. 206


The Lasso of Tibshirani (1996) is a useful method for estimation and implicit selection of predictors in a linear regression model. It uses an ℓ1-penalty and is particularly suited to settings where the number of observations is not markedly larger than the number of possible predictors. We apply the Lasso to a predictive linear regression model in a study with baseline and follow-up measurements of unspecific low back pain, with a focus on the selection of psychosociological predictors. Practitioners want to report measures of uncertainty for estimated regression coefficients, i.e. p-values or confidence intervals, but classical t-tests are no longer valid after selection. In the last few years, several approaches for inference in high-dimensional data settings have been developed. We give a selective overview of methods for assigning p-values to Lasso-selected variables and analyse two of them in a simulation study that uses the structure of our data set. We find that Multi Sample Splitting (Wasserman and Roeder, 2009; Meinshausen et al., 2009) may not be helpful for generating p-values, while the LDPE approach of Zhang and Zhang (2014) produces promising results for type I error and power on single hypotheses. Therefore, we apply the LDPE in the analysis of our back pain study.
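
For orientation, the ℓ1-penalized least-squares problem behind the Lasso can be written in one common form as shown below. The notation is an assumption introduced here for illustration and is not taken from the report: y is the response vector of length n, X the n × p matrix of predictors, β the coefficient vector, and λ ≥ 0 a tuning parameter. The 1/(2n) scaling is one of several conventions in use.

```latex
\hat{\beta}(\lambda)
  = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}}
    \left\{ \frac{1}{2n} \lVert y - X\beta \rVert_2^2
            + \lambda \lVert \beta \rVert_1 \right\},
\qquad
\lVert \beta \rVert_1 = \sum_{j=1}^{p} \lvert \beta_j \rvert .
```

Because the ℓ1-norm is not differentiable at zero, some components of the minimizer can be exactly zero for sufficiently large λ, which is the sense in which the Lasso selects predictors implicitly.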