Abstract
The paper considers a survey in which a relatively large random sample is drawn in a first phase, where a response variable Y and a set of (cheap) covariates x are observed, while (usually expensive) covariates z are missing. In a second phase, a smaller random sample is drawn from the first-phase sample, and the additional covariates z are recorded for it as well. The overall aim is to fit a regression model of Y on both x and z. The question tackled in this paper is how to select the second-phase sample. We further assume that the survey is repeated over time, so that data on Y, x and z are available from previous studies. As an example of such a setting we consider rental guide surveys, which are run regularly in German cities. We propose to draw the second-phase sample such that it minimizes the estimation variability in the underlying regression model. This step is carried out by imputing z using the previous survey data. A matrix norm is then used to find, by simulation, the second-phase sample that maximizes the norm of the design matrix of the imputed data. The proposed sampling scheme is numerically rather simple and performs well in simulation studies as well as in the real data example.
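The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single covariate x, the linear imputation model for z with added residual noise, the Frobenius norm of D^T D as the matrix norm, and all names (`fit_line`, `frobenius_of_gram`, `select_second_phase`) are assumptions made for illustration. The data are simulated.

```python
import random

def fit_line(x, z):
    """Least-squares intercept and slope of z on x, plus residual std. dev."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    resid = [zi - intercept - slope * xi for xi, zi in zip(x, z)]
    sd = (sum(r * r for r in resid) / (n - 2)) ** 0.5
    return intercept, slope, sd

def frobenius_of_gram(rows):
    """Frobenius norm of D^T D for a design matrix D given as a list of rows."""
    p = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    return sum(g * g for row in gram for g in row) ** 0.5

def select_second_phase(x_now, x_prev, z_prev, n2, n_candidates=200, seed=0):
    """Pick, among random candidate subsamples, the one with the largest norm."""
    rng = random.Random(seed)
    # Impute the missing z in the current first-phase sample from the
    # previous survey (assumed imputation model: linear in x plus noise).
    a, b, sd = fit_line(x_prev, z_prev)
    z_imp = [a + b * xi + rng.gauss(0.0, sd) for xi in x_now]
    design = [[1.0, xi, zi] for xi, zi in zip(x_now, z_imp)]
    best_idx, best_score = None, float("-inf")
    for _ in range(n_candidates):
        idx = rng.sample(range(len(x_now)), n2)  # candidate second-phase sample
        score = frobenius_of_gram([design[i] for i in idx])
        if score > best_score:
            best_idx, best_score = sorted(idx), score
    return best_idx, best_score

# Simulated data standing in for a previous survey and a current first phase.
_rng = random.Random(1)
x_prev = [_rng.uniform(0, 10) for _ in range(300)]
z_prev = [2.0 + 0.5 * xi + _rng.gauss(0, 1) for xi in x_prev]
x_now = [_rng.uniform(0, 10) for _ in range(500)]

chosen, score = select_second_phase(x_now, x_prev, z_prev, n2=50)
```

Here `chosen` holds the indices of the first-phase units for which z would then be collected. A larger norm of D^T D serves only as a simple proxy for lower estimation variability; other criteria, such as the determinant, could be substituted in `frobenius_of_gram`.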
Item Type: | Other |
---|---|
Keywords: | Two phase sampling; Repeated survey; Rental guide survey; Matrix norms |
Faculties: | Mathematics, Computer Science and Statistics > Statistics > Technical Reports |
Subjects: | 300 Social sciences > 310 Statistics |
URN: | urn:nbn:de:bvb:19-epub-74729-8 |
Language: | English |
Item ID: | 74729 |
Date Deposited: | 12. Jan 2021, 08:10 |
Last Modified: | 12. Jan 2021, 08:10 |