Abstract
The paper deals with the scenario where, by design, some covariates are observed for only a subset of the observations. In the example treated in the paper, this occurs with a two-phase sampling scheme: in the first phase, a relatively large sample is drawn to record a response variable Y and a set of (cheap) covariates x. In the second phase, a smaller sample is drawn from the first-phase sample, for which additional (usually expensive) covariates z are also recorded. The second phase can be drawn with unequal probability sampling, where the sampling weights depend on the observed Y and x. The overall intention is to fit a regression model of Y on both x and z. Due to the design of the data collection, the values of z are missing for a majority of the observations. We propose an approximate estimation approach that uses semi-parametric mean and variance regression of Y on x only and augments this fit with a full regression model of Y on x and z. The idea extends the approach of Little (1992) to non-normal data and non-linear models. The proposed estimation is numerically rather simple and performs convincingly well in simulation studies compared to alternatives such as complete-case and multiple imputation analysis.
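
To make the data structure concrete, the following is a minimal simulation sketch of such a two-phase design. It is not the estimator proposed in the paper: the data-generating model, the coefficients, the form of the inclusion probability, and the function names `simulate_two_phase` and `weighted_ls` are all assumptions chosen for illustration. The fit shown is a plain inverse-probability-weighted least-squares fit on the complete cases, which is a standard baseline rather than the semi-parametric approach described above.

```python
import numpy as np


def simulate_two_phase(n=2000, seed=0):
    """Simulate a two-phase sample (all model choices here are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    # Phase 1: response y and cheap covariate x are recorded for everyone.
    x = rng.normal(size=n)
    z = 0.5 * x + rng.normal(size=n)  # expensive covariate, correlated with x
    y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)
    # Phase 2: the inclusion probability depends only on the observed (y, x).
    p = 1.0 / (1.0 + np.exp(-(-1.5 + 0.3 * np.abs(y) + 0.2 * x)))
    selected = rng.random(n) < p
    # z is missing by design for units not drawn in phase 2.
    z_obs = np.where(selected, z, np.nan)
    return y, x, z_obs, p, selected


def weighted_ls(y, x, z, w):
    """Weighted least squares of y on (1, x, z) with weights w."""
    X = np.column_stack([np.ones_like(x), x, z])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


y, x, z_obs, p, selected = simulate_two_phase()
# Complete cases only, weighted by the inverse phase-2 inclusion probability.
beta_ipw = weighted_ls(y[selected], x[selected], z_obs[selected], 1.0 / p[selected])
print("share with z observed:", selected.mean())
print("IPW complete-case estimate (intercept, x, z):", beta_ipw)
```

In this sketch the inclusion probabilities are known because they are set by the simulated design; an unweighted complete-case fit would generally be biased here, since selection depends on Y.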
Document type: | Journal article |
---|---|
Faculty: | Mathematics, Computer Science and Statistics > Statistics |
Subjects: | 500 Natural sciences and mathematics > 510 Mathematics |
ISSN: | 0932-5026 |
Language: | English |
Document ID: | 88862 |
Date published on Open Access LMU: | 25 Jan 2022, 09:28 |
Last modified: | 25 Jan 2022, 09:28 |