Abstract
Nearest neighborhood classification is a flexible classification method that works under weak assumptions. The basic concept is to use weighted or unweighted sums over class indicators of observations in the neighborhood of the target value. Two modifications that improve performance are considered here. First, instead of using weights determined solely by the distances, we estimate the weights by use of a logit model; a selection procedure such as lasso or boosting then automatically selects the relevant nearest neighbors. Second, building on this concept of estimation and selection, we extend the predictor space: we include nearest neighborhood counts, the original predictors themselves, and nearest neighborhood counts that use distances in subdimensions of the predictor space. The resulting classifiers combine the strengths of nearest neighbor methods with parametric approaches and, by use of subdimensions, are able to select the relevant features. Simulations and real data sets demonstrate that the method yields better misclassification rates than currently available nearest neighborhood methods and is a strong and flexible competitor in classification problems.
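The abstract's two modifications can be illustrated with a minimal sketch (not the authors' implementation): nearest neighborhood class counts for several neighborhood sizes are computed as features, the original predictors are appended to extend the predictor space, and an L1-penalized (lasso-type) logit model then performs estimation and selection. The data set, the choice of neighborhood sizes `ks`, and the penalty strength are illustrative assumptions.

```python
# Hedged sketch of the described idea, assuming scikit-learn:
# neighborhood class counts + original predictors, fed into a
# lasso-type (L1-penalized) logit model for selection.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def nn_count_features(X_ref, y_ref, X_query, ks=(1, 3, 5, 10)):
    """For each k in ks, count class-1 labels among the k nearest
    reference points of each query point (the 'neighborhood counts')."""
    nn = NearestNeighbors(n_neighbors=max(ks)).fit(X_ref)
    _, idx = nn.kneighbors(X_query)  # indices of nearest neighbors
    return np.column_stack([y_ref[idx[:, :k]].sum(axis=1) for k in ks])

# Extended predictor space: neighborhood counts plus the raw predictors.
Z_tr = np.hstack([nn_count_features(X_tr, y_tr, X_tr), X_tr])
Z_te = np.hstack([nn_count_features(X_tr, y_tr, X_te), X_te])

# Lasso-type selection via an L1-penalized logit model.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(Z_tr, y_tr)
acc = clf.score(Z_te, y_te)
```

Counts computed from distances in subdimensions of the predictor space would be obtained the same way, by calling `nn_count_features` on column subsets of `X` and stacking the results.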
| Document type: | Journal article |
|---|---|
| Publication form: | Publisher's Version |
| Keywords: | Nearest neighborhood methods; Classification; Lasso; Boosting; Random forests; Support vector machine; Logit model |
| Faculty: | Mathematics, Informatics and Statistics > Statistics > Chairs/Working Groups > Seminar für angewandte Stochastik |
| Subject areas: | 500 Natural sciences and mathematics > 510 Mathematics |
| ISSN: | 0960-3174; 1573-1375 |
| Language: | English |
| Document ID: | 43143 |
| Date deposited on Open Access LMU: | 12 Apr 2018, 07:35 |
| Last modified: | 04 Nov 2020, 13:18 |