Tutz, Gerhard; Koch, Dominik (2016): Improved nearest neighbor classifiers by weighting and selection of predictors. In: Statistics and Computing, Vol. 26, No. 5: pp. 1039-1057


Nearest neighborhood classification is a flexible classification method that works under weak assumptions. The basic concept is to use weighted or unweighted sums over the class indicators of observations in the neighborhood of the target value. Two modifications that improve performance are considered here. First, instead of using weights that are determined solely by the distances, we estimate the weights with a logit model; a selection procedure such as the lasso or boosting then automatically selects the relevant nearest neighbors. Second, building on this concept of estimation and selection, we extend the predictor space: in addition to the nearest neighborhood counts, we include the original predictors themselves as well as nearest neighborhood counts that use distances in subdimensions of the predictor space. The resulting classifiers combine the strength of nearest neighbor methods with parametric approaches and, by use of subdimensions, are able to select the relevant features. Simulations and real data sets demonstrate that the method yields better misclassification rates than currently available nearest neighborhood methods and is a strong and flexible competitor in classification problems.
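The first modification can be sketched roughly as follows (this is an illustrative reconstruction with scikit-learn, not the authors' implementation; the helper name `neighbor_label_features` and all parameter values are assumptions): the class labels of a point's nearest neighbors, ordered by distance rank, become binary predictors in an L1-penalized logistic regression, so the lasso penalty selects which neighbor ranks receive non-zero weight.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LogisticRegression

def neighbor_label_features(X_ref, y_ref, X_query, k, query_is_ref):
    """For each query point, return the class labels (0/1) of its k
    nearest reference points, ordered by distance: one column per rank.
    (Hypothetical helper illustrating the neighbor-indicator predictors.)"""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_ref)
    _, idx = nn.kneighbors(X_query)
    # when querying the training data itself, the closest "neighbor"
    # is the point itself, so drop that first column
    idx = idx[:, 1:] if query_is_ref else idx[:, :k]
    return y_ref[idx].astype(float)

# toy binary classification problem (assumed data, for illustration only)
X, y = make_classification(n_samples=400, n_features=10,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

K = 25  # number of neighbor ranks offered to the selection procedure
F_tr = neighbor_label_features(X_tr, y_tr, X_tr, K, query_is_ref=True)
F_te = neighbor_label_features(X_tr, y_tr, X_te, K, query_is_ref=False)

# logit model over the neighbor indicators; the L1 penalty plays the
# role of the lasso selection step described in the abstract
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(F_tr, y_tr)
acc = clf.score(F_te, y_te)
selected_ranks = np.flatnonzero(clf.coef_[0])  # neighbor ranks kept by the lasso
```

The second modification would then simply widen the feature matrix: alongside the neighbor-indicator columns one would append the original predictors and neighbor counts computed in subdimensions, letting the same penalized logit select among all of them.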