Abstract
The prediction of the values of ordinal response variables using covariate data is a relatively infrequent task in many application areas. Accordingly, ordinal response variables have received comparatively little attention in the literature on statistical prediction modeling. The random forest method is one of the strongest prediction methods for binary and continuous response variables. Its basic, tree-based concept has led to several extensions, including prediction methods for other types of response variables.
This paper introduces the ordinal forest method, a random forest-based prediction method for ordinal response variables. Ordinal forests allow prediction using both low-dimensional and high-dimensional covariate data and can additionally be used to rank covariates with respect to their importance for prediction.
Using several real datasets and simulated data, the performance of ordinal forests with respect to prediction and covariate importance ranking is compared to that of competing approaches. First, these investigations reveal that ordinal forests tend to outperform competitors in terms of prediction performance. Second, the covariate importance measure currently used by ordinal forests discriminates influential covariates from noise covariates at least as well as the measures used by competitors. In an additional investigation using simulated data, several further important properties of the ordinal forest (OF) algorithm are studied.
The rationale underlying ordinal forests, namely to use optimized score values in place of the class values of the ordinal response variable, is in principle applicable to any regression method for continuous outcomes, not only to the random forests considered in the ordinal forest method.
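The idea of replacing ordinal class values with optimized scores can be illustrated with a deliberately simplified sketch. It is not the ordinal forest algorithm itself. As assumptions made only for illustration, it swaps in a k-nearest-neighbour regressor for the regression forest, draws candidate score sets at random on [0, 1], and selects among them by leave-one-out accuracy. All function names (`knn_regress`, `nearest_class`, `loo_accuracy`, `optimise_scores`, `make_toy_data`) and the toy data are hypothetical.

```python
import random


def make_toy_data(n=60, seed=0):
    """Toy data (assumption): one covariate, three ordered classes of unequal width."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.random()
        latent = x + rng.gauss(0, 0.05)
        ys.append(0 if latent < 0.15 else (1 if latent < 0.4 else 2))
        xs.append(x)
    return xs, ys


def knn_regress(train_x, train_s, x, k=5):
    """Predict a continuous score as the mean score of the k nearest neighbours."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_s[i] for i in nearest) / len(nearest)


def nearest_class(pred, scores):
    """Map a continuous prediction back to the class with the closest score."""
    return min(range(len(scores)), key=lambda c: abs(scores[c] - pred))


def loo_accuracy(xs, ys, scores, k=5):
    """Leave-one-out accuracy of 'regress on scores, then map back to classes'."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_s = [scores[y] for y in ys[:i] + ys[i + 1:]]
        pred = knn_regress(train_x, train_s, xs[i], k)
        hits += nearest_class(pred, scores) == ys[i]
    return hits / len(xs)


def optimise_scores(xs, ys, n_classes, n_candidates=20, seed=1):
    """Return the best-performing score set among equally spaced and random ones."""
    rng = random.Random(seed)
    # Baseline candidate: equally spaced scores, i.e. treating classes as 0, 1, 2, ...
    candidates = [[c / (n_classes - 1) for c in range(n_classes)]]
    for _ in range(n_candidates):
        inner = sorted(rng.random() for _ in range(n_classes - 2))
        candidates.append([0.0] + inner + [1.0])
    return max(candidates, key=lambda s: loo_accuracy(xs, ys, s))


if __name__ == "__main__":
    xs, ys = make_toy_data()
    best = optimise_scores(xs, ys, n_classes=3)
    print("selected scores:", best)
    print("leave-one-out accuracy:", loo_accuracy(xs, ys, best))
```

Because the equally spaced score set is always among the candidates, the selected scores can never do worse than it on the selection criterion. Any other regression method for continuous outcomes could replace `knn_regress` without changing the rest of the sketch.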
Document type: | Paper |
---|---|
Keywords: | prediction; ordinal response variables; covariate importance ranking; random forest |
Faculty: | Mathematics, Computer Science and Statistics > Statistics > Technical Reports |
Subjects: | 500 Natural sciences and mathematics > 500 Natural sciences |
URN: | urn:nbn:de:bvb:19-epub-41183-0 |
Language: | English |
Document ID: | 41183 |
Date of publication on Open Access LMU: | 24 Oct 2017, 05:55 |
Last modified: | 04 Nov 2020, 13:17 |