Zahid, Faisal Maqbool and Tutz, Gerhard
Multinomial Logit Models with Implicit Variable Selection.
Department of Statistics: Technical Reports, No.89
Multinomial logit models which are most commonly used for the modeling of unordered multi-category responses are typically restricted to the use of few predictors. In the high-dimensional case maximum likelihood estimates frequently do not exist. In this paper we are developing a boosting technique called multinomBoost that performs variable selection and fits the multinomial logit model also when predictors are high-dimensional. Since in multicategory models the effect of one predictor variable is represented by several parameters one has to distinguish between variable selection and parameter selection. A special feature of the approach is that, in contrast to existing approaches, it selects variables not parameters. The method can distinguish between mandatory predictors and optional predictors. Moreover, it adapts to metric, binary, nominal and ordinal predictors. Regularization within the algorithm allows to include nominal and ordinal variables which have many categories. In the case of ordinal predictors the order information is used. The performance of the boosting technique with respect to mean squared error, prediction error and the identification of relevant variables is investigated in a simulation study. For two real life data sets the results are also compared with the Lasso approach which selects parameters.