Hornung, Roman; Wright, Marvin N. (20 December 2018): Block Forests: random forests for blocks of clinical and omics covariate data. Department of Statistics: Technical Reports, No. 219


In recent years, more and more multi-omics data have become available, that is, data featuring measurements of several types of omics data for each patient. While using multi-omics data as covariate data in outcome prediction is promising, it is also challenging due to the complex structure of such data. Random forest is a prediction method known for its ability to capture complex dependency patterns between the outcome and the covariates. Against this background, we developed five candidate random forest variants tailored to multi-omics covariate data. These variants modify the split point selection of random forest to incorporate the block structure of multi-omics data and can be applied to any outcome type for which a random forest variant exists, such as categorical, continuous and survival outcomes. Using 20 multi-omics data sets with survival outcome, we compared the prediction performances of the block forest variants, using random survival forest as a reference method. We also considered the common special case of having clinical covariates and measurements of a single omics data type available. We identified one variant, termed "block forest", that performed significantly better than standard random survival forest (adjusted p-value: 0.027). The two best-performing variants have in common that the block choice is randomized in the split point selection procedure. In the case of having clinical covariates and a single omics data type available, the improvements of the variants over random survival forest were larger than in the multi-omics case: here, four of the five variants performed significantly better than random survival forest. The degree of improvement over random survival forest varied strongly across data sets. The new prediction method block forest for multi-omics data can significantly improve the prediction performance of random forest. Block forest is particularly effective for the special case of using clinical covariates in combination with measurements of a single omics data type.
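
To make the idea of a randomized block choice in split point selection more concrete, the following Python sketch draws the candidate covariates for a single split. It is an illustration only, not the authors' procedure: the abstract does not specify how blocks are chosen, so the function name `sample_candidate_covariates`, the per-block inclusion probability `block_prob`, and the per-block `mtry` values are all assumptions.

```python
import random


def sample_candidate_covariates(blocks, mtry, block_prob=0.5, rng=None):
    """Draw candidate covariates for one split with a randomized block choice.

    blocks:     dict mapping a block name to its list of covariate names
    mtry:       dict mapping a block name to the number of covariates to draw
    block_prob: hypothetical probability of including each block in the split
    """
    rng = rng or random.Random()

    # Randomly decide which blocks take part in this split (assumed scheme).
    chosen = [name for name in blocks if rng.random() < block_prob]

    # Fall back to one random block so that a split is always possible.
    if not chosen:
        chosen = [rng.choice(list(blocks))]

    candidates = []
    for name in chosen:
        # Never request more covariates than the block contains.
        k = min(mtry[name], len(blocks[name]))
        candidates.extend(rng.sample(blocks[name], k))
    return candidates


# Hypothetical example: a small clinical block and a larger omics block.
blocks = {
    "clinical": ["age", "stage"],
    "mrna": [f"gene_{i}" for i in range(100)],
}
mtry = {"clinical": 2, "mrna": 10}
candidates = sample_candidate_covariates(blocks, mtry, rng=random.Random(1))
```

A tree-growing routine would then search for the best split point only among `candidates`, so that small blocks such as clinical covariates are not crowded out by high-dimensional omics blocks at every split.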