Cusimano, Natalie; Stadler, Tanja; Renner, Susanne S.
A new method for handling missing species in diversification analysis applicable to randomly or non-randomly sampled phylogenies.
In: Systematic Biology, Vol. 61, No. 5: pp. 785-792
Chronograms from molecular dating are increasingly being used to infer rates of diversification and their change over time. A major limitation in such analyses is incomplete species sampling, which, moreover, is usually nonrandom. While the widely used γ statistic with the Monte Carlo constant-rates test or the birth–death likelihood analysis with the ΔAICrc test statistic are appropriate for comparing the fit of different diversification models in phylogenies with random species sampling, no objective automated method has been developed for fitting diversification models to nonrandomly sampled phylogenies. Here, we introduce a novel approach, CorSiM, which involves simulating missing splits under a constant-rate birth–death model and allows the user to specify whether species sampling in the phylogeny being analyzed is random or nonrandom. The completed trees can be used in subsequent model-fitting analyses. This is fundamentally different from previous diversification rate estimation methods, which were based on null distributions derived from the incomplete trees. CorSiM is automated in an R package and can easily be applied to large data sets. We illustrate the approach in two Araceae clades, one with a random species sampling of 52% and one with a nonrandom sampling of 55%. In the latter clade, the CorSiM approach detects and quantifies an increase in diversification rate, whereas classic approaches prefer a constant-rate model; in the former clade, results do not differ among methods (as expected, since the classic approaches are valid only for randomly sampled phylogenies). The CorSiM method greatly reduces the type I error in diversification analysis, but type II error remains a methodological problem.
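As an illustration of the kind of summary statistic the abstract refers to, the following is a minimal Python sketch of the γ statistic in its standard definition, computed from the node ages of a fully resolved ultrametric tree. It is not the CorSiM R package and does not simulate missing splits; the function name `gamma_statistic` and the input convention (a list of internal node ages, measured backward from the present) are assumptions made for this example only.

```python
import math


def gamma_statistic(branching_times):
    """Gamma statistic from the internal node ages of an ultrametric binary tree.

    branching_times: ages of all internal nodes (root included), measured
    backward from the present. A tree with n tips has n - 1 such ages.
    This input convention is an assumption of this sketch.
    """
    ages = sorted(branching_times, reverse=True)
    n = len(ages) + 1  # number of tips
    if n < 3:
        raise ValueError("gamma is undefined for trees with fewer than 3 tips")

    # g_k is the time during which exactly k lineages exist (k = 2..n);
    # the last interval runs from the youngest node to the present (age 0).
    bounds = ages + [0.0]
    weighted = [(i + 2) * (bounds[i] - bounds[i + 1]) for i in range(n - 1)]

    total = sum(weighted)  # T = sum over k = 2..n of k * g_k

    # Average the partial sums of k * g_k over i = 2..n-1.
    running = 0.0
    sum_of_partials = 0.0
    for w in weighted[:-1]:
        running += w
        sum_of_partials += running
    mean_partial = sum_of_partials / (n - 2)

    return (mean_partial - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))
```

Under this definition, negative values indicate that splits are concentrated near the root (as in a slowdown in diversification), and positive values indicate that they are concentrated near the tips. In a workflow like the one the abstract describes, such a statistic would be computed on each completed tree rather than on the incomplete one.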