Engelhardt, Alexander; Rieger, Anna; Tresch, Achim; Mansmann, Ulrich
(19 December 2016):
Efficient Maximum Likelihood Estimation for Pedigree Data with the Sum-Product Algorithm.
Department of Statistics: Technical Reports, No. 200


Abstract
In this paper, we analyze data sets consisting of pedigrees where the response is the age at onset of colorectal cancer (CRC). The occurrence of familial clusters of CRC suggests the existence of a latent, inheritable risk factor. We aimed to compute the probability of a family possessing this risk factor, as well as the hazard rate increase for these risk factor carriers. Due to the inheritability of this risk factor, the estimation necessitates a costly marginalization of the likelihood.
We therefore developed an EM algorithm by applying factor graphs and the sum-product algorithm in the E-step, reducing the computational complexity from exponential to linear in the number of family members.
In a simulation study and a real family study on CRC risk, our algorithm is as precise as a direct likelihood maximization. For 250 simulated families of size 19 and 21, our algorithm is faster by factors of 4 and 29, respectively. On the largest family (23 members) in the real data, our algorithm is 6 times faster.
We introduce a flexible and runtime-efficient tool for statistical inference in biomedical event data that opens the door for advanced analyses of pedigree data.
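
The complexity reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a deliberately simplified toy model: each person has a single parent in the tree, the latent risk status is binary, a child of a carrier inherits the risk factor with probability 0.5, and the per-person likelihood values in `lik` are made-up numbers. All function and variable names (`transition`, `brute_force_likelihood`, `sum_product_likelihood`, `children`, `lik`, `prior`) are hypothetical. The sketch marginalizes the family likelihood over the latent statuses twice: once by enumerating all 2^n configurations, and once by passing messages from the leaves to the root, which visits each family member only once.

```python
from itertools import product


def transition(z_child, z_parent, inherit=0.5):
    """P(z_child | z_parent) under the assumed toy inheritance model."""
    p_carrier = inherit if z_parent == 1 else 0.0
    return p_carrier if z_child == 1 else 1.0 - p_carrier


def brute_force_likelihood(children, lik, prior, root=0):
    """Sum the joint over all 2^n latent configurations (exponential cost)."""
    n = len(lik)
    parent = {c: p for p, cs in children.items() for c in cs}
    total = 0.0
    for z in product((0, 1), repeat=n):
        term = prior if z[root] == 1 else 1.0 - prior
        for i in range(n):
            term *= lik[i][z[i]]
            if i != root:
                term *= transition(z[i], z[parent[i]])
        total += term
    return total


def sum_product_likelihood(children, lik, prior, root=0):
    """Upward message passing on the tree (cost linear in family size)."""

    def upward(node):
        # belief[z] = lik[node][z] times the messages from all children
        belief = [lik[node][0], lik[node][1]]
        for child in children.get(node, []):
            child_belief = upward(child)
            for z_parent in (0, 1):
                # message from child: sum out the child's latent status
                belief[z_parent] *= sum(
                    transition(z_child, z_parent) * child_belief[z_child]
                    for z_child in (0, 1)
                )
        return belief

    b = upward(root)
    return (1.0 - prior) * b[0] + prior * b[1]


# Toy family: person 0 is the founder, 1 and 2 are children, 3 and 4 grandchildren.
children = {0: [1, 2], 1: [3, 4]}
# lik[i] = (likelihood of person i's data if non-carrier, if carrier); made-up values.
lik = [(0.9, 0.4), (0.8, 0.5), (0.7, 0.7), (0.95, 0.2), (0.6, 0.9)]
prior = 0.2  # assumed founder carrier probability

print(brute_force_likelihood(children, lik, prior))
print(sum_product_likelihood(children, lik, prior))
```

Both calls should print the same value up to floating-point rounding. An E-step would additionally need the posterior carrier probabilities, which on a tree can be obtained by adding a downward pass; that part is omitted here.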