Abstract
OBJECTIVE We analyze data sets consisting of pedigrees with age at onset of colorectal cancer (CRC) as phenotype. The occurrence of familial clusters of CRC suggests the existence of a latent, inheritable risk factor. We aimed to compute the probability of a family possessing this risk factor as well as the hazard rate increase for these risk factor carriers. Due to the inheritability of this risk factor, the estimation necessitates a costly marginalization of the likelihood. METHODS We propose an improved EM algorithm by applying factor graphs and the sum-product algorithm in the E-step. This reduces the computational complexity from exponential to linear in the number of family members. RESULTS Our algorithm is as precise as a direct likelihood maximization in a simulation study and a real family study on CRC risk. For 250 simulated families of size 19 and 21, the runtime of our algorithm is faster by a factor of 4 and 29, respectively. On the largest family (23 members) in the real data, our algorithm is 6 times faster. CONCLUSION We introduce a flexible and runtime-efficient tool for statistical inference in biomedical event data with latent variables that opens the door for advanced analyses of pedigree data.
Dokumententyp: | Zeitschriftenartikel |
---|---|
Fakultät: | Medizin > Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie |
Themengebiete: | 600 Technik, Medizin, angewandte Wissenschaften > 610 Medizin und Gesundheit |
URN: | urn:nbn:de:bvb:19-epub-41242-8 |
ISSN: | 1423-0062 |
Allianz-/Nationallizenz: | Dieser Beitrag ist mit Zustimmung des Rechteinhabers aufgrund einer (DFG-geförderten) Allianz- bzw. Nationallizenz frei zugänglich. |
Sprache: | Englisch |
Dokumenten ID: | 41242 |
Datum der Veröffentlichung auf Open Access LMU: | 15. Nov. 2017, 13:55 |
Letzte Änderungen: | 04. Nov. 2020, 13:17 |