Overview of BioCreative II gene normalization

Morgan, Alexander A.; Lu, Zhiyong; Wang, Xinglong; Cohen, Aaron M.; Fluck, Juliane; Ruch, Patrick; Divoli, Anna; Fundel, Katrin; Leaman, Robert; Hakenberg, Joerg; Sun, Chengjie; Liu, Heng-hui; Torres, Rafael; Krauthammer, Michael; Lau, William W.; Liu, Hongfang; Hsu, Chun-Nan; Schuemie, Martijn; Cohen, K. Bretonnel and Hirschman, Lynette (2008): Overview of BioCreative II gene normalization. In: Genome Biology 9:S3

DOI: 10.1186/gb-2008-9-S2-S3

Abstract

Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%.

Results: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers.

Conclusion: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
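
The scoring and pooling ideas mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a system's output and the gold standard are both plain sets of identifiers and that the F-measure is the balanced harmonic mean of precision and recall. The function names (`precision_recall_f`, `vote`, `pool`) and the toy identifiers are hypothetical and are not taken from the article; the official evaluation and the composite systems described by the authors may differ in detail.

```python
from collections import Counter


def precision_recall_f(predicted, gold):
    """Score predicted identifiers against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def vote(system_outputs, min_votes):
    """Keep identifiers proposed by at least `min_votes` systems."""
    counts = Counter(
        gene_id for output in system_outputs for gene_id in set(output)
    )
    return {gene_id for gene_id, n in counts.items() if n >= min_votes}


def pool(system_outputs):
    """Union of all outputs, in the spirit of a 'maximum recall' pool."""
    return vote(system_outputs, min_votes=1)


# Toy data (hypothetical identifiers, not from the BioCreative II corpus).
gold = {"1", "2", "3", "4"}
systems = [{"1", "2", "9"}, {"1", "3", "8"}, {"1", "2", "3", "7"}]

majority = vote(systems, min_votes=2)   # identifiers with >= 2 votes
pooled = pool(systems)                  # every identifier any system gave

print(precision_recall_f(majority, gold))
print(precision_recall_f(pooled, gold))
```

In this toy case the pooled set trades precision for recall relative to the majority vote, which mirrors the pattern reported above: 763 of 785 identifiers is a recall of about 0.97, obtained at a precision of 0.23.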

Document type:	Journal article
Publication form:	Publisher's Version
Faculty:	Mathematics, Computer Science and Statistics > Computer Science
Subjects:	000 Computer science, information and general works > 000 Computer science, knowledge and systems; 500 Natural sciences and mathematics > 570 Life sciences; biology
URN:	urn:nbn:de:bvb:19-epub-23682-2
ISSN:	1474-760X
Language:	English
Document ID:	23682
Date of publication on Open Access LMU:	06 Mar 2015, 11:18
Last modified:	04 Nov 2020, 13:05
