Abstract
Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes ( often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%. Results: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A `maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers. Conclusion: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
Dokumententyp: | Zeitschriftenartikel |
---|---|
Publikationsform: | Publisher's Version |
Fakultät: | Mathematik, Informatik und Statistik > Informatik |
Themengebiete: | 000 Informatik, Informationswissenschaft, allgemeine Werke > 000 Informatik, Wissen, Systeme
500 Naturwissenschaften und Mathematik > 570 Biowissenschaften; Biologie |
URN: | urn:nbn:de:bvb:19-epub-23682-2 |
ISSN: | 1474-760X |
Sprache: | Englisch |
Dokumenten ID: | 23682 |
Datum der Veröffentlichung auf Open Access LMU: | 06. Mrz. 2015, 11:18 |
Letzte Änderungen: | 13. Aug. 2024, 12:52 |