
Abstract
Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes ( often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%. Results: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A `maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers. Conclusion: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
Item Type: | Journal article |
---|---|
Form of publication: | Publisher's Version |
Faculties: | Mathematics, Computer Science and Statistics > Computer Science |
Subjects: | 000 Computer science, information and general works > 000 Computer science, knowledge, and systems 500 Science > 570 Life sciences; biology |
URN: | urn:nbn:de:bvb:19-epub-23682-2 |
ISSN: | 1474-760X |
Language: | English |
Item ID: | 23682 |
Date Deposited: | 06. Mar 2015, 11:18 |
Last Modified: | 13. Aug 2024, 12:52 |