30 March 2010
“The Language of the Gene Ontology”.
2:30pm - 3:30pm
Dr Robert Stevens - University of Manchester
In this talk I will report on recent work using techniques from computational linguistics to model communication via annotations of data using terms supplied by an ontology. We consider the annotation task as a form of communication in which a speaker (the annotator) is attempting to convey a message using tokens from a vocabulary (the Gene Ontology) to a listener (the wider biological community). Ontological annotation of data appears to share many of the statistical features associated with natural language. Many data sets analysed so far have obeyed Zipf’s law, the power law behaviour observed in natural language. In addition, the power law exponent appears to have implications for the quality of the annotations analysed.
In this work we have used the Gene Ontology (GO), which supplies a controlled vocabulary used extensively to describe genes and gene products. Querying and analysis of data can depend on such annotations; thus the quality of those annotations is important.
The power law exponents were consistently different between the three GO sub-ontologies in the annotation corpora, providing insights into how annotators use the sub-ontologies. On filtering the corpora using indicators of confidence, we also found that the value of the power law exponent responded in a predictable way to changes in data quality. We therefore suggest that these techniques from computational linguistics can provide novel and important insights into the annotation process, as well as a new quality metric for corpora described using the Gene Ontology and other ontologies.
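As a rough illustration of the quantity discussed above, the sketch below estimates a Zipf exponent from a list of annotation terms by fitting a straight line to log(frequency) against log(rank). This is only an assumed, simplified approach (an ordinary least-squares fit); it is not necessarily the estimation method used in the work presented, and the function name `zipf_exponent` and the placeholder term labels are hypothetical.

```python
import math
from collections import Counter


def zipf_exponent(annotations):
    """Estimate a Zipf exponent from an iterable of annotation terms.

    Assumption: a plain least-squares fit of log(frequency) on log(rank).
    Under Zipf's law, frequency is proportional to rank ** (-exponent),
    so the exponent is the negated slope of that line.
    """
    # Term frequencies, most frequent first; position in the list gives the rank.
    counts = sorted(Counter(annotations).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct terms to fit a line")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    return -(sxy / sxx)


if __name__ == "__main__":
    # Hypothetical toy corpus: placeholder labels, not real GO identifiers.
    corpus = ["term_a"] * 60 + ["term_b"] * 30 + ["term_c"] * 20
    print(zipf_exponent(corpus))
```

In practice one would apply such a fit separately to each annotation corpus or sub-ontology and compare the resulting exponents. A simple log-log regression can be sensitive to the low-frequency tail, so a more careful analysis might prefer a maximum-likelihood estimate.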
Biography: Robert Stevens is a senior lecturer in the BioHealth Informatics Group. He has a background in biochemistry, biological computation, HCI and bioinformatics. His main research interests lie in the development and use of description-logic-based ontologies to describe and analyse biological data.