
David Morse

Dr. David Morse
E-Mail: david.morse @ open.ac.uk
Tel: +44 (0) 1908 858463



Research

My primary research interest is in information extraction, particularly information extraction from the legacy scientific literature. My research focus has been, and remains, biodiversity informatics, so I have been investigating the problems and challenges surrounding information extraction from the old biodiversity literature.

In the past five years I have won four grants in this research area, worth approximately 590,000 in total. Two of these grants have been funded by JISC, the other two by the EU, so I am experienced in working on both nationally and internationally funded research projects.

The research challenges that are being investigated by these projects include:

Big data - our source data is the biodiversity literature. The legacy print literature in this domain has been estimated to run to 300 million pages.

Noisy data - Optical Character Recognition (OCR) errors introduced during the scanning process mean that up to two thirds of named entities (e.g. scientific names) are spelt incorrectly; simple spell checking or lookup against an authority is not sufficient to address this problem. For example, 'Homo', the genus name for humans, can be misread by an OCR engine as the butterfly genus 'Homa', which is itself a valid name, so the context of use is very important.

Disambiguation - taxonomic nomenclature calls for unique names only within Kingdoms, hence there is both a bacterial genus 'Bacillus' and an insect genus 'Bacillus'.

Domain-specific terminology - the domain makes extensive use of terse language, abbreviations and special characters such as male ♂ and female ♀, and mixes formal Latin descriptions with vernacular text.
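
To illustrate the noisy data challenge above, here is a minimal Python sketch of why lookup against an authority is not enough on its own. It is not the method used in these projects. The tiny AUTHORITY table, the group labels and the function names candidate_genera and resolve are all hypothetical, and the approximate matching comes from the standard library's difflib rather than any biodiversity-specific tool. The token 'Homa' passes an exact lookup because it is a valid name, so the sketch gathers near matches and uses a context hint to choose between them.

```python
from difflib import get_close_matches

# Hypothetical, illustrative authority list: genus name -> broad group.
# A real authority file would be far larger and richer than this.
AUTHORITY = {
    "Homo": "mammal",
    "Homa": "butterfly",
    "Pieris": "butterfly",
}


def candidate_genera(token, cutoff=0.7):
    """Return authority names similar to an OCR token, best match first."""
    return get_close_matches(token, list(AUTHORITY), n=5, cutoff=cutoff)


def resolve(token, context_group):
    """Pick the closest candidate whose group matches the surrounding context.

    context_group is an assumed hint (e.g. derived from the page's subject);
    how such a hint would be obtained is outside the scope of this sketch.
    """
    for name in candidate_genera(token):
        if AUTHORITY[name] == context_group:
            return name
    return None


if __name__ == "__main__":
    # An exact lookup accepts the OCR output, so it cannot flag the error.
    print("Homa" in AUTHORITY)
    # With context, the same token resolves differently.
    print(resolve("Homa", "mammal"))
    print(resolve("Homa", "butterfly"))
```

In a text about mammals the token resolves to 'Homo', while in a text about butterflies it is kept as 'Homa'; the surface form alone cannot decide between the two.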

The four grants that I have won recently are:

A Community-driven Curation Process for Taxonomic Databases. JISC Digital Infrastructure Programme: Managing Research Data call, for 85,902.

A data infrastructure to support agricultural scientific communities. Promoting data sharing and development of trust in agricultural sciences. EU Seventh Framework Programme, Capacities – Research Infrastructures. Principal investigator at the OU. The total budget is €4 million, with the OU share being 222,745.

Virtual Biodiversity Research and Access Network for Taxonomy. EU Seventh Framework Programme, Capacities – Research Infrastructures. Principal investigator at the OU and Workpackage leader. The total budget is €4.75 million, with the OU share being 207,685.

Automatic Biodiversity Literature Enhancement. JISC Digitisation Programme: Enhancing Digital Resources call, for 73,261.

