Generating Intelligent Descriptions of Numerical Quantities for People with Different Levels of Numeracy
NumGen investigated how Natural Language Generation technology can communicate numerical quantities, especially proportions (fractions, percentages and ratios), and whether different audiences can easily understand the result.
Numerical quantities are extremely common in all kinds of documents. Pick up any newspaper and you will find it packed with them - "Red meat increases the risk of cancer by 67 percent" or "More than a quarter of students were awarded A grades". It is surprising, then, that in Natural Language Generation (the study of machine-generation of speech and written language) numerical quantities have received little attention beyond the decision of whether to output digits (such as 27) or number words (such as twenty-seven).
It is important to know how to express numerical information because different users have different needs. For example, not all users are numerate. In fact, a 2003 UK Government study found that nearly half of adults in the UK have problems understanding mathematical concepts such as percentages.
- Collected and annotated a corpus of texts containing numerical facts, which other researchers have since used
- Empirically investigated the relationship between rounding and numerical hedges (e.g., "more than" or "around")
- Constructed a model of proportions that simultaneously selects three features: (1) type (e.g., fraction, percentage), (2) level of precision, and (3) category of hedging phrase (e.g., greater than, less than)
- Forged collaborations with the Universidad Complutense de Madrid, which led to a summer internship at the Open University and joint publications
- Started a series of workshops at Computational Linguistics conferences on Predicting and Improving Text Readability
- Presented invited talks at the University of Aberdeen and Macquarie University in Sydney
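The three features chosen by the proportion model listed above (type, level of precision, and hedging phrase) depend on one another: once a value is rounded to a simpler form, the hedge has to say which way it was rounded. The Python sketch below is an illustration only and does not reproduce NumGen's model. The list of familiar fractions, the rounding steps, the tolerance and the function names `hedge_for`, `candidates` and `realise` are all assumptions made for this example.

```python
from fractions import Fraction

# Assumption: a small set of fractions that most readers recognise.
FRACTION_WORDS = {
    Fraction(1, 4): "a quarter",
    Fraction(1, 3): "a third",
    Fraction(1, 2): "half",
    Fraction(2, 3): "two thirds",
    Fraction(3, 4): "three quarters",
}


def hedge_for(actual, stated, tolerance=0.005):
    """Pick a hedge that tells the reader which way the value was rounded."""
    if abs(actual - stated) <= tolerance:
        return ""  # close enough that no hedge is needed
    return "more than" if actual > stated else "less than"


def candidates(proportion):
    """List (type, precision, hedge, phrase) options for one proportion."""
    options = []

    # Type "fraction": round to the nearest familiar fraction.
    nearest = min(FRACTION_WORDS, key=lambda f: abs(float(f) - proportion))
    options.append(("fraction", str(nearest),
                    hedge_for(proportion, float(nearest)),
                    FRACTION_WORDS[nearest]))

    # Type "percentage": round to the nearest 10, then to the nearest 1.
    for step in (10, 1):
        pct = round(proportion * 100 / step) * step
        options.append(("percentage", f"nearest {step}",
                        hedge_for(proportion, pct / 100),
                        f"{pct} percent"))
    return options


def realise(option):
    """Turn one option into the phrase a reader would see."""
    _type, _precision, hedge, phrase = option
    return f"{hedge} {phrase}".strip()


if __name__ == "__main__":
    for option in candidates(0.27):
        print(option[0], "|", option[1], "|", realise(option))
```

For a proportion of 0.27 the sketch offers "more than a quarter", "less than 30 percent" and "27 percent". Choosing among such options for a particular audience is the part a real model would have to learn or be told; the sketch deliberately leaves that choice out.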