Natural Language Processing

We specialise in exploiting surface clues, term distribution and co-occurrence patterns to derive high-level information, for instance about semantic similarity or semantic relations between concepts. We extend NLP techniques to other types of representation, such as ontologies and diagrams. We apply term distribution approaches to bootstrapping resources for less studied languages, such as Arabic, and to dataset profiling: the investigation of the dependency between NLP technique performance and corpus characteristics.
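As a minimal illustration of how co-occurrence patterns can surface semantic similarity, the sketch below builds count vectors of each word's neighbours in a toy corpus and compares words by cosine similarity. This is a generic distributional-semantics sketch, not the group's actual method; the function names and toy corpus are invented for illustration.

```python
from collections import Counter
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Build a co-occurrence count vector (a Counter over context
    words) for every word in a tokenised corpus."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Invented toy corpus: "cat" and "dog" share contexts, so their
# co-occurrence vectors should come out as similar.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))
```

Words that occur in similar contexts receive similar vectors, so distributionally related terms score close to 1 even when they never co-occur directly.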

Contact Prof. Anne De Roeck

NLP for e-Learning

Natural Language Processing offers powerful tools that can enhance the learning and teaching experience, in particular for assessment and feedback generation. Our research extends the reach of mainstream NLP approaches by deploying them in, and adapting them to, e-learning applications. We are also interested in enhancing e-learning platforms and virtual learning environments with NLP technology, and in evaluating how users experience the added value. The theme ties in closely with work in Human Centred Computing and Software Engineering, and is closely associated with the BCS Grand Challenge.

We are working on several projects in this area.

  • Diagrams in Automatic Assessment
  • Identifying Conceptual Gaps in Assessment
  • LT4eL
  • Validation of NLP-enhanced Learning Environments
  • Semantic Similarity in Automatic Assessment


Representing the meaning of text

The relationship between text and a representation of its meaning is many-to-many: one text may potentially be interpreted in different ways, and different texts can have the same meaning. We are interested in the theoretical properties of text that explain this relationship, and in how humans perceive ambiguity in text. We are also developing techniques and formalisms to improve the interoperability between different semantic representations, and are applying the work to tasks in recognising semantically similar text, and to advanced techniques for populating ontologies from domain texts.

  • Identifying Domain Specific Ontological Relations from Text
  • Modelling Nocuous Ambiguity


Less studied languages

Computational Linguistic models, techniques and resources have been developed mainly against the background of English and a few dozen other languages. We are interested in bootstrapping resources for languages which have not received the same amount of attention. Our projects have made several practical resources available. The research also has an important theoretical angle: many less studied languages do not conform to the assumptions made by mainstream techniques, and require the development of radically different computational approaches.

  • Bootstrapping Resources for Less-studied Languages
  • Arabic Language Processing
  • Nelralec: Nepali Linguistic Resources


Term dependency and dataset profiling

Words do not occur independently of each other. One fundamental difference between Natural Language Processing and Information Retrieval lies in the use of techniques that explore and exploit this fact. We are interested in developing fine-grained approaches to term distribution modelling that add value to document representations over and above simple, frequency-based methods such as the “bag of words” approach. We have experimented with several techniques, including term distribution networks and burstiness modelling, and we are using these in different practical settings, such as information filtering and dataset profiling, where we showed that term distribution measures can highlight significant differences between standard corpora.

This topic has a thematic link to the work on ambiguity detection, and to Dawei Song’s work on contextual inference in retrieval (MMIR).

  • Dataset Profiling
  • Nootropia: Adaptive Multi-topic Information Filtering
  • Term Burstiness

Contact Prof. Anne De Roeck