Co-Occurrence Graphs for Multilingual Word Sense

Advisors: Dr. L. Araujo, Dr. J. Martínez Romo

Cross-Lingual Word Sense Disambiguation (CLWSD) aims to determine the most suitable translation of a given word from a source language into a target language. It is a particular case of the Word Sense Disambiguation (WSD) problem. CLWSD addresses some of the difficulties of WSD, such as the scarcity of sense inventories and sense-tagged corpora, by taking advantage of the shared meaning between parallel texts. We are carrying out an extensive evaluation of different aspects of an unsupervised graph-based CLWSD system, such as the selection of the bilingual dictionary, the choice of disambiguation algorithm, and the parameters involved in graph construction. Two experimental frameworks (SemEval 2010, task 3 and SemEval 2013, task 10) are being used for evaluation in different languages, with competitive results. Another goal of this thesis is to apply the developed techniques to the biomedical domain. To this end, we are analyzing different medical corpora (both monolingual and multilingual) in order to define evaluation frameworks for different tasks based on word disambiguation. One of the first tasks being addressed is the use of cross-lingual information to disambiguate acronyms in this domain. Another application under consideration is the use of these disambiguation techniques to improve the extraction of relations between rare diseases and disabilities, and between drugs and adverse drug reactions (ADRs).
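As a rough illustration of the graph-based idea described above, the following minimal Python sketch builds a co-occurrence graph from a toy target-language corpus and ranks candidate translations by how strongly they co-occur with the translated context words. It is not the system evaluated in this thesis: the toy corpus, the candidate list (standing in for a bilingual dictionary lookup), the `min_count` threshold, and the simple summed-edge-weight score are all assumptions made for the example, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(documents, min_count=1):
    """Build an undirected weighted graph: words are nodes, and an edge
    weight is the number of documents in which both words appear.
    `min_count` is an illustrative pruning threshold (assumption)."""
    pair_counts = Counter()
    for doc in documents:
        # Count each unordered word pair once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            pair_counts[(a, b)] += 1

    graph = {}
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph.setdefault(a, {})[b] = n
            graph.setdefault(b, {})[a] = n
    return graph


def rank_translations(graph, candidates, context_translations):
    """Rank candidate translations by the summed weight of their edges
    to the translated context words (a simple stand-in scoring rule)."""
    scores = {
        cand: sum(graph.get(cand, {}).get(w, 0) for w in context_translations)
        for cand in candidates
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Toy target-language (Spanish) corpus, one token list per document.
documents = [
    ["banco", "dinero", "cuenta"],
    ["banco", "dinero", "préstamo"],
    ["orilla", "río", "agua"],
    ["orilla", "río", "barca"],
]
graph = build_cooccurrence_graph(documents)

# Hypothetical dictionary candidates for English "bank", and the
# translations of its context words "money" and "account".
ranking = rank_translations(graph, ["orilla", "banco"], ["dinero", "cuenta"])
print(ranking)
```

In this toy setting the financial sense is preferred because "banco" shares edges with both context words, while "orilla" shares none. A fuller system would replace the scoring rule with one of the disambiguation algorithms under study.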

Andrés Duque Fernández