Co-Occurrence Graphs for Cross-Lingual Word Sense Disambiguation

Supervisor: Dr. L. Araujo

Cross-Lingual Word Sense Disambiguation (CLWSD) aims to determine the most suitable translation of a given word from a source language into a target language. It is a particular case of the Word Sense Disambiguation (WSD) problem. CLWSD tries to overcome some of the difficulties of WSD, such as the scarcity of sense inventories and sense-tagged corpora, by taking advantage of the meaning shared by parallel texts. Our unsupervised approach comprises the automatic generation of bilingual dictionaries and a new technique for constructing a co-occurrence graph, which is used to select the most suitable translations from the dictionary. We have tested different disambiguation techniques that use the co-occurrence graph as a source of information, as well as different target languages, with English as the source language. The evaluation has been conducted on datasets from tasks in the SemEval 2010 and SemEval 2013 competitions, and our system has been compared with the unsupervised systems that participated in the same tasks, achieving significant improvements over them.
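As an illustration only, the following Python sketch shows the general idea of using a co-occurrence graph to choose among candidate translations taken from a bilingual dictionary. It is not the graph-construction technique developed in this work, which this summary does not describe. It assumes a tokenized target-language corpus, a list of candidate translations for the ambiguous word, and context words that have already been translated into the target language. The function names `build_cooccurrence_graph` and `select_translation`, the sentence-level co-occurrence window, the summed-weight score and the toy Spanish data are all hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Build a weighted, undirected co-occurrence graph (hypothetical sketch).

    Edge weight = number of sentences in which both words appear.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        # Count each pair at most once per sentence.
        for a, b in combinations(sorted(set(sentence)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def select_translation(candidates, context_translations, graph):
    """Return the candidate most strongly linked to the translated context.

    Score = sum of edge weights between the candidate and each context word;
    ties are resolved in favour of the earlier candidate.
    """
    def score(candidate):
        # .get avoids inserting unseen words into the defaultdict.
        neighbours = graph.get(candidate, {})
        return sum(neighbours.get(word, 0) for word in context_translations)

    return max(candidates, key=score)


# Toy target-language corpus (hypothetical data, already tokenized).
sentences = [
    ["orilla", "río", "agua"],
    ["banco", "dinero", "cuenta"],
    ["orilla", "río"],
    ["banco", "dinero"],
]
graph = build_cooccurrence_graph(sentences)

# English "bank" with candidate translations from a bilingual dictionary.
print(select_translation(["banco", "orilla"], ["río", "agua"], graph))  # orilla
print(select_translation(["banco", "orilla"], ["dinero"], graph))       # banco
```

Counting co-occurrences per sentence keeps the sketch short. Real systems usually also normalize edge weights and filter frequent words, so that very common terms do not dominate the score.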

Andrés Duque Fernández