Improvements of an Unsupervised Approach for Person Name Disambiguation in Web Results

Advisors: Dr. R. Martínez, Dr. S. Montalvo

Person Name Disambiguation (PND) in web results returned by a search engine is a challenging problem that is attracting growing interest in the NLP and IR communities. The goal is to estimate how many different individuals are referred to in the web search results and to classify those web pages according to the individual they refer to. State-of-the-art methods have addressed this problem mainly by using training data to learn a fixed threshold that determines the number of different individuals. However, these systems need sufficient and representative training data to guarantee consistent results across different data collections, which requires considerable human effort. At the Doctoral Consortium 2014, we presented an unsupervised approach that solves this problem without training data. However, we have found that this method has several limitations. We have therefore developed an extended approach that overcomes these limitations and also improves on the results obtained by our initial algorithm.

Agustín Daniel Delgado Muñoz