A new clustering approach in Web People Search

Supervisor: Dr. R. Martínez

Resolving the ambiguity of person names in web search results is a challenging problem that has become an area of interest for the NLP and IR communities.
This task can be seen as a clustering problem in which each cluster corresponds to a unique individual and must contain all the documents referring to that individual. Thus, the challenge is to estimate the number of different individuals (i.e., the number of clusters) and to group the web pages of the same individual in the same cluster. The number of different individuals has mainly been estimated by means of supervised techniques.
In this talk, we present a new clustering approach for this problem, which estimates the number of different individuals by means of a data-driven method based only on the information in the documents being compared. We have carried out a study of the different aspects involved in the clustering task: document representations, weighting functions, similarity measures, and thresholds computed from the input data. Finally, we also present how social media platforms affect this task, along with a heuristic method that gives special treatment to this kind of web page.
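To make the idea of a threshold computed from the input data concrete, here is a minimal sketch. It is not the method presented in the talk. It assumes a bag-of-words representation, TF-IDF weighting, cosine similarity, a threshold set to the mean pairwise similarity, and single-link grouping, all of which are illustrative choices. The function names (tfidf_vectors, cosine, cluster_pages) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Represent each document as a sparse TF-IDF vector (dict term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Terms found in every document (e.g. the queried name) get weight 0
        # and are dropped, since they cannot discriminate between individuals.
        vectors.append(
            {t: c * math.log(n / df[t]) for t, c in tf.items() if df[t] < n}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_pages(docs):
    """Group documents without fixing the number of clusters in advance."""
    vectors = tfidf_vectors(docs)
    n = len(docs)
    sims = {
        (i, j): cosine(vectors[i], vectors[j])
        for i, j in combinations(range(n), 2)
    }
    # Assumption: the threshold is the mean pairwise similarity of this
    # particular result set, so no training data is involved.
    threshold = sum(sims.values()) / len(sims) if sims else 0.0

    # Union-find: merge every pair whose similarity exceeds the threshold.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), s in sims.items():
        if s > threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

In this sketch the number of clusters, and hence the estimated number of individuals, is simply whatever number of groups remains after merging. Calling cluster_pages on the pages returned for one name yields lists of page indices, one list per presumed individual.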

Agustín Delgado, Director of Innovation and Sustainability, IBERDROLA, IEEE-HKN