Plenary Lecture: Crowdsourcing Fine-Grained Relevance Judgments

Effectiveness evaluation by means of a test collection is a standard methodology in information retrieval, with a long history. To gather relevance labels, the classical approach in TREC-like initiatives was to collect binary relevance judgments from trained assessors. Two more recent trends are to rely on crowd workers as assessors and to adopt multi-level relevance judgments, together with gain-based metrics that leverage such multi-level scales.
After a brief introduction to test-collection-based evaluation, I will report on two experiments focusing on such fine-grained relevance scales. In some recent work (ACM SIGIR 2015, ACM TOIS 2017), we proposed unbounded relevance scales by means of magnitude estimation and compared them with multi-level scales.
While magnitude estimation brings advantages, such as letting assessors always judge the next document as more or less relevant than any document they have judged so far, it also comes with some drawbacks. For example, it is not a natural way for untrained assessors to judge items, unlike the schemes they are used to on the Web (e.g., 5-star ratings). In another, more recent work (ACM SIGIR 2018), we proposed collecting relevance judgments over a 100-level relevance scale, a bounded and fine-grained scale that retains many of the advantages of magnitude estimation while addressing some of its issues. The two approaches have been evaluated in large-scale crowdsourcing experiments that compare the two scales with other traditional relevance scales (binary, 4-level). The results show the benefits of fine-grained scales over coarse-grained ones.

Joint work with Shane Culpepper, Gianluca Demartini, Eddy Maddalena, Kevin Roitero, Mark Sanderson, Falk Scholer, Andrew Turpin.

Stefano Mizzaro, Università degli Studi di Udine


Videos in the series
Inauguration of the Intelligent Systems Doctoral Program
Lecture: Intervening in the Brain: Science Fiction and Medicine
10 Nov. 2017
Forecasting urban air quality through LSTM
PhD Supervisor: José Luis Aznarte
11 Jun. 2018
Emotional human-robot interaction using a BCI interface
PhD Supervisors: José Ramón Álvarez Sánchez, José Manuel Ferrándiz
11 Jun. 2018
Probabilistic forecast of NO2 levels in Madrid
PhD Supervisor: José Luis Aznarte
11 Jun. 2018
Automatic categorization of electronic health records
PhD Supervisors: Raquel Martínez Unanue, Víctor Fresno Fernández
11 Jun. 2018
Automatic generation of online reputation reports
PhD Supervisors: Julio Antonio Gonzalo Arroyo, Laura Plaza Morales
11 Jun. 2018
Open linked data to facilitate the mashup of educational resources
PhD Supervisor: Miguel Rodríguez Artacho
11 Jun. 2018