Truth Discovery in Crowdsourcing Systems


Crowdsourced data is noisy in most cases, and thus repetitive labeling is usually utilized to ensure the quality of the obtained labels. Repetitive labeling requires multiple workers to provide labels for each item. By aggregating the multiple labels, one single result, which is called consensus result, can be used as estimated true label for each item.
The AC2 data set was originally used in the work of Ipeirotis et al. “Quality management on Amazon mechanical Turk”. It includes the AMT judgments for websites for the presence of adult content on the page. The original TREC data set is used in the work of Buckley et al. “Overview of the trec 2010 relevance feedback track (notebook)”. It has AMT ordinal graded relevance judgments for pairs of search queries and URLs.

Leave a Reply

Your email address will not be published. Required fields are marked *