Ryuya Itano, Honoka Tanitsu, Motoki Bamba, Ryota Noseyama, Akihito Kohiga, and Takahiro Koita, Doshisha University, Japan
Crowdsourcing assumes a transient relationship between task requesters and workers, which makes it difficult for workers to improve their skills. In addition, with the emergence of AI, crowd work is shifting from simple tasks to more complex and open-ended ones, highlighting the importance of training workers to handle such tasks. Although various methods have been proposed to train and evaluate workers, no method has yet been established for workers to evaluate one another on open-ended tasks. In this study, we propose applying a hierarchical inter-worker evaluation structure based on workers' skill levels to the evaluation of open-ended tasks, and we examine how closely it aligns with requesters' subjective evaluation criteria. The experimental results showed that worker evaluations were highly aligned with requester evaluations in terms of relative worker rankings. However, the alignment was weaker for absolute scores because of workers' tendency toward generous scoring. These findings are expected to inform future research on enhancing worker engagement and retention.
Crowdsourcing, Worker training, Worker evaluation, Amazon Mechanical Turk