CrowdED is a two-stage statistical guideline for optimal crowdsourcing experimental design. It estimates, a priori, the optimal assignment of workers to tasks in order to obtain maximum accuracy on all tasks.
CrowdApp Beta
To install the package, use pip:
pip install crowdED
To install from source in editable mode instead, clone the repository:
git clone https://github.com/MaastrichtU-IDS/crowdED.git
cd crowdED
pip install --editable ./
Note: currently, crowdED is only compatible with Python 3.6.
You will also need to install shortuuid, for example by running !pip install shortuuid in a notebook cell.
import crowded.simulate as cs
#define your parameters
total_tasks = 415
p_hard_tasks = 0.4
number_of_valid_answers = 3
#create task dataset
df_tasks = cs.Tasks(number_of_valid_answers).create(total_tasks, p_hard_tasks)
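To make the three task parameters concrete, here is a minimal standard-library sketch of what a simulated task set could look like. It is not CrowdED's implementation: the function `simulate_tasks`, the field names `task_id`, `is_hard`, and `true_answer`, and the `answer_N` labels are all hypothetical, and it assumes `p_hard_tasks` is the proportion of tasks marked as hard.

```python
import random


def simulate_tasks(total_tasks, p_hard_tasks, number_of_valid_answers, seed=0):
    """Hypothetical stand-in for a task simulator (not CrowdED's own code)."""
    rng = random.Random(seed)
    # Assumption: p_hard_tasks is the share of tasks that are hard.
    n_hard = round(total_tasks * p_hard_tasks)
    hard_flags = [True] * n_hard + [False] * (total_tasks - n_hard)
    rng.shuffle(hard_flags)  # spread the hard tasks across the dataset
    # Assumption: valid answers are simple labels; each task gets one true answer.
    valid_answers = [f"answer_{i}" for i in range(number_of_valid_answers)]
    return [
        {"task_id": i, "is_hard": hard, "true_answer": rng.choice(valid_answers)}
        for i, hard in enumerate(hard_flags)
    ]


tasks = simulate_tasks(total_tasks=415, p_hard_tasks=0.4, number_of_valid_answers=3)
```

With the parameters above, 166 of the 415 simulated tasks are flagged as hard.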
import crowded.simulate as cs
#define your parameters
total_workers = 40
alpha = 28
beta = 2
#create worker dataset
df_workers = cs.Workers(alpha, beta).create(total_workers)
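The parameter names `alpha` and `beta` suggest that worker reliability is drawn from a Beta distribution, although the text here does not say so explicitly. Under that assumption, the sketch below samples one reliability per worker with the standard library. The function names and the `worker_id` field are hypothetical; `prob_worker` mirrors the column name used later in the assignment dataset.

```python
import random


def simulate_workers(total_workers, alpha, beta, seed=0):
    """Hypothetical stand-in: one Beta(alpha, beta) reliability per worker."""
    rng = random.Random(seed)
    return [
        {"worker_id": i, "prob_worker": rng.betavariate(alpha, beta)}
        for i in range(total_workers)
    ]


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


workers = simulate_workers(total_workers=40, alpha=28, beta=2)
expected_reliability = beta_mean(28, 2)  # 28 / 30
```

If the Beta assumption holds, `alpha = 28` and `beta = 2` describe a mostly reliable crowd, with an expected reliability of 28/30. Lowering `alpha` or raising `beta` would simulate a noisier one.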
import crowded.simulate as cs
#workers per task should always be smaller than the number of workers
wpt = 5
#create assignment
df_tw = cs.AssignTasks(df_tasks, df_workers, wpt).create()
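The constraint on `wpt` follows from assigning distinct workers to each task. The sketch below illustrates the idea with sampling without replacement. It is an assumption about how such an assignment could work, not CrowdED's own logic, and `assign_tasks` is a hypothetical helper.

```python
import random


def assign_tasks(task_ids, worker_ids, wpt, seed=0):
    """Hypothetical sketch: give every task `wpt` distinct, randomly chosen workers."""
    if wpt > len(worker_ids):
        # Sampling without replacement cannot draw more workers than exist.
        raise ValueError("wpt cannot exceed the number of workers")
    rng = random.Random(seed)
    pairs = []
    for task_id in task_ids:
        for worker_id in rng.sample(worker_ids, wpt):
            pairs.append((task_id, worker_id))
    return pairs


assignment = assign_tasks(list(range(415)), list(range(40)), wpt=5)
```

Each task contributes `wpt` rows, so 415 tasks with 5 workers per task produce 2075 task-worker pairs.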
import crowded.method as cm
#define the parameters
x = df_tw['prob_task'] #vector of probabilities of tasks
y = df_tw['prob_worker'] #vector of probabilities of workers
z = df_tasks['true_answers'].unique() #vector of valid answers in the experiment
#compute probability
cp = cm.ComputeProbability(x, y, z)
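How `ComputeProbability` combines the task and worker probabilities is not described here. As an illustration only, the sketch below assumes the chance of a correct answer is the product of the two, then draws one 0/1 outcome per task-worker pair. Both the product rule and the name `draw_outcomes` are assumptions, not the library's documented behavior.

```python
import random


def draw_outcomes(prob_task, prob_worker, seed=0):
    """Hypothetical sketch: one Bernoulli draw per task-worker pair."""
    rng = random.Random(seed)
    outcomes = []
    for p_t, p_w in zip(prob_task, prob_worker):
        p_correct = p_t * p_w  # assumed combination rule, not confirmed by the source
        outcomes.append(1 if rng.random() < p_correct else 0)
    return outcomes


outcomes = draw_outcomes([1.0, 0.0, 0.9], [1.0, 0.8, 0.95])
```

A pair with probability 1 always yields 1 and a pair with probability 0 always yields 0. Everything in between depends on the random draw.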
import crowded.method as cm
#define the parameters
g = df_tw['true_answers'] #vector of gold standard answers
p = cp.predict() #binary vector of 0 and 1
z = df_tasks['true_answers'].unique() #vector of valid answers in the experiment
#compute match
worker_answer = cm.WorkerAnswer(g, p, z)
#add the answers to the assignment dataset
df_tw['worker_answers'] = worker_answer.match()
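The matching step turns a binary vector into concrete answers. One plausible reading, sketched below, is that a 1 reproduces the gold-standard answer and a 0 picks a different valid answer at random. This is an assumption about the behavior, and `match_answers` is a hypothetical helper rather than part of CrowdED.

```python
import random


def match_answers(gold, outcomes, valid_answers, seed=0):
    """Hypothetical sketch: map 0/1 outcomes to simulated worker answers."""
    rng = random.Random(seed)
    answers = []
    for g, ok in zip(gold, outcomes):
        if ok == 1:
            answers.append(g)  # correct: repeat the gold-standard answer
        else:
            # incorrect: choose any valid answer other than the gold one
            wrong = [a for a in valid_answers if a != g]
            answers.append(rng.choice(wrong))
    return answers


worker_answers = match_answers(["a", "b", "c"], [1, 0, 1], ["a", "b", "c"])
```

The sketch needs at least two valid answers, since otherwise there is no wrong answer to choose from.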
If you use CrowdED in a scientific publication, you are highly encouraged (not required) to cite the following paper:
CrowdED: Guideline for Optimal Crowdsourcing Experimental Design.
Amrapali Zaveri, Pedro Hernandez Serrano, Manisha Desai, and Michel Dumontier
https://doi.org/10.1145/3184558.3191543
Bibtex entry:
@inproceedings{Zaveri:2018:CGO:3184558.3191543,
author = {Zaveri, Amrapali and Serrano, Pedro Hernandez and Desai, Manisha and Dumontier, Michel},
title = {CrowdED: Guideline for Optimal Crowdsourcing Experimental Design},
booktitle = {Companion Proceedings of the The Web Conference 2018},
series = {WWW '18},
year = {2018},
isbn = {978-1-4503-5640-4},
location = {Lyon, France},
pages = {1109--1116},
numpages = {8},
url = {https://doi.org/10.1145/3184558.3191543},
doi = {10.1145/3184558.3191543},
acmid = {3191543},
publisher = {International World Wide Web Conferences Steering Committee},
address = {Republic and Canton of Geneva, Switzerland},
keywords = {biomedical, crowdsourcing, data quality, data science, fair, metadata, reproducibility},
}
This repository is licensed under the terms of the MIT License.