With the rise of online publishing, the number of scientific papers on any topic is skyrocketing. To synthesize this evidence, researchers write systematic reviews, which requires screening thousands of studies for inclusion. Finding the rare relevant papers among them is error-prone and extremely time-intensive. ASReview implements active learning to accelerate locating the relevant papers and can save up to 95% of screening time.
The goal of this project is to improve the performance of active learning for screening large amounts of textual data by optimizing the hyperparameters of the learning algorithms in the open-source ASReview software. Users from the social sciences should be able to select a set of hyperparameters optimized for textual data from their own domain, instead of the currently implemented values, which were obtained from medical datasets.
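To illustrate what hyperparameter optimization for such a text classifier looks like, here is a minimal sketch using scikit-learn's grid search over a TF-IDF plus Naive Bayes pipeline, the kind of relevance model ASReview supports. This is not ASReview's actual tuning code, and the toy dataset and parameter grid are stand-ins chosen for illustration only.

```python
# Illustrative sketch (not ASReview's actual code): tune hyperparameters of a
# TF-IDF + Naive Bayes text classifier with cross-validated grid search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-in for abstracts labeled relevant (1) / irrelevant (0).
texts = ["active learning for abstract screening",
         "deep networks for image recognition",
         "systematic review screening methods",
         "benchmarking image classification models"] * 10
labels = [1, 0, 1, 0] * 10

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],   # unigrams vs. uni+bigrams
    "nb__alpha": [0.1, 1.0, 10.0],            # Naive Bayes smoothing strength
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="f1")
search.fit(texts, labels)
print(search.best_params_)
```

The same idea scales up to the project's setting: run such a search per domain (e.g. on social-science datasets) and ship the winning hyperparameter sets for users to select.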
To give an example: in 2020, researchers at Utrecht University screened 392,437 abstracts, of which only ~2% were relevant (source: https://asreview.nl/blog/project/systematic-reviews-uu-umc/). Assuming 40 abstracts per hour, this amounts to roughly 9,811 hours of screening. Even if we take a conservative estimate of ASReview's time savings and assume only two researchers screened each abstract for relevance, more than 10,000 hours could have been saved. If we can improve model performance by even a few percent, an enormous amount of work (and tax money) can be saved worldwide.
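The back-of-the-envelope arithmetic behind these figures can be checked directly. The 60% saving below is an assumed conservative lower bound, not a figure from the source; ASReview reports savings of up to 95%.

```python
# Screening-time estimate for the Utrecht University example:
# 392,437 abstracts in 2020, at ~40 abstracts screened per hour.
abstracts = 392_437
per_hour = 40

hours_single = abstracts / per_hour   # one researcher screening everything
hours_double = 2 * hours_single       # two researchers screening independently

# Assumed conservative saving of 60% (ASReview reports up to 95%).
conservative_saving = 0.60
hours_saved = hours_double * conservative_saving

print(f"single pass:  {hours_single:,.0f} h")   # ~9,811 h
print(f"double pass:  {hours_double:,.0f} h")   # ~19,622 h
print(f"hours saved:  {hours_saved:,.0f} h")    # >10,000 h even at 60%
```

Even under this deliberately pessimistic saving rate, the double-screened workload yields well over 10,000 hours saved, consistent with the estimate above.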
To develop a plug-in for the overarching software suite ASReview that allows users to select domain-specific hyperparameters. The plug-in should include documentation, vignettes, and instruction materials for less-experienced users.