HOLM

Hyperparameter Optimization to accelerate active Learning Models


With the emergence of online publishing, the number of scientific papers on any topic is skyrocketing. To synthesize the literature on a topic, researchers write systematic reviews, for which they must screen thousands of studies for inclusion. Finding the few relevant papers is error-prone and extremely time-intensive. ASReview uses active learning to speed up locating the relevant papers and can save up to 95% of screening time.

The goal of this project is to increase the performance of active learning for screening large amounts of textual data by optimizing the hyperparameters of the learning algorithms in the open-source ASReview software. Users from the social sciences should be able to select a set of hyperparameters optimized for textual data from their own domain, instead of the current default values, which were obtained from medical datasets.
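One way to carry out such an optimization is an exhaustive grid search per domain: score every candidate setting by simulating screening on labelled datasets from that domain and keep the cheapest one. The sketch below assumes a caller-supplied `simulate(dataset, params)` function that returns a screening cost; the hyperparameter names `alpha` and `n_features` and the toy cost function are hypothetical placeholders, not a statement of ASReview's actual settings. Grid search is only one option; random or Bayesian search would slot into the same structure.

```python
from itertools import product
from statistics import mean
from typing import Any, Callable, Dict, Sequence, Tuple

Params = Dict[str, Any]


def grid_search(
    datasets: Sequence[Any],
    grid: Dict[str, Sequence[Any]],
    simulate: Callable[[Any, Params], float],
) -> Tuple[Params, float]:
    """Return the setting with the lowest mean screening cost across `datasets`.

    `simulate(dataset, params)` is assumed to run one screening simulation and
    return a cost such as the fraction of records screened before all relevant
    records were found (lower is better).
    """
    names = list(grid)
    best_params: Params = {}
    best_cost = float("inf")
    # Evaluate every combination of candidate values (Cartesian product).
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        cost = mean(simulate(d, params) for d in datasets)
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost


# Usage with a toy cost function standing in for a real simulation. Each
# "dataset" is just a number added to the cost, and the function pretends
# that alpha=2 and n_features=1000 are optimal for this domain.
def toy_simulate(dataset_offset: float, params: Params) -> float:
    return abs(params["alpha"] - 2) + abs(params["n_features"] - 1000) / 1000 + dataset_offset


best, cost = grid_search(
    datasets=[0.1, 0.2],
    grid={"alpha": [0.5, 1, 2, 4], "n_features": [500, 1000, 2000]},
    simulate=toy_simulate,
)
print(best, round(cost, 2))
```

Averaging the cost over several datasets from the same domain, rather than tuning on a single one, reduces the risk of picking a setting that only suits one review.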

For example, in 2020 researchers at Utrecht University screened 392,437 abstracts, of which only about 2% were relevant (source: https://asreview.nl/blog/project/systematic-reviews-uu-umc/). Assuming 40 abstracts per hour, they spent roughly 9,800 hours screening abstracts. Even taking ASReview's lower-end performance, and assuming each abstract was screened by only two researchers, more than 10,000 hours could have been saved. If we can improve model performance by even a few percent, we can save an enormous amount of work (and tax money) worldwide.
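The savings estimate above can be reproduced with a few lines of arithmetic. The text does not say which savings fraction the "lower-end performance" figure assumes, so instead of guessing one, this sketch computes the break-even fraction at which double screening saves 10,000 hours (about 51%), alongside the 95% upper bound quoted earlier. The function names are hypothetical.

```python
def screening_hours(n_abstracts: int, abstracts_per_hour: float, n_screeners: int) -> float:
    """Total person-hours needed to screen every abstract manually."""
    return n_abstracts / abstracts_per_hour * n_screeners


def hours_saved(total_hours: float, fraction_saved: float) -> float:
    """Person-hours saved if active learning avoids `fraction_saved` of the screening."""
    return total_hours * fraction_saved


# Figures from the Utrecht University example.
single = screening_hours(392_437, 40, n_screeners=1)  # about 9,811 hours
double = screening_hours(392_437, 40, n_screeners=2)  # about 19,622 hours

# Fraction of the work ASReview must save for double screening
# to free up 10,000 hours.
break_even = 10_000 / double

for fraction in (break_even, 0.95):  # 0.95 is the "up to 95%" upper bound
    print(f"{fraction:.0%} saved -> {hours_saved(double, fraction):,.0f} hours")
```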

The project aims to develop a plug-in for the ASReview software suite that allows users to select domain-specific hyperparameters. The plug-in should come with documentation, vignettes, and instructional materials for less-experienced users.
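A minimal sketch of how such a plug-in might resolve a user's choice, assuming presets are stored as a mapping from domain name to hyperparameter values tuned offline. The domain names, parameter names, and numbers are hypothetical placeholders, not project results; unknown domains, and any keys a preset leaves out, fall back to the current defaults.

```python
from typing import Dict

# Hypothetical placeholder values: real presets would come from tuning on
# labelled datasets from each domain.
DEFAULTS: Dict[str, float] = {"alpha": 1.0, "n_features": 1000}
DOMAIN_PRESETS: Dict[str, Dict[str, float]] = {
    "medicine": DEFAULTS,
    "psychology": {"alpha": 2.0, "n_features": 2000},
    "education": {"alpha": 0.5},
}


def select_hyperparameters(domain: str) -> Dict[str, float]:
    """Return the preset for `domain`, filling any missing keys from DEFAULTS."""
    preset = DOMAIN_PRESETS.get(domain.lower(), {})
    return {**DEFAULTS, **preset}


print(select_hyperparameters("Education"))
```

Merging the preset over the defaults means a domain preset only needs to list the values that differ from the medical defaults.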

Participating organisations

Netherlands eScience Center
Utrecht University
Social Sciences & Humanities

Team

Ayoub Bagheri
Anastasia Giachanou
Jurriaan H. Spaaks
Lead-RSE
Netherlands eScience Center
Jisk Attema
Programme Manager
Netherlands eScience Center

Related software

asreview-simulation


Command line interface to simulate an ASReview analysis using a variety of prior sampling strategies, classifiers, feature extractors, queriers, balancers, and stopping rules, all of which can be configured to run with custom parameterizations.
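To make the components listed above concrete, here is a minimal, standard-library-only sketch of one simulated screening run. It is not asreview-simulation's code or interface: the bag-of-words feature extractor, the multinomial naive Bayes classifier with additive smoothing hyperparameter `alpha` (which must be > 0), the "highest score first" querier, and the "stop after `stop_after` consecutive irrelevant records" rule are simple stand-ins chosen for illustration, and no balancer is used. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[int, Counter], Counter]


def bag_of_words(text: str) -> Counter:
    """Feature extractor: lowercase word counts."""
    return Counter(text.lower().split())


def train_nb(feats: List[Counter], labels: List[int]) -> Model:
    """Fit multinomial naive Bayes counts for class 0 (irrelevant) and 1 (relevant)."""
    words: Dict[int, Counter] = {0: Counter(), 1: Counter()}
    for f, y in zip(feats, labels):
        words[y].update(f)
    return words, Counter(labels)


def log_odds(model: Model, doc: Counter, alpha: float, vocab_size: int) -> float:
    """Log-odds that `doc` is relevant; `alpha` (> 0) is the additive smoothing hyperparameter."""
    words, docs = model
    score = math.log(docs[1] / docs[0])  # class prior ratio
    totals = {c: sum(words[c].values()) + alpha * vocab_size for c in (0, 1)}
    for w, k in doc.items():
        score += k * (math.log((words[1][w] + alpha) / totals[1])
                      - math.log((words[0][w] + alpha) / totals[0]))
    return score


def simulate(records: Sequence[Tuple[str, int]], alpha: float = 1.0,
             stop_after: int = 50, seed: int = 0) -> List[int]:
    """Return record indices in the order they would be screened."""
    rng = random.Random(seed)
    feats = [bag_of_words(text) for text, _ in records]
    labels = [y for _, y in records]
    vocab_size = len(set().union(*feats))
    # Prior knowledge: one random relevant and one random irrelevant record.
    order = [rng.choice([i for i, y in enumerate(labels) if y == 1]),
             rng.choice([i for i, y in enumerate(labels) if y == 0])]
    seen = set(order)
    misses = 0  # consecutive irrelevant records screened since the last relevant one
    while len(order) < len(records) and misses < stop_after:
        # Retrain on all labels so far, then query the highest-ranked unscreened record.
        model = train_nb([feats[i] for i in order], [labels[i] for i in order])
        pool = [i for i in range(len(records)) if i not in seen]
        best = max(pool, key=lambda i: log_odds(model, feats[i], alpha, vocab_size))
        order.append(best)
        seen.add(best)
        misses = 0 if labels[best] == 1 else misses + 1
    return order


records = [
    ("active learning for systematic review screening", 1),
    ("deep learning for image segmentation", 0),
    ("screening prioritisation with active learning", 1),
    ("protein folding with neural networks", 0),
    ("systematic review automation and screening", 1),
    ("weather forecasting with neural networks", 0),
]
order = simulate(records, alpha=1.0, stop_after=3)
print(order)
```

Running such a loop with different `alpha` values on labelled data from one domain, and counting how many records are screened before the last relevant one is found, is the kind of measurement a hyperparameter search would optimize.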
