HOLM

Hyperparameter Optimization to accelerate active Learning Models

With the emergence of online publishing, the number of scientific papers on any topic is skyrocketing. To summarize this literature, researchers write systematic reviews, for which they must screen thousands of studies for inclusion. Finding the rare relevant papers is error-prone and extremely time-intensive. ASReview uses active learning to speed up locating the relevant papers and can save up to 95% of screening time.

The goal of this project is to increase the performance of active learning for screening large amounts of textual data by optimizing the hyperparameters of the learning algorithms in the ASReview open-source software. Users from the social sciences should be able to select a set of hyperparameters optimized for textual data from their domain, instead of the currently implemented values, which were obtained from medical datasets.

For example, in 2020, researchers at Utrecht University screened 392,437 abstracts, of which only ~2% were relevant (source: https://asreview.nl/blog/project/systematic-reviews-uu-umc/). Assuming 40 abstracts per hour, that amounts to roughly 9,800 hours of screening. Even if we take the lower performance of ASReview and assume only two researchers screened for relevance, >10,000 hours could have been saved. If we can improve model performance by even a few percent, we can save an enormous amount of work (and tax money) worldwide.
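The estimate above can be reproduced with a short calculation. This is only a minimal sketch: the function names are hypothetical, it assumes every screener reads every abstract, and the 50% and 75% savings fractions are illustrative assumptions (only the 95% upper bound comes from the text).

```python
# Back-of-the-envelope estimate of screening effort, using the figures quoted above.
N_ABSTRACTS = 392_437        # abstracts screened at Utrecht University in 2020
ABSTRACTS_PER_HOUR = 40      # screening speed assumed in the text


def screening_hours(n_abstracts, per_hour, n_screeners=1):
    """Total person-hours needed when every screener reads every abstract."""
    return n_screeners * n_abstracts / per_hour


def hours_saved(total_hours, fraction_saved):
    """Person-hours saved if a given fraction of the screening can be skipped."""
    return total_hours * fraction_saved


single = screening_hours(N_ABSTRACTS, ABSTRACTS_PER_HOUR)
double = screening_hours(N_ABSTRACTS, ABSTRACTS_PER_HOUR, n_screeners=2)

print(f"one screener:  {single:,.0f} hours")
print(f"two screeners: {double:,.0f} hours")

# Hypothetical savings fractions, chosen only for illustration; 0.95 is the
# upper bound quoted for ASReview, the lower values are assumptions.
for fraction in (0.50, 0.75, 0.95):
    print(f"{fraction:.0%} saved -> {hours_saved(double, fraction):,.0f} hours")
```

With two screeners, the savings pass 10,000 hours once slightly more than half of the screening can be skipped.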

The project aims to develop a plug-in for the overarching ASReview software suite that allows users to select domain-specific hyperparameters. The plug-in should include documentation, vignettes, and instruction materials for less-experienced users.

Participating organisations

Social Sciences & Humanities

Related software

asreview-simulation

Command line interface to simulate an ASReview analysis using a variety of prior sampling strategies, classifiers, feature extractors, queriers, balancers, and stopping rules, all of which can be configured to run with custom parameterizations.

Team

Contact person

Pablo Lopez-Tarifa

Programme Manager

Netherlands eScience Center

ORCID: 0000-0002-4136-1860