Inspired by an Odissei meeting, the lead scientist of the Netherlands Environmental Assessment Agency (PBL) wanted to use active learning to screen large amounts of textual data with the open-source software ASReview. PBL initiated an independent study and implemented ASReview in-house, which in turn inspired the RIVM and the Netherlands Institute for Social Research (SCP). However, the use case of such institutes differs from that of the typical user, because questions from the Dutch government are usually less structured than a classical systematic review question. To accommodate such use cases, ASReview lets users choose from a wide range of models. It also ships with a simulation mode that mimics the AI-aided screening process, so the performance of different models can be compared. Running these simulations takes a long time, however, especially when testing neural networks, which will most likely be the beneficial models for users like PBL.
Therefore, the proposal aims to develop a one-click deployment option for running large-scale simulation studies in the cloud and to evaluate how much computation time can be saved. This way, users like PBL can quickly run a large-scale simulation study to select the best-performing model for answering research questions from the Dutch government.
This project has resulted in ways to run simulations via the ASReview infrastructure outside your own computer, possibly in parallel. The information for running simulations in the cloud is organized into the following use cases:
- Running a "short" simulation on SURF, Digital Ocean, AWS, Azure, etc. Use this guide if your local computer is not powerful enough, or if you need it to remain available while the simulations run.
- Running simulations in parallel. Use this when you have a computer (local or remote) with many cores and plenty of memory, and you want to speed things up.
- Running many jobs.sh files one after the other. Use this if you need to run many simulations with different parameters, but you only have one computer. You can still parallelize the execution of each individual jobs.sh file.
- Running large simulations using Kubernetes. Use this if your simulation would otherwise take a very long time, or if you have a sufficiently powerful computer and need to control CPU and memory usage. This approach is complicated; it usually requires a lot of time to set up and money to run on a cluster.
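
To give a feel for the "simulations in parallel" use case, here is a minimal Python sketch of the general idea: several independent shell commands are run at once, with a cap on how many run at the same time. It is not the tooling provided by this project. The function names `run_command` and `run_in_parallel` are hypothetical, and the `echo` commands are placeholders for whatever simulation commands you actually want to run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_command(command: str) -> int:
    """Run one shell command and return its exit code."""
    return subprocess.run(command, shell=True, check=False).returncode


def run_in_parallel(commands: List[str], max_workers: int = 4) -> List[int]:
    """Run commands concurrently, at most max_workers at a time.

    Threads are sufficient here because each worker only waits for its
    child process to finish. Exit codes are returned in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


if __name__ == "__main__":
    # Placeholder commands (an assumption for illustration only);
    # substitute the simulation commands you want to execute.
    commands = [f"echo simulation {seed}" for seed in range(8)]
    exit_codes = run_in_parallel(commands, max_workers=4)
    failed = [c for c, code in zip(commands, exit_codes) if code != 0]
    print(f"{len(commands) - len(failed)} succeeded, {len(failed)} failed")
```

Choosing `max_workers` is the main design decision: setting it close to the number of available cores keeps the machine busy, while a lower value leaves room for memory-hungry models.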