Command line interface to simulate an ASReview analysis using a variety of prior sampling strategies, classifiers, feature extractors, queriers, balancers, and stopping rules, all of which can be configured to run with custom parameterizations.
| Badge | Description |
| --- | --- |
| | Persistent identifier for archived snapshots of the software |
| | Linting (`isort`, `black`, `ruff`, `mypy`, and `cffconvert`, via `pre-commit`) |
| | Unit tests, mocked tests, and integration tests across combinations of operating system, ASReview version, and Python version |
| | API docs: https://asreview-simulation.github.io/asreview-simulation |
| | Static code analysis report |
| | Code coverage report |
| | Link to the repository state at the latest GitHub release |
| | Number of commits since the latest GitHub release |
```shell
# generate a virtual environment
python3 -m venv venv

# activate the virtual environment
source venv/bin/activate

# install asreview-simulation and its dependencies
pip install git+https://github.com/asreview-simulation/asreview-simulation.git@0.4.0

# or, if you need optional dependencies as well, e.g. 'doc2vec'
pip install asreview-simulation[doc2vec]@git+https://github.com/asreview-simulation/asreview-simulation.git@0.4.0
```
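Note that some shells (e.g. zsh) interpret the square brackets in the optional-dependency form as glob patterns, so you may need to quote the argument: `pip install 'asreview-simulation[doc2vec]@git+https://github.com/asreview-simulation/asreview-simulation.git@0.4.0'`.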
Print the help:

```shell
asreview simulation --help
```

Print the configuration:

```shell
asreview simulation print-settings
```

With pretty-printing:

```shell
asreview simulation print-settings --pretty
```
Start a simulation using the default combination of models (`sam-random`, `bal-double`, `clr-nb`, `fex-tfidf`, `qry-max`, `stp-min`), each using its default parameterization:

```shell
asreview simulation start --benchmark benchmark:van_de_Schoot_2017 --out ./project.asreview
```
Instead of a benchmark dataset, you can also supply your own data via the `--in` option, as follows:

```shell
asreview simulation start --in ./myfile.csv --out ./project.asreview
asreview simulation start --in ./myfile.ris --out ./project.asreview
asreview simulation start --in ./myfile.tsv --out ./project.asreview
asreview simulation start --in ./myfile.xlsx --out ./project.asreview
```
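Simulations replay a completed screening, so the input file must contain labeling decisions for the records. Below is a minimal sketch of generating such a CSV with pandas; the column names `title`, `abstract`, and `included` follow common ASReview conventions, but check the ASReview documentation for the exact column names your version recognizes.

```python
import pandas as pd

# Two toy records with labeling decisions (1 = relevant, 0 = irrelevant);
# a real dataset would contain every screened record.
records = pd.DataFrame(
    {
        "title": ["A relevant study", "An irrelevant study"],
        "abstract": [
            "Abstract of the relevant study.",
            "Abstract of the irrelevant study.",
        ],
        "included": [1, 0],
    }
)
records.to_csv("myfile.csv", index=False)
```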
Using a different classifier strategy can be accomplished by issuing one of the `clr-*` subcommands before the `start` subcommand, e.g.:

```shell
asreview simulation \
    clr-logistic \
    start --benchmark benchmark:van_de_Schoot_2017 --out ./project.asreview
```
Subcommands can be chained together; for example, combining the logistic classifier with the undersample balancer goes like this:

```shell
asreview simulation \
    clr-logistic \
    bal-undersample \
    start --benchmark benchmark:van_de_Schoot_2017 --out ./project.asreview
```
Most subcommands have their own parameterization. Check the help of a subcommand with `--help` (or `-h` for short), e.g.:

```shell
asreview simulation clr-logistic --help
```
The above command will print:

```text
Usage: asreview simulation clr-logistic [OPTIONS]

  Configure the simulation to use Logistic Regression classifier.

Options:
  --c FLOAT             Parameter inverse to the regularization strength of
                        the model.  [default: 1.0]
  --class_weight FLOAT  Class weight of the inclusions.  [default: 1.0]
  -f, --force           Force setting the classifier configuration, even if
                        that means overwriting a previous configuration.
  -h, --help            Show this message and exit.

  This command is chainable with other commands. Chained commands are
  evaluated left to right; make sure to end the chain with the 'start'
  command, otherwise it may appear like nothing is happening.

  Please report any issues at:
  https://github.com/asreview-simulation/asreview-simulation/issues.
```
Passing parameters to a subcommand goes like this:

```shell
asreview simulation \
    clr-logistic --class_weight 1.1 \
    start --benchmark benchmark:van_de_Schoot_2017 --out ./project.asreview
```
By using individually parameterized, chained subcommands, we can compose a variety of configurations, e.g.:

```shell
asreview simulation \
    sam-random --n_included 10 --n_excluded 15 \
    fex-tfidf --ngram_max 2 \
    clr-nb --alpha 3.823 \
    qry-max-random --fraction_max 0.90 --n_instances 10 \
    bal-double --a 2.156 --alpha 0.95 --b 0.79 --beta 1.1 \
    stp-nq --n_queries 20 \
    start --benchmark benchmark:van_de_Schoot_2017 --out ./project.asreview
```
Chained commands are evaluated left to right; make sure to end the chain with the `start` command, otherwise it may appear like nothing is happening.
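The same composition can also be expressed through the Python API documented below. A minimal sketch, assuming that `OneModelConfig` parameter keys mirror the CLI option names (without the leading dashes) and that `Config` accepts one keyword per model type (`sam`, `fex`, `clr`, `qry`, `bal`, `stp`):

```python
from asreviewcontrib.simulation.api import Config
from asreviewcontrib.simulation.api import OneModelConfig

# Each OneModelConfig mirrors one chained subcommand from the CLI example above.
sam = OneModelConfig("sam-random", {"n_included": 10, "n_excluded": 15})
fex = OneModelConfig("fex-tfidf", {"ngram_max": 2})
clr = OneModelConfig("clr-nb", {"alpha": 3.823})
qry = OneModelConfig("qry-max-random", {"fraction_max": 0.90, "n_instances": 10})
bal = OneModelConfig("bal-double", {"a": 2.156, "alpha": 0.95, "b": 0.79, "beta": 1.1})
stp = OneModelConfig("stp-nq", {"n_queries": 20})

# Models not passed explicitly (here: ofn) fall back to their defaults.
config = Config(sam=sam, fex=fex, clr=clr, qry=qry, bal=bal, stp=stp)
```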
Here is the list of subcommands:

```text
start                  Start the simulation
print-benchmark-names  Print benchmark names
print-settings         Print settings
save-settings          Save settings
load-settings          Load settings
sam-handpicked         Handpicked prior sampler
sam-random             Random prior sampler
fex-doc2vec            Doc2Vec extractor
fex-embedding-idf      Embedding IDF extractor
fex-embedding-lstm     Embedding LSTM extractor
fex-sbert              SBERT extractor
fex-tfidf              TF-IDF extractor
clr-logistic           Logistic Regression classifier
clr-lstm-base          LSTM Base classifier
clr-lstm-pool          LSTM Pool classifier
clr-nb                 Naive Bayes classifier
clr-nn-2-layer         2-layer Neural Net classifier
clr-rf                 Random Forest classifier
clr-svm                Support Vector Machine classifier
qry-cluster            Cluster query strategy
qry-max                Max query strategy
qry-max-random         Mixed query strategy (Max and Random)
qry-max-uncertainty    Mixed query strategy (Max and Uncertainty)
qry-random             Random query strategy
qry-uncertainty        Uncertainty query strategy
bal-double             Double balancer
bal-simple             No balancer
bal-undersample        Undersample balancer
stp-none               No stopping rule
stp-nq                 Stop after a predefined number of queries
stp-rel                Stop once all the relevant records have been found
ofn-none               No objective function
ofn-wss                WSS objective function
```
For a full overview of the API, see `tests/api/test_api.py` and https://asreview-simulation.github.io/asreview-simulation. Here is an example:
```python
import os
import tempfile

from asreviewcontrib.simulation.api import Config
from asreviewcontrib.simulation.api import OneModelConfig
from asreviewcontrib.simulation.api import prep_project_directory
from asreviewcontrib.simulation.api import run

# make a classifier model config using default parameter values given the model name
clr = OneModelConfig("clr-svm")

# make a query model config using positional arguments and a partial params dict
qry = OneModelConfig("qry-max-random", {"fraction_max": 0.90})

# make a stopping model config using keyword arguments
stp = OneModelConfig(abbr="stp-nq", params={"n_queries": 10})

# construct an all-model config from the one-model configs -- implicitly use the
# default model choice and parameterization for models not included as arguments
# (i.e. sam, fex, bal, ofn)
config = Config(clr=clr, qry=qry, stp=stp)

# arbitrarily pick a benchmark dataset
benchmark = "benchmark:Cohen_2006_ADHD"

# create a temporary directory and start the simulation
tmpdir = tempfile.mkdtemp(prefix="asreview-simulation.", dir=".")
output_file = f"{tmpdir}{os.sep}project.asreview"
project, as_data = prep_project_directory(benchmark=benchmark, output_file=output_file)
run(config, project, as_data)
```
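Because the configuration is just a Python object, scripted parameter sweeps are straightforward. Below is a sketch that reuses the calls from the example above to vary the regularization strength `c` of the logistic classifier (the `--c` option shown in the `clr-logistic` help earlier); the per-run directory prefix is illustrative:

```python
import os
import tempfile

from asreviewcontrib.simulation.api import Config
from asreviewcontrib.simulation.api import OneModelConfig
from asreviewcontrib.simulation.api import prep_project_directory
from asreviewcontrib.simulation.api import run

benchmark = "benchmark:Cohen_2006_ADHD"

# run one simulation per value of the regularization parameter 'c'
for c in (0.1, 1.0, 10.0):
    config = Config(clr=OneModelConfig("clr-logistic", {"c": c}))
    tmpdir = tempfile.mkdtemp(prefix=f"asreview-simulation.c-{c}.", dir=".")
    output_file = os.path.join(tmpdir, "project.asreview")
    project, as_data = prep_project_directory(benchmark=benchmark, output_file=output_file)
    run(config, project, as_data)
```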
For more examples, refer to `tests/use_cases/test_use_cases.py`.
Hyperparameter Optimization to Accelerate Active Learning Models