
Democracy Scraping Tool

A tool for scraping country reports from various sources.


Description

scrape-tool

install

# Download sources
git clone https://github.com/backdem/scrape-tool.git
cd scrape-tool
# Optional: create and activate a virtual environment
python -m venv .env
source .env/bin/activate  # on Windows: .env\Scripts\activate
# Install dependencies
pip install -r requirements.txt

eu_rule_of_law

python src/main_eu_rule_of_law.py --configfile src/config.json --outputfolder ./data/sources/eu-rule-of-law/raw-csv/ --overwrite

or, for the 2021 reports:

python src/main_eu_rule_of_law.py --configfile src/config_eu_rule_of_law_2021.json --overwrite --xlsx --outputfolder ./data/sources/eu-rule-of-law/raw-csv/2021/

The --xlsx flag also writes an .xlsx file alongside each CSV file. Assuming reports remain standardized, future reports can be processed by modifying ./src/config_eu_rule_of_law_2021.json (or creating a copy of it) and changing baseURL and docIds so that baseURL + docId gives the URL of a report.
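
As a minimal sketch of the baseURL + docId rule, the snippet below builds report URLs from a config. The config layout (a string `baseURL` and a list `docIds`), the example values, and the helper name `report_urls` are assumptions for illustration; check the real config file for its actual structure.

```python
import json

# Hypothetical config contents; the real file's layout and values may differ.
EXAMPLE_CONFIG = """
{
  "baseURL": "https://example.org/reports/",
  "docIds": ["doc-001", "doc-002"]
}
"""


def report_urls(config: dict) -> list[str]:
    """Return one URL per report by concatenating baseURL and each docId."""
    base_url = config["baseURL"]
    return [base_url + doc_id for doc_id in config["docIds"]]


config = json.loads(EXAMPLE_CONFIG)
urls = report_urls(config)
for url in urls:
    print(url)
```

Opening one of the printed URLs in a browser before running the scraper is a quick way to confirm that a new baseURL and docId pair points at an actual report.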

download GRECO pdf files

python src/download_greco_pdfs.py --outputfolder data/sources/greco/raw-pdf

parse GRECO pdf files

python src/main_greco.py --outputfolder data/sources/greco/raw-csv/ --inputfolder data/sources/greco/raw-pdf/ --overwrite

download and parse Freedom House reports

python src/main_freedomhouse.py --configfile src/config.json --outputfolder ./data/sources/freedomhouse/raw-csv/ --overwrite

parse BTI rtf report files

python src/main_bti.py --datafolder ./data/sources/bti/raw-rtf/ --outputfolder ./data/sources/bti/raw-csv/ --overwrite

download Freedom House complete book archives

python src/download_freedomhouse_archive_pdfs.py --outputfolder ./data/sources/freedomhouse/raw-pdf/

generate dataset

python src/generate_dataset.py --rootfolder data/sources/ --outputfilename ./all_countries_0.0.4.csv
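
Before generating the dataset, it can help to check that each source actually has parsed CSV files. The sketch below counts them per source, assuming the `data/sources/<source>/raw-csv/` layout used in the commands above. The helper `count_raw_csvs` is hypothetical and not part of the repository.

```python
from pathlib import Path


def count_raw_csvs(root: Path) -> dict[str, int]:
    """Map each source folder under root to the number of .csv files
    found (recursively) below its raw-csv folder."""
    counts = {}
    for source_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        raw_csv = source_dir / "raw-csv"
        if raw_csv.is_dir():
            counts[source_dir.name] = sum(1 for _ in raw_csv.rglob("*.csv"))
        else:
            # Source folder exists but has not been parsed to CSV yet.
            counts[source_dir.name] = 0
    return counts


if __name__ == "__main__":
    root = Path("data/sources")  # same folder passed as --rootfolder
    if root.is_dir():
        for name, n in count_raw_csvs(root).items():
            print(f"{name}: {n} csv file(s)")
```

A source reporting zero files suggests its download or parse step has not been run yet.
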
Keywords
scraping

Participating organisations

Social Sciences & Humanities
Netherlands eScience Center
Erasmus University Rotterdam

Contributors

Contact person: Reggie Cushing

Reggie Cushing
Asya Zhelyazkova

Related projects

BackDem

Assessing democratic backsliding in Europe and its neighborhood

Updated 8 months ago
Finished