Get started
A tool for scraping country reports from various sources.
# Download sources
git clone https://github.com/backdem/scrape-tool.git
cd scrape-tool
# Optional: create and activate a virtual environment
python -m venv .env
source .env/bin/activate  # on Windows: .env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
python src/main_eu_rule_of_law.py --configfile src/config.json --outputfolder ./data/sources/eu-rule-of-law/raw-csv/ --overwrite
Or, for the 2021 reports:
python src/main_eu_rule_of_law.py --configfile src/config_eu_rule_of_law_2021.json --overwrite --xlsx --outputfolder ./data/sources/eu-rule-of-law/raw-csv/2021/
The --xlsx flag also writes an .xlsx file alongside each CSV file.
Assuming reports remain standardized, future reports can be processed by modifying or copying ./src/config_eu_rule_of_law_2021.json and changing the baseURL and docIds so that baseURL + docId gives the URL of a report.
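The baseURL + docId convention described above can be sketched as follows. This is only an illustration, not the tool's actual code: it assumes the config is a JSON object in which baseURL is a string and docIds is a flat list of strings, and the helper name report_urls and the example.org values are hypothetical. Check the real config file for its exact layout before relying on this.

```python
import json


def report_urls(config: dict) -> list:
    """Build one report URL per document ID by plain string concatenation.

    Assumes config["baseURL"] is a string and config["docIds"] is a list
    of strings (an assumption; the real config layout may differ).
    """
    base_url = config["baseURL"]
    return [base_url + doc_id for doc_id in config["docIds"]]


# Hypothetical config content, inlined here so the sketch is self-contained.
example_config = json.loads(
    '{"baseURL": "https://example.org/reports/", "docIds": ["doc-a", "doc-b"]}'
)

for url in report_urls(example_config):
    print(url)
```

Printing the URLs this way before running the scraper is a quick way to confirm that a new baseURL and set of docIds point at the intended reports.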
python src/download_greco_pdfs.py --outputfolder data/sources/greco/raw-pdf
python src/main_greco.py --outputfolder data/sources/greco/raw-csv/ --inputfolder data/sources/greco/raw-pdf/ --overwrite
python src/main_freedomhouse.py --configfile src/config.json --outputfolder ./data/sources/freedomhouse/raw-csv/ --overwrite
python src/main_bti.py --datafolder ./data/sources/bti/raw-rtf/ --outputfolder ./data/sources/bti/raw-csv/ --overwrite
python src/download_freedomhouse_archive_pdfs.py --outputfolder ./data/sources/freedomhouse/raw-pdf/
python src/generate_dataset.py --rootfolder data/sources/ --outputfilename ./all_countries_0.0.4.csv
Assessing democratic backsliding in Europe and its neighborhood