4CAT: Capture and Analysis Toolkit
In 4CAT, you create a dataset from a given platform according to a given set of
parameters; the result of this (usually a CSV or JSON file containing matching items)
can then be downloaded or analysed further with a suite of analytical
'processors', which range from simple frequency charts to more advanced analyses
such as the generation and visualisation of word embedding models.
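To give a sense of what a simple frequency analysis involves, here is a minimal sketch in plain Python that is independent of 4CAT's own code. It assumes a dataset exported as a CSV file with a text column named `body`; that column name, the sample rows and the function name `top_words` are assumptions made for illustration, not 4CAT's actual schema or API.

```python
import csv
import io
import re
from collections import Counter


def top_words(csv_file, column="body", n=3):
    """Return the n most frequent lowercase words in one column of a CSV file."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Tokenise on runs of letters, digits and apostrophes; skip missing cells
        counts.update(re.findall(r"[a-z0-9']+", (row.get(column) or "").lower()))
    return counts.most_common(n)


# Hypothetical export: the "id" and "body" columns are assumed for this example
sample = io.StringIO(
    "id,body\n"
    "1,Cats are great\n"
    "2,I like cats and dogs\n"
    "3,Dogs are loyal\n"
)
print(top_words(sample))
```

For a real export you would pass an open file handle instead of the in-memory sample, for instance `open("export.csv", newline="", encoding="utf-8")`, where the file name is again only a placeholder.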
4CAT has a (growing) number of supported data sources corresponding to popular
platforms that are part of the tool, but you can also add additional data
sources using 4CAT's Python API. The following data sources are currently
actively supported and can be used to collect data with 4CAT directly:
- 4chan and 8kun
- Telegram
- Tumblr
The following platforms are supported through Zeeschuimer, with which you can
collect data to import into 4CAT for analysis:
- Instagram (posts)
- TikTok (posts and comments)
- 9gag
- Imgur
- LinkedIn
- Gab
- Douyin
- X/Twitter
It is also possible to upload data collected with other tools as CSV files. The
following tools are explicitly supported but other data can also be uploaded as
long as it is formatted as CSV:
A number of other platforms have built-in support that is untested or that
requires, for example, special API access. You can view the data sources in our
wiki or review the data sources' code in the GitHub repository.
Installation
You can install 4CAT locally or on a server, either via Docker or manually. For the easiest installation, we recommend copying our docker-compose.yml file and .env file, then running this terminal command in the folder where those files have been saved:
docker-compose up -d
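If you prefer to script this step, the following is a minimal sketch of a pre-flight check in Python. It only verifies that the two files named in the instructions are present before running the same command; the helper names `missing_files` and `start_4cat` are hypothetical and not part of 4CAT.

```python
import subprocess
from pathlib import Path

# The two files the installation instructions ask you to copy
REQUIRED_FILES = ("docker-compose.yml", ".env")


def missing_files(folder):
    """Return the required files that are not present in the folder."""
    return [name for name in REQUIRED_FILES if not (Path(folder) / name).is_file()]


def start_4cat(folder="."):
    """Run `docker-compose up -d` in the folder once both files are present."""
    missing = missing_files(folder)
    if missing:
        raise FileNotFoundError("Missing in " + str(folder) + ": " + ", ".join(missing))
    # check=True raises CalledProcessError if the command exits with an error
    subprocess.run(["docker-compose", "up", "-d"], cwd=folder, check=True)
```

Calling `start_4cat()` from the folder that holds both files has the same effect as typing the command by hand, with a clearer error message if a file was not copied.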
In-depth instructions on both Docker installation and manual installation can be found in our
wiki. A video walkthrough of installing 4CAT via Docker is available on YouTube.
Currently, scraping 4chan, 8chan, and 8kun requires additional steps; please see the wiki.
Please check our issues and create one if you experience any problems (pull
requests are also very welcome).
Upgrading 4CAT
Instructions on upgrading 4CAT from previous versions can be found in our wiki.
Modules
4CAT is a modular tool and is easy to extend. The following two folders in the
repository are of interest for this:
- datasources: Data source definitions. This is a set of configuration
  options, database definitions and Python scripts to process this data with.
  If you want to set up your own data sources, refer to the wiki.
- processors: A collection of data processing scripts that can plug into
  4CAT to manipulate or process datasets created with 4CAT. There is an API
  you can use to make your own processors.
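The general idea of scripts that plug into a tool can be sketched in a few lines of Python. Note that this is a generic illustration of a plug-in registry and not 4CAT's actual processor API, which is documented in the wiki; the names `PROCESSORS`, `register`, `run_processor` and `count_authors`, as well as the `author` field, are all hypothetical.

```python
from collections import Counter

# Hypothetical registry; 4CAT's real processor API differs (see the wiki)
PROCESSORS = {}


def register(name):
    """Decorator that adds a processing function to the registry under a name."""
    def decorator(func):
        PROCESSORS[name] = func
        return func
    return decorator


@register("count-authors")
def count_authors(items):
    """Count how many items each author contributed."""
    return Counter(item["author"] for item in items)


def run_processor(name, items):
    """Look up a processor by name and apply it to a dataset's items."""
    return PROCESSORS[name](items)


# Assumed item structure: a list of dicts with an "author" key
dataset = [{"author": "anna"}, {"author": "bo"}, {"author": "anna"}]
print(run_processor("count-authors", dataset))
```

The point of such a design is that a new analysis can be added by writing one function and registering it, without touching the code that runs it.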
Credits & License
4CAT was created at OILab and the
Digital Methods Initiative at the University
of Amsterdam. The tool was inspired by
DMI-TCAT, a tool with
comparable functionality that can be used to scrape and analyse Twitter data.
4CAT development is supported by the Dutch PDI-SSH
foundation through the CAT4SMR project.
4CAT is licensed under the Mozilla Public License, 2.0. Refer to the LICENSE
file for more information.