4CAT Capture & Analysis Toolkit

Stijn Peeters

doi:10.5281/zenodo.4742622


The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms. Its goal is to make the capture and analysis of data from these platforms accessible to people through a web interface, without requiring any programming or web scraping skills.

4458 commits; last commit ≈ 1 week ago; 402 stars; 75 forks

Cite this software

DOI: 10.5281/zenodo.4742622

Description

4CAT: Capture and Analysis Toolkit

In 4CAT, you create a dataset from a given platform according to a given set of parameters. The result (usually a CSV or JSON file containing the matching items) can then be downloaded or analysed further with a suite of analytical 'processors', ranging from simple frequency charts to more advanced analyses such as the generation and visualisation of word embedding models.
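To make the 'simple frequency chart' idea concrete, here is a minimal Python sketch (not 4CAT's own processor code) that counts the items in a CSV dataset per day. It assumes a hypothetical `timestamp` column formatted as `YYYY-MM-DD HH:MM:SS`; real 4CAT exports may use other column names or formats.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def items_per_day(csv_text, timestamp_column="timestamp"):
    """Count dataset items per calendar day.

    Assumes (hypothetically) that every row has a timestamp column
    formatted as 'YYYY-MM-DD HH:MM:SS'.
    """
    counts = Counter()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        moment = datetime.strptime(row[timestamp_column], "%Y-%m-%d %H:%M:%S")
        counts[moment.date().isoformat()] += 1
    # ISO dates sort chronologically, so the result can be plotted directly
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = (
        "id,timestamp,body\n"
        "1,2024-03-01 10:15:00,first post\n"
        "2,2024-03-01 18:40:00,second post\n"
        "3,2024-03-02 09:05:00,third post\n"
    )
    print(items_per_day(sample))  # {'2024-03-01': 2, '2024-03-02': 1}
```

The resulting day-to-count mapping is the kind of table a frequency chart is drawn from.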

4CAT includes a growing number of data sources for popular platforms, and you can add further data sources using 4CAT's Python API. The following data sources are currently actively supported and can be used to collect data with 4CAT directly:

4chan and 8kun
Bluesky
Telegram
TikTok (from a list of TikTok post URLs)
Tumblr

The following platforms are supported through Zeeschuimer, with which you can collect data to import into 4CAT for analysis:

9gag
Douyin
Gab
Imgur
Instagram (posts)
LinkedIn
Pinterest
Threads
Truth.social
TikTok (posts and comments)
X/Twitter
Xiaohongshu
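Before importing, you may want to sanity-check a Zeeschuimer export. The sketch below assumes the export is newline-delimited JSON (one item per line); the function name `read_ndjson` and the example file name are hypothetical and not part of 4CAT or Zeeschuimer.

```python
import json


def read_ndjson(lines):
    """Parse newline-delimited JSON, one item per line.

    Returns (items, bad_line_numbers); blank lines are skipped.
    """
    items, bad = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            items.append(json.loads(line))
        except json.JSONDecodeError:
            # Remember which lines could not be parsed instead of aborting
            bad.append(number)
    return items, bad


# Example usage with a hypothetical export file:
# with open("zeeschuimer-export.ndjson", encoding="utf-8") as f:
#     items, bad = read_ndjson(f)
#     print(len(items), "items;", len(bad), "unreadable lines")
```

Reporting unreadable line numbers rather than stopping at the first error makes it easier to see whether an export is mostly intact.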

It is also possible to upload data collected with other tools as CSV files or as zip archives of media files (i.e. video, images, and audio). The following tools are explicitly supported, but other data can also be uploaded as long as it is formatted as CSV or uses a common media file format:

Facebook and Instagram (via CrowdTangle or Facepager exports)
YouTube videos and comments (via the YouTube Data Tools)
Weibo (via Bazhuayu)
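For media uploads, a folder of files first has to be bundled into a zip archive. A minimal sketch using Python's standard `zipfile` module; the `MEDIA_EXTENSIONS` set is an assumption about what counts as a 'common media file format', not a list taken from 4CAT.

```python
import zipfile
from pathlib import Path

# Assumed set of common media formats (video, images, audio); adjust as needed.
MEDIA_EXTENSIONS = {
    ".mp4", ".mov", ".webm",
    ".jpg", ".jpeg", ".png", ".gif",
    ".mp3", ".wav", ".ogg",
}


def zip_media(folder, archive_path):
    """Bundle the media files directly inside `folder` into a zip archive.

    Returns the names of the files that were added.
    """
    folder = Path(folder)
    added = []
    # ZIP_STORED: media formats are usually compressed already
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
                # Store files at the archive root, without the local folder prefix
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

Files with other extensions (notes, spreadsheets, the archive itself) are skipped, so the archive contains only media.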

A number of other platforms have built-in support that is untested or that requires, for example, special API access. You can find an overview of the data sources in our wiki or review their code in the GitHub repository.

Installation

You can install 4CAT locally or on a server, either via Docker or manually. The easiest option is Docker: copy our docker-compose.yml and .env files into a folder and run this terminal command in that folder:

docker-compose up -d

If you are developing 4CAT or running it from a source checkout rather than a release, see docker/README.md for the docker-compose_build.yml (build the image locally) and docker-compose_dev.yml (live-reload source edits) variants.
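The Docker steps above can also be scripted. This hypothetical Python helper checks that the compose file and `.env` are present and then runs `docker-compose up -d`, optionally with `-f` to select one of the alternative compose files. The function names are illustrative, and the helper assumes every variant reads a `.env` file in the same folder.

```python
import subprocess
from pathlib import Path

# Files the default Docker setup expects in the working folder.
REQUIRED_FILES = ("docker-compose.yml", ".env")


def missing_files(folder, required=REQUIRED_FILES):
    """Return the names from `required` that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


def compose_up_command(compose_file=None):
    """Build the docker-compose invocation, optionally for an alternative file."""
    command = ["docker-compose"]
    if compose_file is not None:
        # -f selects a compose file other than the default docker-compose.yml
        command += ["-f", compose_file]
    return command + ["up", "-d"]


def start_4cat(folder, compose_file=None):
    """Check for the expected files, then start the containers in the background."""
    # Assumption: every variant reads a .env file next to its compose file
    required = (compose_file or "docker-compose.yml", ".env")
    missing = missing_files(folder, required)
    if missing:
        raise FileNotFoundError(f"missing in {folder}: {', '.join(missing)}")
    subprocess.run(compose_up_command(compose_file), cwd=folder, check=True)
```

For example, `start_4cat("/path/to/4cat")` uses the default file, while `start_4cat("/path/to/4cat", "docker-compose_dev.yml")` selects the live-reload variant. Newer Docker installations ship Compose as a plugin invoked as `docker compose` (with a space), so the command list may need adjusting.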

In-depth instructions for both Docker and manual installation can be found in our wiki. A video walkthrough of installing 4CAT via Docker is available on YouTube.

Scraping 4chan, 8chan, and 8kun currently requires additional steps; please see the wiki.

Please check our issues and create one if you experience any problems (pull requests are also very welcome).

Upgrading 4CAT

Instructions on upgrading 4CAT from previous versions can be found in our wiki.

Modules

4CAT is modular and easy to extend. Two folders in the repository are relevant for this:

datasources: Data source definitions, i.e. a set of configuration options, database definitions and Python scripts that process the collected data. If you want to set up your own data sources, refer to the wiki.
processors: A collection of data processing scripts that can plug into 4CAT to manipulate or process datasets created with 4CAT. There is an API you can use to make your own processors.
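As an illustration of the plug-in idea behind processors, here is a hypothetical Python sketch in which processors register themselves by type and can be applied one after another, each taking the previous step's output. The class and registry names are invented for this example; 4CAT's actual processor API differs from this.

```python
from abc import ABC, abstractmethod
from collections import Counter

# Hypothetical registry mapping a processor type to its class (not 4CAT's API).
PROCESSORS = {}


def register(cls):
    """Class decorator that makes a processor available by its `type` name."""
    PROCESSORS[cls.type] = cls
    return cls


class Processor(ABC):
    type = "base"

    @abstractmethod
    def process(self, items):
        """Take a list of item dicts and return a new list of dicts."""


@register
class WordFrequency(Processor):
    type = "word-frequency"

    def process(self, items):
        counts = Counter()
        for item in items:
            # Naive whitespace tokenisation of a hypothetical "body" field
            counts.update(item.get("body", "").lower().split())
        return [{"word": word, "count": n} for word, n in counts.most_common()]


@register
class TopThree(Processor):
    type = "top-3"

    def process(self, items):
        # Keep only the first three rows of the previous result
        return items[:3]


def run_chain(items, processor_types):
    """Feed a dataset through the named processors in order."""
    for name in processor_types:
        items = PROCESSORS[name]().process(items)
    return items


if __name__ == "__main__":
    posts = [{"body": "a b a"}, {"body": "B a c d"}]
    print(run_chain(posts, ["word-frequency", "top-3"]))
```

Keeping each processor's input and output as plain lists of dicts is what lets any processor consume another's result.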

Credits & License

4CAT was created at OILab and the Digital Methods Initiative at the University of Amsterdam. The tool was inspired by DMI-TCAT, a tool with comparable functionality that can be used to scrape and analyse Twitter data.

4CAT development is supported by the Dutch PDI-SSH foundation through the CAT4SMR project.

4CAT is licensed under the Mozilla Public License, 2.0. Refer to the LICENSE file for more information.

Programming languages

Python 68%
JavaScript 24%
HTML 6%
CSS 2%

License

MPL-2.0

Reference papers

1. Stijn Peeters, Sal Hagen, Dale Wahl. Zenodo, 2025. doi:10.5281/zenodo.4742622
2. Stijn Peeters, Sal Hagen. Computational Communication Research, Amsterdam University Press, 2022, pp. 571-589. doi:10.5117/ccr2022.2.007.hage

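Mentions (56 in total: 1 book, 3 book sections, 1 conference paper, 46 journal articles, 5 other; a selection is listed below)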

1. Guido Anselmi, Alessandro Caliandro, Alessandro Gandini, Lucia Bainotti. Amsterdam University Press, 2024. doi:10.1515/9789048555109
2. Elena Aversa. IASDR 2023: Life-Changing Design, Design Research Society, 2023. doi:10.21606/iasdr.2023.786

Contributors

Contact person

Stijn Peeters, Universiteit van Amsterdam (ORCID 0000-0002-0161-8019)

Stijn Peeters, Universiteit van Amsterdam (ORCID 0000-0002-0161-8019)
Sal Hagen, University of Amsterdam (ORCID 0000-0002-1669-2377)
Dale Wahl, University of Amsterdam (ORCID 0000-0002-7324-9048)

Related software

Zeeschuimer

Zeeschuimer is a browser extension that monitors traffic while browsing social media, and collects seen posts/items for later systematic analysis. Its target audience is researchers who study content on social media platforms that resist conventional scraping or API-based data collection.
