A flexible solution to build text mining workflows that allows you to quickly combine Natural Language Processing tools from different sources.


What nlppln can do for you

  • Quickly build text mining and/or NLP workflows in Python
  • Combine tools written in different programming languages

Digital Humanities research often involves Natural Language Processing (NLP), in which a body of natural language text, or corpus, is analyzed using software. While many software packages are available, constructing new research analyses by combining (parts of) existing packages remains challenging. This is because individual software packages are designed to do one task and to do that task well; they are not primarily designed to interact with other, complementary packages. Another problem is that many tools are available for English, but not for other languages.

nlppln (pronounced 'NLP pipeline') is an open-source Python package that helps to address these problems by making it easy to package existing tools in a uniform way, as defined in the CWL (Common Workflow Language) standard for describing data analysis workflows. nlppln includes components for common NLP tasks, such as tokenization (multiple languages), lemmatization (for Dutch), and named entity recognition (for Dutch). These components are based on existing tools. Users can construct new analysis workflows by combining these pre-built components with tools of their own creation.

Besides improving interoperability, nlppln also keeps a formal record of all steps taken in a workflow. This makes the research more transparent, and improves reproducibility.
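
To illustrate how components are combined and how the workflow description doubles as a formal record of the steps, here is a minimal sketch in plain Python. It does not use nlppln itself; the tool file names (`tokenize.cwl`, `lemmatize.cwl`), the port names, and the helper `make_step` are hypothetical and chosen only for illustration. The sketch writes the workflow as JSON, which CWL accepts alongside YAML.

```python
import json


def make_step(tool, inputs, outputs):
    """Describe one workflow step: the tool to run and how it is wired."""
    return {"run": tool, "in": inputs, "out": outputs}


def build_workflow():
    """Chain a tokenizer and a lemmatizer into one CWL workflow description."""
    # Hypothetical tool descriptions; actual nlppln component names may differ.
    steps = {
        "tokenize": make_step(
            "tokenize.cwl", {"in_file": "text"}, ["tokens"]
        ),
        # The lemmatizer consumes the tokenizer's output ("step/output").
        "lemmatize": make_step(
            "lemmatize.cwl", {"in_file": "tokenize/tokens"}, ["lemmas"]
        ),
    }
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": {"text": "File"},
        "outputs": {
            "lemmas": {"type": "File", "outputSource": "lemmatize/lemmas"}
        },
        "steps": steps,
    }


workflow = build_workflow()
# The serialized document lists every step and connection in the pipeline.
print(json.dumps(workflow, indent=2))
```

Because each step only names a tool description and its connections, the tools behind the steps can be written in any language, and the saved file can be shared so that others can inspect or rerun the same sequence of steps.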

Programming languages
  • Python 64%
  • Common Workflow Language 36%

License
  • Apache-2.0

Participating organisations

Social Sciences & Humanities
Netherlands eScience Center
University of Twente


  • Creating Flexible and Transparent Data Processing Pipelines using Common Workflow Language (published in 2017)
  • A Tool for Flexible and Transparent Text Processing Pipelines (published in 2017)
  • Flexible NLP Pipelines for Digital Humanities Research (published in 2017)
  • A Standard for NLP Pipelines (published in 2017)


Contact person

Janneke van der Zwaan

Netherlands eScience Center
Dafne van Kuppevelt
Netherlands eScience Center
Janneke van der Zwaan
Netherlands eScience Center

Related projects

Bridging the gap

Digital humanities and the Arabic-Islamic corpus

Ego Documents Events modelling – how individuals recall mass violence

What Works When for Whom?

Advancing therapy change process research

Related software



Create CWL workflows by writing a simple Python script.
