Harmony

Making harmonisation simple. Social scientists often have to compare items from different questionnaires or datasets. Harmony is a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items quickly, even in different languages.

What Harmony can do for you

Do you need to compare questionnaire items across studies? Do you want to find the best match for a set of items? Are there different versions of the same questionnaire floating around, and do you want to check how compatible they are? Are the questionnaires you would like to compare written in different languages?

Here's a walkthrough video on how you can use Harmony online at harmonydata.ac.uk.

Harmonising questionnaires

The Harmony project is a data harmonisation project that uses Natural Language Processing to help researchers make better use of existing data from different studies by supporting them with the harmonisation of various measures and items used in different studies. Harmony is a collaboration project between Ulster University, University College London, the Universidade Federal de Santa Maria, and Fast Data Science. Harmony is funded by Wellcome as part of the Wellcome Data Prize in Mental Health.

Harmony is a project in active development and you can contribute.

If you have found a bug or would like a new feature, you can raise an issue here for issues with Harmony's natural language understanding functionality, or alternatively here for issues with Harmony's user interface and graphics. You can also join our Discord server!

What does Harmony do?

  • Psychologists and social scientists often have to match items in different questionnaires, such as "I often feel anxious" and "Feeling nervous, anxious or afraid".
  • This is called harmonisation.
  • Harmonisation is a time-consuming and subjective process.
  • Going through long PDFs of questionnaires and putting the questions into Excel is no fun.
  • Enter Harmony, a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items, even in different languages.
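
To give a rough feel for what automated item matching involves, here is a minimal sketch that scores two items with a bag-of-words cosine similarity. This is a deliberately simple stand-in, not Harmony's method: Harmony itself uses NLP and generative AI models, which can relate items that share meaning but few words. The function names and the crude suffix stripping are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenise(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic words."""
    words = re.findall(r"[a-z]+", text.lower())
    # Crude, hypothetical stemming: strip a trailing "ing" so "feeling" ~ "feel".
    return [w[:-3] if w.endswith("ing") and len(w) > 5 else w for w in words]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two items."""
    va, vb = Counter(tokenise(a)), Counter(tokenise(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    # Empty items have no words to compare, so score them as 0.
    return dot / norm if norm else 0.0


score = cosine_similarity("I often feel anxious", "Feeling nervous, anxious or afraid")
print(round(score, 3))  # prints 0.447
```

Here "nervous" and "afraid" contribute nothing because they share no words with the first item; semantic models are meant to close exactly this gap.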

Quick start with the code

Read our guide to contributing to Harmony here or read CONTRIBUTING.md.

You can run the walkthrough Python notebook in Google Colab with a single click.

You can also download an R Markdown notebook to run in RStudio.

You can run the walkthrough R notebook in Google Colab with a single click.

The Harmony Project

Harmony is a tool using AI which allows you to compare items from questionnaires and identify similar content. You can try Harmony at https://harmonydata.ac.uk/app and you can read our blog at https://harmonydata.ac.uk/blog/.

Who to contact?

You can contact the Harmony team at https://harmonydata.ac.uk/, or Thomas Wood at https://fastdatascience.com/.

🖥 Installation instructions (video)

Installing Harmony

🖱 Looking to try Harmony in the browser?

Visit: https://harmonydata.ac.uk/app/

You can also visit our blog at https://harmonydata.ac.uk/blog/

✅ You need Tika if you want to extract instruments from PDFs

Download and install Java if you don't have it already. Then download Apache Tika from https://tika.apache.org/download.html and run the Tika server on your computer (replace 2.3.0 with the version you downloaded):

java -jar tika-server-standard-2.3.0.jar
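
Before loading PDFs, it can help to confirm that the Tika server is reachable. The sketch below assumes Tika's default server address, http://localhost:9998, and its /tika endpoint, which answers a plain GET request; adjust the URL if you started Tika differently. The helper name tika_is_running is hypothetical.

```python
import urllib.request


def tika_is_running(url: str = "http://localhost:9998/tika", timeout: float = 2.0) -> bool:
    """Return True if a GET request to the Tika server URL succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses:
        # treat any of them as "Tika is not available".
        return False


if __name__ == "__main__":
    print("Tika reachable:", tika_is_running())
```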

Requirements

You need a Windows, Linux or Mac system with

  • Python 3.8 or above
  • the requirements in requirements.txt
  • Java (if you want to extract items from PDFs)
  • Apache Tika (if you want to extract items from PDFs)
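
A quick way to check some of these requirements from Python is sketched below. It only checks the Python version and whether a java executable is on your PATH; it does not check the packages in requirements.txt or whether Tika is running. The function name check_requirements is hypothetical.

```python
import shutil
import sys
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report which of Harmony's basic requirements appear to be met."""
    return {
        # Harmony needs Python 3.8 or above.
        "python>=3.8": sys.version_info >= (3, 8),
        # Java is only needed for PDF extraction via Apache Tika.
        "java on PATH": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print(f"{requirement}: {'OK' if ok else 'missing'}")
```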

🖥 Installing Harmony Python package

You can install from PyPI.

pip install harmonydata
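
To confirm the package installed correctly, you can ask Python for the installed version of the harmonydata distribution. This minimal sketch uses only the standard library's importlib.metadata (available from Python 3.8); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "harmonydata") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"harmonydata {found}" if found else "harmonydata is not installed; run: pip install harmonydata")
```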

Loading all models

Harmony uses spaCy to help with text extraction from PDFs. spaCy models can be downloaded with the following command in Python:

import harmony
harmony.download_models()

Matching example instruments

instruments = harmony.example_instruments["CES_D English"], harmony.example_instruments["GAD-7 Portuguese"]
questions, similarity, query_similarity, new_vectors_dict = harmony.match_instruments(instruments)
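
Once you have the similarity matrix, a common next step is to list each question's closest counterpart. The sketch below assumes similarity is a square matrix (a NumPy array or a list of lists) whose rows and columns follow the order of the questions. The helper best_matches, the 0.5 threshold and the toy scores are hypothetical, so check the output of your Harmony version before relying on it.

```python
from typing import List, Optional, Sequence, Tuple


def best_matches(
    texts: Sequence[str],
    similarity: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[str, Optional[str], float]]:
    """For each item, return (item, best other item or None, score).

    The diagonal is skipped so an item never matches itself; matches scoring
    below `threshold` are reported as None.
    """
    results = []
    for i, text in enumerate(texts):
        candidates = [(float(s), j) for j, s in enumerate(similarity[i]) if j != i]
        if not candidates:
            results.append((text, None, 0.0))
            continue
        score, j = max(candidates)
        results.append((text, texts[j] if score >= threshold else None, score))
    return results


# Toy data for illustration only; these are not real Harmony scores.
items = ["I often feel anxious", "Feeling nervous, anxious or afraid", "I sleep well"]
toy_similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
for item, match, score in best_matches(items, toy_similarity):
    print(f"{item!r} -> {match!r} ({score:.2f})")
```

With real output you would pass the question texts (assuming each question object exposes a question_text attribute; check this against your installed version) together with the similarity array. Because every question is compared with every other, you may also want to skip pairs that come from the same instrument.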

How to load a PDF, Excel or Word file into an instrument

instruments = harmony.load_instruments_from_local_file("gad-7.pdf")
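
If you have a folder of questionnaires, a small helper can pick out the files to load one at a time. The suffixes treated as supported here (.pdf, .xlsx, .docx) are an assumption based on the PDF, Excel and Word formats named in this section, and is_supported and find_questionnaire_files are hypothetical names. PDFs still need a running Tika server.

```python
from pathlib import Path
from typing import List

# Assumption: these suffixes cover PDF, Excel and Word questionnaires;
# Harmony may accept other formats too.
SUPPORTED_SUFFIXES = {".pdf", ".xlsx", ".docx"}


def is_supported(path: Path) -> bool:
    """Check a file's extension (case-insensitively) against SUPPORTED_SUFFIXES."""
    return path.suffix.lower() in SUPPORTED_SUFFIXES


def find_questionnaire_files(folder: str) -> List[Path]:
    """Return the supported files directly inside `folder`, sorted by path."""
    return sorted(
        path for path in Path(folder).iterdir() if path.is_file() and is_supported(path)
    )
```

Each path can then be passed as a string to harmony.load_instruments_from_local_file(str(path)) and the resulting instruments collected for matching.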

Participating organisations

University of Ulster
University College London
Universidade Federal de Santa Maria
Wellcome Trust

Testimonials

In 2023, the Australian Data Archive (ADA) embarked on a project to harmonise a vast collection of survey questions, seeking a solution that could effectively identify and group similar items across different studies. Researchers at the ADA found Harmony, a data harmonisation tool powered by natural language processing (NLP), and recognised its potential to streamline this process. https://harmonydata.ac.uk/ada/
– Australian Data Archive (ADA)

Related projects

Harmony Data - a platform to drive global mental health research forward

Using natural language processing for faster data harmonisation and easier data discoverability

Text mining in Dutch medical text

Five UMCs (UMC Utrecht, Radboudumc, UMCG, ErasmusMC and AmsterdamUMC) work together to (co-)develop open source solutions, and validate our methods and techniques with each other, for the (re)use of free medical text present in our EHRs.

CollAIte

An Artificial Intelligence Approach to Comparing Text Versions

In progress

Towards next-generation scientific computing tools for diversity-aware language science and technology

Diversity-aware language technology for conversational data

Finished

Texcavator

Facilitating and supporting large-scale text mining in the field of digital humanities

Finished

Related software

Cross-perspective Topic Modeling

An application that uses cross-perspective topic modeling to extract topics and opinions from text and provides insight into how they change over time.

DIANNA

Deep Insight And Neural Network Analysis (DIANNA) is the only Explainable AI (XAI) library for scientists supporting the Open Neural Network Exchange (ONNX), the de facto standard model format.

nlppln

A flexible solution to build text mining workflows that allows you to quickly combine Natural Language Processing tools from different sources.

ShiCo

A visualization that shows how the meaning we attach to a given concept shifts over time.

Texcavator

Texcavator is a search engine and text mining application for creating word-cloud and timeline visualizations of large text corpora.

xtas

The eXtensible Text Analysis Suite.