Drug named entity recognition


💊 Drug named entity recognition

Developed by Fast Data Science, https://fastdatascience.com

Source code at https://github.com/fastdatascience/drug_named_entity_recognition

Tutorial at https://fastdatascience.com/drug-named-entity-recognition-python-library/

This is a lightweight Python library for finding drug names in a string, otherwise known as named entity recognition (NER) and named entity linking.

Please note that this library returns only high-confidence matches and does not currently support misspellings.

It finds only the English names of drugs; names in other languages are not supported.

It also does not find short code names or abbreviations commonly used in medicine, such as "Ceph" for "Cephradin", because these are highly ambiguous.

💻 Installing the drug named entity recognition Python package

You can install from PyPI.

pip install drug-named-entity-recognition

If you get an error installing, try making a new Python environment with Conda (conda create -n test-env; conda activate test-env) or venv (python -m venv testenv; then source testenv/bin/activate on Linux/macOS, or testenv\Scripts\activate on Windows) and then installing the library.

The library already contains the drug names, so unless you need to update the dictionary you should not have to run any of the download scripts.

If you have problems installing, try our Google Colab walkthrough.

💡 Usage examples

You must first tokenise your input text using a tokeniser of your choice (NLTK, spaCy, etc).

You pass a list of strings to the find_drugs function.

Example 1

from drug_named_entity_recognition import find_drugs

find_drugs("i bought some Prednisone".split(" "))

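This outputs a list of tuples, one per drug found. Each tuple holds a dictionary of drug details (name, synonyms and database identifiers) followed by two integers; here both are 3, matching the position of "Prednisone" in the token list.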

[({'name': 'Prednisone', 'synonyms': {'Sone', 'Sterapred', 'Deltasone', 'Panafcort', 'Prednidib', 'Cortan', 'Rectodelt', 'Prednisone', 'Cutason', 'Meticorten', 'Panasol', 'Enkortolon', 'Ultracorten', 'Decortin', 'Orasone', 'Winpred', 'Dehydrocortisone', 'Dacortin', 'Cortancyl', 'Encorton', 'Encortone', 'Decortisyl', 'Kortancyl', 'Pronisone', 'Prednisona', 'Predniment', 'Prednisonum', 'Rayos'}, 'medline_plus_id': 'a601102', 'mesh_id': 'D018931', 'drugbank_id': 'DB00635'}, 3, 3)]
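As a rough illustration of how such a result might be consumed, the sketch below unpacks an abbreviated, hard-coded copy of the output above. It assumes that the two integers are inclusive start and end token indices, which is consistent with the example but is not confirmed here; describe_matches is a hypothetical helper and not part of the library.

```python
def describe_matches(tokens, matches):
    """Return (name, DrugBank ID, matched text) for each match.

    Assumption: each match is (info_dict, start, end), where start and
    end are inclusive indices into the token list.
    """
    rows = []
    for info, start, end in matches:
        matched_text = " ".join(tokens[start:end + 1])
        rows.append((info["name"], info["drugbank_id"], matched_text))
    return rows


tokens = "i bought some Prednisone".split(" ")

# Abbreviated, hard-coded copy of the output shown above
# (the synonym set is shortened for readability).
matches = [
    (
        {
            "name": "Prednisone",
            "synonyms": {"Deltasone", "Rayos", "Prednisone"},
            "medline_plus_id": "a601102",
            "mesh_id": "D018931",
            "drugbank_id": "DB00635",
        },
        3,
        3,
    )
]

for row in describe_matches(tokens, matches):
    print(row)
```

Working from the returned indices rather than searching the text for the drug name keeps the matched span tied to the tokens you passed in.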

You can ignore case with:

find_drugs("i bought some prednisone".split(" "), is_ignore_case=True)
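Splitting on spaces leaves punctuation attached to words (for example "Prednisone,"), which presumably prevents a match, so a real tokeniser is the safer choice. The sketch below uses a small regular-expression tokeniser as a stand-in for NLTK or spaCy; tokenise is a hypothetical helper, and the import is guarded only so that the tokeniser can be tried without the library installed.

```python
import re


def tokenise(text):
    """Split text into word tokens and single punctuation tokens.

    A deliberately simple stand-in for a proper tokeniser such as
    NLTK or spaCy.
    """
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenise("I bought some Prednisone, then went home.")

try:
    from drug_named_entity_recognition import find_drugs
except ImportError:
    find_drugs = None  # library not installed; the tokeniser still works

if find_drugs is not None:
    # is_ignore_case=True is the option described above.
    print(find_drugs(tokens, is_ignore_case=True))
```

Because punctuation becomes its own token, the drug name reaches find_drugs as a clean string.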
Participating organisations

Fast Data Science Ltd

Contributors

Thomas A Wood