A lightweight Python library for finding drug names in a string, otherwise known as named entity recognition (NER) and named entity linking. This library finds only high confidence drugs and doesn't support misspellings at present.
Developed by Fast Data Science, https://fastdatascience.com
Source code at https://github.com/fastdatascience/drug_named_entity_recognition
Tutorial at https://fastdatascience.com/drug-named-entity-recognition-python-library/
The library only finds the English names of drugs. Names in other languages are not supported.
It also doesn't find short code names or abbreviations commonly used in medicine, such as "Ceph" for "Cephradin", as these are highly ambiguous.
You can install from PyPI.
pip install drug-named-entity-recognition
If you get an error installing, try making a new Python environment in Conda (conda create -n test-env; conda activate test-env) or venv (python -m venv testenv; then source testenv/bin/activate on Linux/macOS or testenv\Scripts\activate on Windows) and then installing the library.
The library already contains the drug names, so unless you need to update the dictionary you should not have to run any of the download scripts.
If you have problems installing, try our Google Colab walkthrough.
You must first tokenise your input text using a tokeniser of your choice (NLTK, spaCy, etc).
You pass a list of strings to the find_drugs function.
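As a minimal sketch of the tokenisation step, the snippet below uses a regular expression from the standard library in place of NLTK or spaCy. The helper name simple_tokenise and its pattern are illustrative assumptions, not part of the library. The import of find_drugs is guarded so that the snippet still runs where the package is not installed.

```python
import re

try:
    from drug_named_entity_recognition import find_drugs
except ImportError:  # the package is optional for this sketch
    find_drugs = None


def simple_tokenise(text):
    """Split text into word tokens and single punctuation marks.

    Hypothetical helper: a stand-in for a real tokeniser such as NLTK or spaCy.
    """
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenise("I bought some Prednisone.")
print(tokens)

if find_drugs is not None:
    # find_drugs expects the list of tokens, not the raw string.
    print(find_drugs(tokens))
```

Unlike a plain split on spaces, this keeps punctuation as separate tokens, so a drug name followed by a full stop is passed to the library as a clean word.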
Example 1
from drug_named_entity_recognition import find_drugs
find_drugs("i bought some Prednisone".split(" "))
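This outputs a list of tuples. Each tuple holds a dictionary describing the drug, followed by two integers, which in this example both point at the matched token (index 3) in the input list.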
[({'name': 'Prednisone', 'synonyms': {'Sone', 'Sterapred', 'Deltasone', 'Panafcort', 'Prednidib', 'Cortan', 'Rectodelt', 'Prednisone', 'Cutason', 'Meticorten', 'Panasol', 'Enkortolon', 'Ultracorten', 'Decortin', 'Orasone', 'Winpred', 'Dehydrocortisone', 'Dacortin', 'Cortancyl', 'Encorton', 'Encortone', 'Decortisyl', 'Kortancyl', 'Pronisone', 'Prednisona', 'Predniment', 'Prednisonum', 'Rayos'}, 'medline_plus_id': 'a601102', 'mesh_id': 'D018931', 'drugbank_id': 'DB00635'}, 3, 3)]
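To show how such a result might be consumed, the sketch below unpacks a tuple shaped like the output above. The synonym set is abbreviated for readability, and reading the two integers as inclusive start and end token indices is an assumption based on this single example, where token 3 is "Prednisone".

```python
# Tokens and result copied from the example above (synonyms abbreviated).
tokens = "i bought some Prednisone".split(" ")
results = [
    (
        {
            "name": "Prednisone",
            "synonyms": {"Deltasone", "Rayos", "Prednisone"},
            "medline_plus_id": "a601102",
            "mesh_id": "D018931",
            "drugbank_id": "DB00635",
        },
        3,
        3,
    )
]

summaries = []
for info, start, end in results:
    # Assumption: start and end are inclusive token indices of the match.
    matched_text = " ".join(tokens[start:end + 1])
    summaries.append((matched_text, info["name"], info["drugbank_id"]))

for matched_text, name, drugbank_id in summaries:
    print(f"{matched_text!r} -> {name} (DrugBank {drugbank_id})")
```

The identifiers in the dictionary (medline_plus_id, mesh_id, drugbank_id) are what make this named entity linking as well as recognition: each match can be tied back to an external database record.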
You can ignore case with:
find_drugs("i bought some prednisone".split(" "), is_ignore_case=True)