UMLS-EDA

🎉 A lightweight UMLS-based data augmentation tool for biomedical NLP tasks, including named entity recognition and sentence classification 🎉

Citation: Kang, T., Perotte, A., Tang, Y., Ta, C., & Weng, C. (2020). UMLS-based data augmentation for natural language processing of clinical research literature. Journal of the American Medical Informatics Association.
Author: Tian Kang (tk2624@cumc.columbia.edu)
Affiliation: Department of Biomedical Informatics, Columbia University (Dr. Chunhua Weng's lab)
Built upon EDA (Easy Data Augmentation)

User Guide

0. Before you start

Install UMLS and QuickUMLS locally.
Get your UMLS SOAP API key from the ‘My Profile’ area after signing in to the UMLS Terminology Services (UTS).
Add your API key and the QuickUMLS directory to config.py.
Customize the other variables in config.py as needed.
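This README does not list the variables in config.py. The sketch below assumes the file holds plain module-level settings and shows what the two required values might look like, along with a quick sanity check. The names UMLS_API_KEY, QUICKUMLS_DIR and check_config are hypothetical and may not match the repository.

```python
# config.py: hypothetical sketch; the real variable names in the repository may differ.
import os

# UMLS API key from the UTS "My Profile" page. It is read from the environment
# here so the key is not committed to version control.
UMLS_API_KEY = os.environ.get("UMLS_API_KEY", "")

# Directory of the local QuickUMLS installation/index (hypothetical default).
QUICKUMLS_DIR = os.environ.get("QUICKUMLS_DIR", "QuickUMLS")


def check_config(api_key: str = UMLS_API_KEY, quickumls_dir: str = QUICKUMLS_DIR) -> list:
    """Return a list of problems found; an empty list means the settings look usable."""
    problems = []
    if not api_key:
        problems.append("UMLS API key is empty")
    if not os.path.isdir(quickumls_dir):
        problems.append(f"QuickUMLS directory not found: {quickumls_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_config():
        print("config problem:", problem)
```

Reading secrets from environment variables is only one option. Hard-coding the key in config.py also works, but that file should then stay out of version control.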

1. Named Entity Recognition

Input: a CoNLL-format file
Usage:

    python augment4ner.py [-h] --input INPUT [--output OUTPUT] [--num_aug NUM_AUG] [--alpha ALPHA]
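Synonym replacement is harder for NER than for sentence classification. When a multi-word entity is swapped for a synonym of a different length, the tag column has to be rebuilt. The following is a minimal sketch of that bookkeeping, not the script's actual implementation. It assumes a whitespace-separated CoNLL file with the token in the first column, a BIO tag in the last column, and blank lines between sentences. The function names and the entity label are hypothetical, and the synonym is passed in by hand rather than looked up in UMLS.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # (token, BIO tag) pairs


def read_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style text: one 'token ... tag' line per token, blank line between sentences.
    Assumes the token is the first column and the tag is the last column."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append((cols[0], cols[-1]))
    if current:
        sentences.append(current)
    return sentences


def retag(tokens: List[str], label: str) -> Sentence:
    """Tag a replacement phrase: B-label on the first token, I-label on the rest, or all O."""
    if label == "O":
        return [(t, "O") for t in tokens]
    return [(t, ("B-" if i == 0 else "I-") + label) for i, t in enumerate(tokens)]


def replace_span(sentence: Sentence, start: int, end: int, synonym: str) -> Sentence:
    """Replace tokens [start, end) with a (possibly multi-token) synonym.
    Assumes the span covers a whole entity (starting at its B- tag) or only O tokens."""
    tag = sentence[start][1]
    label = "O" if tag == "O" else tag.split("-", 1)[1]
    return sentence[:start] + retag(synonym.split(), label) + sentence[end:]
```

Consider the two-token entity "heart attack" (B-Condition, I-Condition). Replacing it with "myocardial infarction" keeps the B-/I- pair. Replacing it with "MI" leaves a single B-Condition token, so the entity boundaries stay valid in both cases.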

2. Sentence Classification

Input: a "|"-separated file (index|label|sentence text)
Usage:

    python augment4class.py [-h] --input INPUT [--output OUTPUT] [--num_aug NUM_AUG] [--alpha ALPHA]
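This tool is built on EDA. By analogy, --num_aug is probably the number of augmented sentences generated per input sentence, and --alpha the fraction of words changed in each. The sketch below assumes that reading. It parses the index|label|sentence format and applies EDA-style synonym replacement. The synonyms dictionary is a toy stand-in for UMLS concept lookups (for example via QuickUMLS). It handles single-word synonyms only and does not deduplicate outputs. The function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def read_pipe_file(text: str) -> List[Tuple[str, str, str]]:
    """Parse 'index|label|sentence text' lines; split at most twice so '|' may appear in the text."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        index, label, sentence = line.split("|", 2)
        rows.append((index, label, sentence))
    return rows


def augment(sentence: str, synonyms: Dict[str, List[str]], num_aug: int, alpha: float,
            seed: int = 0) -> List[str]:
    """EDA-style synonym replacement: each output replaces about alpha * len(words) words
    (at least one) that have an entry in the toy synonym dictionary."""
    rng = random.Random(seed)
    words = sentence.split()
    n_change = max(1, int(alpha * len(words)))
    candidates = [i for i, w in enumerate(words) if w.lower() in synonyms]
    outputs = []
    for _ in range(num_aug):
        new_words = list(words)
        for i in rng.sample(candidates, min(n_change, len(candidates))):
            new_words[i] = rng.choice(synonyms[words[i].lower()])
        outputs.append(" ".join(new_words))
    return outputs
```

The label of each row would simply be copied to its augmented sentences, since synonym replacement is assumed not to change the class.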

See example/example4ner.conll and example/example4class.txt for sample input files.

Reference

Wei, J., & Zou, K. (2019). EDA: Easy data augmentation techniques for boosting performance on text classification tasks. arXiv preprint arXiv:1901.11196. (GitHub repo)
