Requires the installation of PyMedExt_core. This can be done using requirements.txt:
pip install -r requirements.txt
Installation via pip:
pip install git+git://github.com/equipe22/pymedext_eds.git@master#egg=pymedext_eds
Cloning the repository:
git clone https://github.com/equipe22/pymedext_eds.git
cd pymedext_eds
pip install .
All the annotators are defined in the pymedext_eds.annotators module. You will find a description of the existing annotators in the next section.
- First, import the annotators and helper functions:
import pkg_resources
from glob import glob

from pymedext_eds.utils import rawtext_loader
from pymedext_eds.annotators import Endlines, SentenceTokenizer, \
    RegexMatcher, Pipeline
from pymedext_eds.viz import display_annotations
- Load documents:
data_path = pkg_resources.resource_filename('pymedext_eds', 'data/demo')
file_list = glob(data_path + '/*.txt')
docs = [rawtext_loader(x) for x in file_list]
- Declare the pipeline:
endlines = Endlines(['raw_text'], 'endlines', 'endlines:v1')
sentences = SentenceTokenizer(['endlines'], 'sentence', 'sentenceTokenizer:v1')
regex = RegexMatcher(['endlines','syntagme'], 'regex', 'RegexMatcher:v1', 'list_regexp.json')
pipeline = Pipeline(pipeline = [endlines, sentences, regex])
- Use the pipeline to annotate:
annotated_docs = pipeline.annotate(docs)
- Explore annotations by type:
from pprint import pprint
pprint(annotated_docs[0].get_annotations('regex')[10].to_dict())
- Display annotations in text (using displacy):
display_annotations(annotated_docs[0], ['regex'])
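You can also loop over all annotations of a given type; a minimal sketch reusing only the get_annotations() and to_dict() calls shown above (the 'type' and 'value' keys are an assumption, check them against the dict printed in the previous step):
for ann in annotated_docs[0].get_annotations('regex')[:5]:
    d = ann.to_dict()
    # 'type' and 'value' are assumed keys of the annotation dict; adapt if needed
    print(d.get('type'), '->', d.get('value'))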
- Endlines:
- Used to clean text extracted from PDFs; removes erroneous endlines introduced by the PDF-to-text conversion.
- input : raw_text
- output: Annotations
- SectionSplitter:
- Segments the text into sections
- output: Annotations
- SentenceTokenizer:
- Tokenizes the text into sentences
- input: cleaned text from Endlines or sections
- output: Annotations
- Hypothesis:
- Classification of sentences regarding the degree of certainty
- input: sentences
- output: Attributes
- ATCDFamille:
- Classification of sentences regarding the subject (patient or family)
- input: sentences
- output: Attributes
- SyntagmeTokenizer:
- Segmentation of sentences into syntagms
- input: sentences
- output: Annotations
- Negation:
- Classification of syntagms according to their polarity
- input: syntagm
- output: Attributes
- RegexMatcher:
- Extracts information using predefined regular expressions
- input: sentence or syntagm
- output: Annotations
- QuickUMLSAnnotator:
- Extracts medical concepts from UMLS using QuickUMLS
- output: Annotations
- MedicationAnnotator:
- Extracts medication information using a deep learning pipeline
- output: Annotations
QuickUMLS installation (copied from Georgetown-IR-Lab/QuickUMLS)
Installation
- Obtain a UMLS installation. This tool requires you to have a valid UMLS installation on disk. To install UMLS, you must first obtain a license from the National Library of Medicine; then you should download all UMLS files from this page; finally, you can install UMLS using the MetamorphoSys tool as explained in this guide. The installation can be removed once the system has been initialized.
- Install QuickUMLS: You can do so by either running pip install quickumls or python setup.py install. On macOS, using anaconda is strongly recommended†.
- Create a QuickUMLS installation: Initialize the system by running python -m quickumls.install <umls_installation_path> <destination_path>, where <umls_installation_path> is where the installation files are (in particular, we need MRCONSO.RRF and MRSTY.RRF) and <destination_path> is the directory where the QuickUMLS data files should be installed. This process will take between 5 and 30 minutes depending on how fast the CPU and the drive where UMLS and QuickUMLS files are stored are (on a system with an Intel i7 6700K CPU and a 7200 RPM hard drive, initialization takes 8.5 minutes).
python -m quickumls.install supports the following optional arguments:
- -L / --lowercase: if used, all concept terms are folded to lowercase before being processed. This option typically increases recall, but it might reduce precision;
- -U / --normalize-unicode: if used, expressions with non-ASCII characters are converted to the closest combination of ASCII characters.
- -E / --language: Specify the language to consider for UMLS concepts; by default, English is used. For a complete list of languages, please see this table provided by NLM.
- -d / --database-backend: Specify which database backend to use for QuickUMLS. The two options are leveldb and unqlite. The latter supports multi-process reading and has better unicode compatibility, and is used as the default for all new 1.4 installations; the former is still used as the default when instantiating a QuickUMLS client. More info about differences between the two databases and migration info is available here.
†: If the installation fails on macOS when using Anaconda, install leveldb first by running conda install -c conda-forge python-leveldb.
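Once the QuickUMLS data files are created, you can sanity-check the installation with the upstream QuickUMLS client before wiring it into the QuickUMLSAnnotator; a minimal sketch, where the path is a placeholder for the <destination_path> chosen above:
from quickumls import QuickUMLS

# placeholder: the <destination_path> created by python -m quickumls.install
matcher = QuickUMLS('/path/to/quickumls_install')

# returns candidate UMLS concepts (CUI, term, similarity) for spans found in the text
print(matcher.match('patient with type 2 diabetes', best_match=True, ignore_syntax=False))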
The pipeline can also be exposed as a simple Flask service:
from flask import Flask, request
from pymedext_eds.annotators import Endlines, SentenceTokenizer, Hypothesis, \
    ATCDFamille, SyntagmeTokenizer, Negation, RegexMatcher, \
    Pipeline
endlines = Endlines(['raw_text'], 'endlines', 'endlines:v1')
sentences = SentenceTokenizer(['endlines'], 'sentence', 'sentenceTokenizer:v1')
hypothesis = Hypothesis(['sentence'], 'hypothesis', 'hypothesis:v1')
family = ATCDFamille(['sentence'], 'context', 'ATCDfamily:v1')
syntagmes = SyntagmeTokenizer(['sentence'], 'syntagme', 'SyntagmeTokenizer:v1')
negation = Negation(['syntagme'], 'negation', 'Negation:v1')
regex = RegexMatcher(['endlines','syntagme'], 'regex', 'RegexMatcher:v1', 'list_regexp.json')
pipeline = Pipeline(pipeline = [endlines, sentences, hypothesis, family, syntagmes, negation, regex])
app = Flask(__name__)

@app.route('/annotate', methods=['POST'])
def result():
    if request.method == 'POST':
        return pipeline.__call__(request)

if __name__ == '__main__':
    app.run(port=6666, debug=True)
Save this code in demo_flask_server.py and run it using:
python demo_flask_server.py
Query the server from a client:
import pkg_resources
import requests
from glob import glob

from pymedextcore.document import Document
from pymedext_eds.utils import rawtext_loader

data_path = pkg_resources.resource_filename('pymedext_eds', 'data/demo')
file_list = glob(data_path + '/*.txt')
docs = [rawtext_loader(x) for x in file_list]
json_doc = [doc.to_dict() for doc in docs]
res = requests.post("http://127.0.0.1:6666/annotate", json=json_doc)
if res.status_code == 200:
    res = res.json()['result']
    docs = [Document.from_dict(doc) for doc in res]
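The documents returned by the service can be explored exactly like locally annotated ones, for example (assuming at least one regex annotation was found):
from pprint import pprint

# annotations survive the JSON round trip
pprint(docs[0].get_annotations('regex')[0].to_dict())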
To build the Docker image, first create a file .git-credentials and replace user and pass with your GitHub credentials, such as:
https://user:pass@github.com
WARNING: never commit this file to git!
docker build -f eds_apps/Dockerfile_backend -t pymedext-eds:v1 .
# if behind a proxy, add the proxy build arguments
docker build -f eds_apps/Dockerfile_backend -t pymedext-eds:v1 \
    --build-arg http_proxy="proxy" \
    --build-arg https_proxy="proxy" .
docker run --rm -d -p 6666:6666 pymedext-eds:v1 python3 demo_flask.py
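Once the container is running, the same client code as above can be pointed at it; a minimal check, assuming demo_flask.py exposes the same /annotate route on port 6666:
import requests

# json_doc is the list of serialized documents built in the client example above
res = requests.post("http://127.0.0.1:6666/annotate", json=json_doc)
print(res.status_code)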