DISCLAIMER: This application is used for demonstrative and illustrative purposes only and does not constitute an offering that has gone through regulatory review. It is not intended to serve as a medical application. There is no representation as to the accuracy of the output of this application and it is presented without warranty.
This application was built to demonstrate IBM's Watson Natural Language Classifier (NLC). The data set we will be using, ICD-10-GT-AA.csv, contains a subset of ICD-10 entries. ICD-10 is the 10th revision of the International Statistical Classification of Diseases and Related Health Problems. In short, it is a medical classification list maintained by the World Health Organization (WHO) that contains codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or disease. Hospitals and insurance companies alike could save time and money by using Watson to assign the most accurate ICD-10 codes.
This application is a Python web application based on the Flask microframework, and based on earlier work done by Ryan Anderson. It uses the Watson Python SDK to create the classifier, list classifiers, and classify the input text. We also make use of the freely available ICD-10 API which, given an ICD-10 code, returns a name and description.
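As a rough illustration of the three SDK operations mentioned above, the sketch below wraps them in small helper functions. It assumes a release of the `ibm-watson` package that still ships `NaturalLanguageClassifierV1` with IAM authentication; the helper names (`build_classifier`, `list_classifier_ids`, `classify_text`) are hypothetical and are not taken from this repo's `app.py`.

```python
def build_classifier(apikey, service_url=None):
    """Return an NLC client authenticated with an IAM API key.

    The SDK imports are deferred so this module can be loaded (and the other
    helpers tested) on a machine where ibm-watson is not installed.
    """
    from ibm_watson import NaturalLanguageClassifierV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    nlc = NaturalLanguageClassifierV1(authenticator=IAMAuthenticator(apikey))
    if service_url:
        # Only needed when the service instance is not in the default region.
        nlc.set_service_url(service_url)
    return nlc


def list_classifier_ids(nlc):
    """Return the IDs of all classifiers visible to this service instance."""
    result = nlc.list_classifiers().get_result()
    return [c["classifier_id"] for c in result.get("classifiers", [])]


def classify_text(nlc, classifier_id, text):
    """Send one phrase to a trained classifier and return the raw result dict."""
    return nlc.classify(classifier_id, text).get_result()
```

Because the helpers only rely on the `list_classifiers` and `classify` methods of the object they receive, they can be exercised with a stand-in client before any credentials exist.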
When the reader has completed this pattern, they will understand how to:
- Create a Natural Language Classifier (NLC) service and use it in a Python application.
- Train an NLC model using CSV data.
- Deploy a web app with Flask to allow the NLC model to be queried.
- Quickly get a classification of a disease or health issue using the Natural Language Classifier trained model.
- CSV files are sent to the Natural Language Classifier service to train the model.
- The user interacts with the web app UI running either locally or in the cloud.
- The application sends the user's input to the Natural Language Classifier model to be classified.
- The information containing the classification is returned to the web app.
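The request flow in the list above can be sketched without any web framework. This is a minimal illustration, not the code in `app.py`: the form field name `classify_text` and the function `handle_classify_request` are hypothetical, and the classifier is passed in as a plain callable so the flow can be tried without a trained model.

```python
def handle_classify_request(form, classify):
    """Mimic one round trip: read user input, classify it, return data for the page.

    form     -- a dict-like object holding the submitted form fields
    classify -- a callable that takes text and returns an NLC-style result dict
    """
    text = (form.get("classify_text") or "").strip()
    if not text:
        # Nothing to send to the service, so report the problem to the UI instead.
        return {"error": "Please enter some text to classify.", "classes": []}
    response = classify(text)
    return {"error": None, "text": text, "classes": response.get("classes", [])}


# A stand-in classifier that always returns the same hypothetical class.
def fake_classify(text):
    return {"text": text, "classes": [{"class_name": "A00.0", "confidence": 0.9}]}


print(handle_classify_request({"classify_text": "cholera"}, fake_classify))
```

In a Flask view, `form` would typically be `request.form`, and `classify` would wrap the call to the NLC service.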
- Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
- Watson Natural Language Classifier: An IBM Cloud service to interpret and classify natural language with confidence.
- Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
- Clone the repo
- Create IBM Cloud services
- Create a Watson Studio project
- Train the NLC model
- Run the application
Clone the nlc-icd10-classifier repo locally. In a terminal, run:
    git clone https://github.com/IBM/nlc-icd10-classifier
    cd nlc-icd10-classifier
Create the following service:
- Watson Natural Language Classifier
To create the Watson Studio project:
- Log in to IBM's Watson Studio. Once in, you'll land on the dashboard.
- Create a new project by clicking + New project and choosing Data Science.
- Enter a name for the project and click Create.
- NOTE: By creating a project in Watson Studio, a free-tier Object Storage service and a Watson Machine Learning service will be created in your IBM Cloud account. Select the Free storage type to avoid fees.
- Upon successful project creation, you are taken to a dashboard view of your project. Take note of the Assets and Settings tabs; we'll use them to associate the project with external assets (datasets and notebooks) and IBM Cloud services.
The data used in this example is part of the ICD-10 data set and a cleaned version we'll use is available in the repo under data/ICD-10-GT-AA.csv. We'll now train an NLC model using this data.
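Before uploading, it can help to sanity-check the training file. The sketch below assumes the usual NLC training layout (one example per row: the text first, then one or more class labels, with no header row) and a 1,024-character limit on the text column; both are assumptions about the service rather than facts stated in this repo. The sample rows are made up for illustration and are not copied from data/ICD-10-GT-AA.csv.

```python
import csv
import io
from collections import Counter

MAX_TEXT_CHARS = 1024  # assumed service limit on the text column


def summarize_training_csv(lines):
    """Count examples per class and collect rows that look malformed.

    lines -- any iterable of CSV lines, such as an open file object
    """
    counts = Counter()
    problems = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) < 2 or not row[0].strip():
            problems.append((row_number, "expected text followed by at least one class"))
            continue
        if len(row[0]) > MAX_TEXT_CHARS:
            problems.append((row_number, "text longer than MAX_TEXT_CHARS"))
        for label in row[1:]:
            if label.strip():
                counts[label.strip()] += 1
    return counts, problems


# Hypothetical sample rows; the third row is deliberately missing its class.
sample = io.StringIO(
    '"Cholera due to Vibrio cholerae 01, biovar cholerae",A00.0\n'
    '"Typhoid fever, unspecified",A01.00\n'
    "missing label\n"
)
counts, problems = summarize_training_csv(sample)
print(dict(counts))
print(problems)
```

To check the real file, pass an open file object instead of `sample`, for example one opened with `open("data/ICD-10-GT-AA.csv", newline="")`.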
- From the new project Overview panel, click + Add to project on the top right and choose the Natural Language Classifier asset type.
- A new instance of the NLC tool will launch.
- Add the data to your project by clicking the Browse button in the right-hand Upload to project section and browsing to the cloned repo. Choose the data/ICD-10-GT-AA.csv file.
- Select the ICD-10-GT-AA.csv file you just uploaded and choose Add to model.
- Click the Train model button to begin training. The model will take around an hour to train.
- To check the status of the model, and to access it after it trains, go to the Models section of your project's Assets tab. The model will show up when it is ready. Double-click it to see the Overview tab.
- The first line of the Overview tab contains the Model ID. Note this value, as we'll need it in the next step.
Follow the steps below to deploy the application:
- Press the Deploy to IBM Cloud button.
- From the IBM Cloud deployment page, click the Deploy button.
- From the Toolchains menu, click the Delivery Pipeline to watch while the app is deployed. Once deployed, the app can be viewed by clicking View app.
- The app and service can be viewed in the IBM Cloud dashboard. The app will be named nlc-icd10-classifier, with a unique suffix.
- We now need to add a few environment variables to the application's runtime so the right classifier service and model are used. Click on the application from the dashboard to view its settings.
- Once viewing the application, click the Runtime option on the menu and navigate to the Environment Variables section.
- Update the CLASSIFIER_ID and NATURAL_LANGUAGE_CLASSIFIER_APIKEY variables with your Model ID from Step 4 and NLC API key from Step 2. Click Save.
- After saving the environment variables, the app will restart. After the app restarts, you can access it by clicking the Visit App URL button.
The general recommendation for Python development is to use a virtual environment (venv). To create one, use the built-in venv module on Python 3, or install the virtualenv library on Python 2.7:
- Create the virtual environment. Use one of the two commands depending on your Python version. Note: the interpreter may be named python3 on your system.
    python -m venv mytestenv    # Python 3.X
    virtualenv mytestenv        # Python 2.X
- Now activate the virtual environment. Use one of the two commands depending on your OS.
    source mytestenv/bin/activate     # Mac or Linux
    ./mytestenv/Scripts/activate      # Windows PowerShell
  TIP 💡 To terminate the virtual environment, use the deactivate command.
- Rename the env.example file to .env:
    mv env.example .env
- Update the .env file with the NLC credentials, using either username/password or API key:
    # Replace the credentials here with your own using either USERNAME/PASSWORD or IAM_APIKEY
    # Comment out the unset environment variables
    # Rename this file to .env before running app.py.
    CLASSIFIER_ID=<add_nlc_classifier_id>
    NATURAL_LANGUAGE_CLASSIFIER_APIKEY=<add_nlc_apikey>
- Install the app dependencies by running:
    pip install -r requirements.txt
- Start the app by running:
    python app.py
- Open a browser and point to localhost:5000.
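If the app fails at startup, a quick check of the two settings can narrow things down. The following is a minimal standard-library sketch of reading `CLASSIFIER_ID` and `NATURAL_LANGUAGE_CLASSIFIER_APIKEY` from the process environment, falling back to lines from a `.env`-style file. It is not how `app.py` loads its configuration; `parse_env_lines` and `load_settings` are hypothetical helpers.

```python
import os

REQUIRED = ("CLASSIFIER_ID", "NATURAL_LANGUAGE_CLASSIFIER_APIKEY")


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env_lines=(), environ=None):
    """Return the required settings, preferring real environment variables."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_lines(env_lines)
    settings = {}
    missing = []
    for name in REQUIRED:
        value = environ.get(name) or file_values.get(name, "")
        # Treat untouched placeholders such as <add_nlc_apikey> as missing.
        if not value or value.startswith("<"):
            missing.append(name)
        settings[name] = value
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return settings
```

For example, `load_settings(open(".env"))` raises a `RuntimeError` naming any variable that is unset or still holds its placeholder value.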
The user enters text in the Text to classify: text box, and the Watson NLC classifier returns ICD-10 classifications with confidence scores.
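To show how the returned confidence scores might be presented, here is a small sketch that ranks the classes in an NLC-style result and formats each confidence as a percentage. The response shape (a `classes` list of `class_name` and `confidence` pairs) follows the NLC classify result; the sample values and the helper `top_classes` are made up for illustration.

```python
def top_classes(response, limit=3):
    """Return the highest-confidence classes as (class_name, percentage) pairs."""
    classes = sorted(
        response.get("classes", []),
        key=lambda c: c["confidence"],
        reverse=True,
    )
    return [(c["class_name"], "{:.1%}".format(c["confidence"])) for c in classes[:limit]]


# Hypothetical result for illustration only; not real model output.
example_response = {
    "top_class": "A00.0",
    "classes": [
        {"class_name": "A01.00", "confidence": 0.125},
        {"class_name": "A00.0", "confidence": 0.75},
        {"class_name": "A02.0", "confidence": 0.0625},
    ],
}
print(top_classes(example_response, limit=2))
# prints [('A00.0', '75.0%'), ('A01.00', '12.5%')]
```

Sorting explicitly, rather than trusting the order of the list, keeps the display correct even if the service returns classes in a different order.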
- Watson NLC API
- Watson Python SDK
- Ryan Anderson's Original Work
- ICD-10 API
- ICD-10 on Wikipedia
- Intro to NLC Tutorial
- Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
- AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.