This is the Python code used in my Bachelor's thesis "Interpreting a Convolutional Text Classification Neural Network on a Clinical Dataset". The main Jupyter notebook is based on Ben Trevett's PyTorch sentiment analysis tutorial, and the base code for the interpretation methods was provided by my supervisor. The analysis uses the DementiaBank dataset, which cannot be shared publicly.
In this Bachelor’s thesis, a convolutional text classification neural network is interpreted to find out why it makes the predictions it does. The analysis uses the clinical DementiaBank dataset, in which people with Alzheimer’s disease describe the Boston cookie theft image. The binary classification task is to identify, from a given text, whether a person has Alzheimer’s disease. The interpretation methods described in Jacovi et al. (2018) were implemented, and concrete example texts are also interpreted. Of the analyses performed, informative and uninformative n-grams and slot activation vectors with their clustering yielded good results. The negative n-gram analysis gave substandard results because of the specificity of the dataset.
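To give a feel for the n-gram analysis, the sketch below shows how a single convolution filter followed by max-over-time pooling ends up "choosing" one n-gram per text. It loosely follows the idea in Jacovi et al. (2018) and is not the thesis implementation: the function names, the two-dimensional toy embeddings, and the filter weights are all made up for illustration.

```python
# Illustrative sketch only: toy embeddings and filter weights are invented,
# not taken from the trained model in the notebook.

def ngram_scores(embeddings, filt, bias=0.0):
    """Slide one convolution filter over the token embeddings.

    embeddings: list of token vectors (each a list of floats)
    filt: list of n weight vectors, one per slot (position) in the n-gram
    Returns one activation score per n-gram window.
    """
    n = len(filt)
    scores = []
    for start in range(len(embeddings) - n + 1):
        window = embeddings[start:start + n]
        # Dot product between the filter and the concatenated window.
        score = bias + sum(
            w * x
            for w_vec, x_vec in zip(filt, window)
            for w, x in zip(w_vec, x_vec)
        )
        scores.append(score)
    return scores


def top_ngram(tokens, embeddings, filt, bias=0.0):
    """Return the n-gram that survives max-over-time pooling, with its score."""
    scores = ngram_scores(embeddings, filt, bias)
    best = max(range(len(scores)), key=scores.__getitem__)
    return tokens[best:best + len(filt)], scores[best]


# Hypothetical toy input.
tokens = ["the", "boy", "takes", "the", "cookie"]
toy_vectors = {
    "the": [0.0, 0.0],
    "boy": [1.0, 0.0],
    "takes": [0.0, 1.0],
    "cookie": [1.0, 1.0],
}
embeddings = [toy_vectors[t] for t in tokens]

# A bigram filter that responds to dimension 0 in slot 1 and dimension 1 in slot 2.
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    print(top_ngram(tokens, embeddings, bigram_filter))
```

In this toy setting, the per-slot terms of the dot product are what a slot activation vector would record, and comparing the pooled score against a per-filter threshold is one way to separate informative from uninformative n-grams.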
Open Anaconda Prompt as an administrator in this project's root directory and run the following commands (env.txt is a 64-bit Windows environment file):
conda create --name interpreting-db --file env.txt
activate interpreting-db
python -m spacy download en
jupyter notebook
In the Jupyter Notebook user interface, click on the interpreting-db.ipynb file to open it.
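If the notebook fails on its first imports, a quick check like the one below can help confirm that the environment was created and activated correctly. It is only a sketch: the package list is an assumption based on the tools mentioned above (PyTorch, spaCy, Jupyter) and may not match env.txt exactly.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list; adjust it to match the contents of env.txt.
REQUIRED = ["torch", "spacy", "notebook"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it from the activated interpreting-db environment; any package it lists was not installed into that environment.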