Utility-specific information regarding equipment and maintenance programs has been redacted, rendering this repository view-only.
The use case is classification of electric plant (utility) equipment failure mitigation strategies into general categories to identify trends and deficiencies.
This model classifies records in the free-text 'Mitigation' field of a plant equipment database (71,000 records) based on mitigation strategy type. Records are binned into one or more of the following classes:
- maintenance
- operational
- physical_barrier
- design_and_engineering
- supply_chain
- unknown
The NLP classifier applies a Linear Support Vector Classification (SVC) algorithm (https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html).
Linear SVC was selected based on trials using a number of text-classification algorithms, including:
- Random Forest
- Naive Bayes
- Linear Regression
Moreover, this classifier applies a multi-label "One-vs-Rest" strategy (also known as binary relevance), which fits a separate Linear SVC classifier for each label, so a single record can be assigned more than one class.
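The One-vs-Rest setup described above can be sketched with scikit-learn as follows. This is a minimal illustration and not the repository's actual code: the toy records are hypothetical stand-ins for the redacted 'Mitigation' text, and the TF-IDF feature step is an assumption, since the feature extraction used by the model is not described here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy records standing in for the redacted 'Mitigation' text.
train_texts = [
    "replace worn bearing during scheduled outage",
    "operators monitor tank level each shift",
    "install fence around transformer",
    "redesign valve with upgraded material",
    "stock spare breakers in warehouse",
    "see attached",
    "inspect fence and replace damaged cover",
]
train_labels = [
    ["maintenance"],
    ["operational"],
    ["physical_barrier"],
    ["design_and_engineering"],
    ["supply_chain"],
    ["unknown"],
    ["maintenance", "physical_barrier"],  # a record may carry several labels
]

# Turn the label lists into a binary indicator matrix, one column per class.
mlb = MultiLabelBinarizer()
y_train = mlb.fit_transform(train_labels)

# TF-IDF features are an assumption; One-vs-Rest fits one LinearSVC per column.
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
model.fit(train_texts, y_train)

new_texts = ["replace damaged fence", "order spare parts from vendor"]
pred = model.predict(new_texts)  # indicator matrix, shape (n_records, n_classes)
for text, classes in zip(new_texts, mlb.inverse_transform(pred)):
    print(text, "->", classes)
```

Because each label has its own binary classifier, a record can come back with several classes or with none at all. Records with no predicted class would need a fallback rule, for example assigning them to 'unknown'.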
- Clone the repository
git clone https://github.com/ANMillerIII/LVSM.git
- Initialize and activate virtual environment
py -m venv venv
./venv/Scripts/activate
- Install dependencies
py -m pip install -r requirements.txt
- Switch to "LVSM" directory
cd LVSM
To run the 'Mitigation' classifier:
py ./1_mitigation/mitigation_model.py
Prediction output is written to the respective 'out' directories.
- The 'train_set' data set is labeled using naive string matching with some manual oversight. This should be improved by manually classifying a larger sample of 'Mitigation' fields.
- Accuracy is not a meaningful metric for the 'apply_set' data, since these records have no reference labels and the model is only used to make predictions on them (i.e., predictions must be spot-checked manually).
- Foreign-language entries are classified with low fidelity due to lack of manual classification.
- 'train_set' data (3,569 records) are not classified by NLP, but rather by string matching and manual review, both of which carry inherent biases.
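To illustrate the kind of naive string matching mentioned in the limitations above, here is a minimal sketch. The keyword lists and the `label_by_keywords` helper are hypothetical and are not taken from the repository; the actual matching rules used to build 'train_set' are not shown in this document.

```python
import re

# Hypothetical keyword lists, for illustration only.
KEYWORDS = {
    "maintenance": ["inspect", "lubricate", "replace"],
    "operational": ["procedure", "operator"],
    "physical_barrier": ["fence", "barrier", "cover"],
    "design_and_engineering": ["redesign", "engineering"],
    "supply_chain": ["spare", "vendor"],
}


def label_by_keywords(text):
    """Return every class whose keywords start a word in the text, else ['unknown']."""
    lowered = text.lower()
    hits = [
        label
        for label, words in KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(word), lowered) for word in words)
    ]
    return hits or ["unknown"]


print(label_by_keywords("Inspect fence and replace damaged cover"))
# ['maintenance', 'physical_barrier']
print(label_by_keywords("n/a"))
# ['unknown']
```

A rule set like this shows where the bias comes from: a record is labeled only by the words someone thought to list, so "replacement vendor" would be tagged as maintenance because it starts with "replace", and any entry that uses none of the listed words falls through to 'unknown'.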