Package to easily evaluate models on the supplied hERG blocker dataset.
All models were developed on a Linux platform. Windows is supported, but you need to enable file names longer than 247 characters for the training experiments to run. Unfortunately, ThunderSVM lacks precompiled Windows binaries; the easiest way to run these experiments on a Windows machine is to use Windows Subsystem for Linux 2 with GPU support (see https://learn.microsoft.com/en-us/windows/wsl/setup/environment).
Anaconda is used to manage Python dependencies; installation packages can be found at https://www.anaconda.com/products/distribution. This software was developed with Anaconda 4.10.1, but any later version should work as well.
All models except the ThunderSVM ones can be trained with the same environment. To train models with ThunderSVM (which depends on CUDA 9.0), use `thundersvm_environment.yml`; for all other models use `torch_environment.yml`.
Create the different environments by running:

```
conda env create -f torch_environment.yml       # for all models except ThunderSVM
conda env create -f thundersvm_environment.yml  # for the ThunderSVM models
```
This creates environments called `herg-base` and `herg-thundersvm`. Now we have to install this package in both of these environments:
```
conda activate herg-base
pip install -e .
conda activate herg-thundersvm
pip install -e .
```
This installs an editable (symlinked) version of the Python package in this repository into each environment.
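To sanity-check the install in each environment, you can verify that Python can find the package. The module name below is a placeholder assumption; use the actual package name declared in this repository's setup.py:

```python
import importlib.util

# Placeholder: substitute the actual package name from setup.py.
PACKAGE_NAME = "herg"

spec = importlib.util.find_spec(PACKAGE_NAME)
if spec is None:
    raise SystemExit(f"{PACKAGE_NAME} is not installed in this environment")
print(f"{PACKAGE_NAME} found at {spec.origin}")
```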
Models are trained from experiment configs using the script `scripts/run_experiment.py`. All training options are determined by a Python module which acts as a configuration file; example versions are located in `configs/experiment_configs`:

```
python scripts/run_experiment.py configs/experiment_configs/[CONFIG_FILE]
```
By default, experiment results are saved to a new `experiments` subdirectory.
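The exact options a config module must define are shown by the example files in `configs/experiment_configs`; the sketch below is purely illustrative, with hypothetical field names, just to show the idea of a plain Python module acting as configuration:

```python
# Hypothetical experiment config -- the field names here are
# illustrative assumptions; see the example files in
# configs/experiment_configs for the actual schema.

experiment_name = "herg_ogura_example"

# Training data location (assumed field).
dataset_path = "dataset/herg_ogura/train.csv"

# Model choice and hyperparameters (assumed fields).
model_type = "random_forest"
n_resamples = 20   # would match the 00-19 model directory suffixes
random_seed = 1234

# Results go to a new "experiments" subdirectory by default.
output_dir = "experiments"
```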
For example, to train the ThunderSVM models on the hERG data, first activate the correct conda environment:

```
conda activate herg-thundersvm
```
Then run the experiments by supplying the experiment configuration file:

```
python scripts/run_experiment.py configs/experiment_configs/herg_ogura_experiment_thundersvm.py
```
The resulting experiments are collected in directories. Each model directory has a number suffix from 00 to 19. You can find the summarized performance in a timestamped subdirectory following the pattern `experiments/herg_ogura/thundersvm_00/[TIMESTAMP]/resamples/resample_00/evaluation`.
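If you want to collect these evaluation directories programmatically, here is a minimal sketch built only from the path pattern above (adjust the glob if your experiment or model names differ):

```python
from pathlib import Path

# Matches experiments/<experiment>/<model>_<NN>/<TIMESTAMP>/
#         resamples/resample_<NN>/evaluation
evaluation_dirs = sorted(Path("experiments").glob("*/*/*/resamples/*/evaluation"))

for eval_dir in evaluation_dirs:
    print(eval_dir)
```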
To run evaluations on a separate dataset, use the script `scripts/evaluate_on_dataset.py`. The script can be run like so:

```
python scripts/evaluate_on_dataset.py dataset/herg_karim_et_al/dataset_spec.py experiments/
```
This will locate all trained models in the `experiments` directory and evaluate them on the dataset specified by the supplied `dataset_spec.py` file. This file tells the framework how to parse the CSV.
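The contents of a `dataset_spec.py` are defined by this framework, so treat the following only as a hypothetical sketch of what such a spec could declare (all field names are assumptions; consult the shipped `dataset/herg_karim_et_al/dataset_spec.py` for the real schema):

```python
# Hypothetical dataset spec -- all attribute names are assumptions;
# see dataset/herg_karim_et_al/dataset_spec.py for the real schema.

# Name used in summaries and output files (assumed field).
dataset_name = "herg_karim_et_al"

# CSV file with the evaluation data (assumed field).
csv_path = "data.csv"

# Columns the framework should read from the CSV (assumed fields).
smiles_column = "smiles"
label_column = "herg_blocker"
```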
Performance of experiments can be summarized with the `scripts/summarize_performance.py` script. Run it like so:

```
python scripts/summarize_performance.py experiments/
```
This will create files in the current directory named like `performance_herg_ogura_test_filtered.csv` for every model the script could find with evaluation runs on that dataset. If `scripts/evaluate_on_dataset.py` has been run, any resulting evaluation data on another dataset will be automatically picked up and summarized in a separate file for that dataset.
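The summary files are ordinary CSVs, so they can be inspected with any tabular tool. For example, a minimal pandas sketch (the columns inside the file are not documented here, so this just prints whatever is present):

```python
import pandas as pd

# Load one of the summary files written by summarize_performance.py.
df = pd.read_csv("performance_herg_ogura_test_filtered.csv")

# Inspect whatever columns the summary contains.
print(df.columns.tolist())
print(df.head())
```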