ET-Tox

Adapted from https://github.com/torchmd/torchmd-net

Installation

Clone the repository:

git clone https://github.com/aidd-msca/ET-Tox.git
cd ET-Tox

Install Mambaforge (https://github.com/conda-forge/miniforge/#mambaforge). We recommend mamba over conda, because conda is known to produce broken environments with PyTorch.

Create an environment and activate it:

mamba env create -f environment.yml
mamba activate et_tox

Install TorchMD-NET into the environment:
```
pip install -e .
```

Cite

Cremer J, Sandonas LM, Tkatchenko A, Clevert D-A, de Fabritiis G. Equivariant Graph Neural Networks for Toxicity Prediction. ChemRxiv. Cambridge: Cambridge Open Engage; 2023. This content is a preprint and has not been peer-reviewed.

Data

Download the data from https://zenodo.org/record/7942946 and unpack it into the corresponding data folders: files whose names contain "tdc" go into data/TDCTox, and files whose names contain "mnet" go into data/MoleculeNet.
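The sorting rule above can be scripted. The following is a minimal sketch, assuming the Zenodo files have already been downloaded into a local folder. The helper names `target_dir_for` and `sort_downloads` are hypothetical and not part of this repository, and the case-insensitive match is an assumption. The sketch only moves files; unpacking is left to your archive tool.

```python
from pathlib import Path
import shutil

# Folder names and substrings are the ones given in the Data section.
RULES = {"tdc": Path("data/TDCTox"), "mnet": Path("data/MoleculeNet")}


def target_dir_for(filename):
    """Return the data folder for a downloaded file, or None if no rule matches."""
    lowered = filename.lower()  # assumption: match regardless of case
    for key, folder in RULES.items():
        if key in lowered:
            return folder
    return None


def sort_downloads(download_dir, repo_root="."):
    """Move every matching file from download_dir into its data folder."""
    moved = []
    for item in sorted(Path(download_dir).iterdir()):
        if not item.is_file():
            continue
        folder = target_dir_for(item.name)
        if folder is None:
            continue  # unknown file (e.g. pretrained models): leave it in place
        dest = Path(repo_root) / folder
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(item), str(dest / item.name))
        moved.append(dest / item.name)
    return moved
```

Calling `sort_downloads("downloads")` from the repository root would move the matching files; `downloads` is a placeholder folder name.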

Usage

Training arguments can be specified either in a YAML configuration file or directly on the command line. Example YAML files are in the train_yaml folder. To select GPUs by index, list their device IDs (as reported by nvidia-smi) in the CUDA_VISIBLE_DEVICES environment variable. Alternatively, use the --ngpus argument to set the number of GPUs to train on (-1 uses all available GPUs, or all those listed in CUDA_VISIBLE_DEVICES).

For training on random splits:

mkdir output_dir
CUDA_VISIBLE_DEVICES=0 python train.py --conf train_yaml/tox21.yaml --log-dir output_dir/
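When launching runs from a script rather than a shell, the same GPU selection can be expressed in Python. This is a sketch under stated assumptions: the helper names `gpu_env`, `train_command`, and `launch` are hypothetical, and only the flags shown in this README (--conf, --log-dir, --ngpus) are used.

```python
import os
import subprocess
import sys


def gpu_env(device_ids, base_env=None):
    """Copy the environment and restrict the visible GPUs to device_ids."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


def train_command(conf, log_dir, ngpus=None):
    """Build the train.py argument list; --ngpus is added only when given."""
    cmd = [sys.executable, "train.py", "--conf", conf, "--log-dir", log_dir]
    if ngpus is not None:
        cmd += ["--ngpus", str(ngpus)]
    return cmd


def launch(conf, log_dir, device_ids, ngpus=None):
    """Create the log directory and run training on the selected GPUs."""
    os.makedirs(log_dir, exist_ok=True)
    return subprocess.run(
        train_command(conf, log_dir, ngpus),
        env=gpu_env(device_ids),
        check=True,  # raise if train.py exits with an error
    )
```

For example, `launch("train_yaml/tox21.yaml", "output_dir/", device_ids=[0])` corresponds to the random-split command above when run from the repository root.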

For training on scaffold splits (the first command uses the TDCTox splits, the second the MoleculeNet splits):

mkdir output_dir
python train.py --conf ./train_yaml/"${name}".yaml --log-dir output_dir/ --seed "${seed}" --splits ./data/TDCTox/splits/"${name}"_split_1_scaffold.npz --use-energy-feature false --dataset-split scaffold

mkdir output_dir
python train.py --conf ./train_yaml/"${name}".yaml --log-dir output_dir/ --seed "${seed}" --splits ./data/MoleculeNet/splits/"${name}"_seed"${seed}"_confs1_scaffold.npz

For training with multiple conformers per molecule (set num_confs) on a scaffold split, first for TDCTox and then for MoleculeNet:

mkdir output_dir
python train.py --conf ./train_yaml/"${name}".yaml --log-dir output_dir/ --seed "${seed}" --splits ./data/TDCTox/splits/"${name}"_seed"${seed}"_confs"${num_confs}"_scaffold.npz --dataset-split scaffold

mkdir output_dir
python train.py --conf ./train_yaml/"${name}".yaml --log-dir output_dir/ --seed "${seed}" --splits ./data/MoleculeNet/splits/"${name}"_seed"${seed}"_confs"${num_confs}"_scaffold.npz
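The split file names in the commands above follow a regular pattern, which makes it easy to generate arguments for several seeds or conformer counts. The sketch below reproduces only the `_seed…_confs…_scaffold.npz` pattern; the TDCTox `_split_1_scaffold.npz` variant and the --use-energy-feature flag are not covered. The function names are hypothetical, and adding --dataset-split scaffold only for TDCTox simply mirrors the commands as written in this README.

```python
def scaffold_split_path(dataset, name, seed, num_confs=1):
    """Return the scaffold split file path for a dataset family and task name."""
    if dataset not in ("TDCTox", "MoleculeNet"):
        raise ValueError(f"unknown dataset family: {dataset}")
    return f"./data/{dataset}/splits/{name}_seed{seed}_confs{num_confs}_scaffold.npz"


def scaffold_args(dataset, name, seed, num_confs=1, log_dir="output_dir/"):
    """Build the train.py arguments for one scaffold-split run."""
    args = [
        "--conf", f"./train_yaml/{name}.yaml",
        "--log-dir", log_dir,
        "--seed", str(seed),
        "--splits", scaffold_split_path(dataset, name, seed, num_confs),
    ]
    if dataset == "TDCTox":
        # The TDCTox commands in this README pass this flag explicitly.
        args += ["--dataset-split", "scaffold"]
    return args


# Example: argument lists for three seeds of one MoleculeNet task.
runs = [scaffold_args("MoleculeNet", "tox21", seed) for seed in (1, 2, 3)]
```

Each entry of `runs` can be appended to `python train.py` to start one run.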

Pretrained models

Pretrained models are available at https://zenodo.org/record/7942946

Usage

Unpack the models into ./models. Then either test them in a Jupyter notebook using test_models.ipynb, or pass the --test-run flag and specify the checkpoint with --test-checkpoint model_path.
