TAGMol: Target-Aware Gradient-guided Molecule Generation

[Paper]

TAGMol Framework


Environment Setup

The code has been tested in the following environment:

conda create -n tagmol python=3.8.17
conda activate tagmol
conda install pytorch=1.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install pyg=2.2.0 -c pyg
conda install rdkit=2022.03.2 openbabel=3.1.1 tensorboard=2.13.0 pyyaml=6.0 easydict=1.9 python-lmdb=1.4.1 -c conda-forge

# For Vina Docking
pip install meeko==0.1.dev3 scipy pdb2pqr vina==1.2.2 
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3

IMPORTANT NOTE: You may need to add the root working directory to PYTHONPATH so that the scripts can find the project modules:

export PYTHONPATH=".":$PYTHONPATH
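The export above places the current directory at the front of Python's module search path, which is the repository root when commands are run from there. A rough in-process equivalent is sketched below. The helper name `add_repo_root` is hypothetical and not part of the TAGMol codebase.

```python
import os
import sys


def add_repo_root(root="."):
    """Prepend the absolute form of `root` to sys.path if it is not already there.

    Hypothetical helper for illustration; not part of the TAGMol codebase.
    """
    root = os.path.abspath(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


if __name__ == "__main__":
    # Run from the repository root so that "." resolves to it.
    print(add_repo_root("."))
```

Setting PYTHONPATH in the shell remains the simpler option, because it also applies to every script launched from that shell.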

Data and Checkpoints

The resources can be found here. The data are inside the data directory, the backbone model is inside pretrained_models, and the guide checkpoints are inside logs.

Training

Training Diffusion model from scratch

python scripts/train_diffusion.py configs/training.yml

Training Guide model from scratch

BA

python scripts/train_dock_guide.py configs/training_dock_guide.yml

QED

python scripts/train_dock_guide.py configs/training_dock_guide_qed.yml

SA

python scripts/train_dock_guide.py configs/training_dock_guide_sa.yml

NOTE: The outputs are saved in logs/ by default.
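The three guide commands differ only in their config file. As a minimal sketch, the mapping can be written down once and turned into command lines. The dictionary and function names are illustrative; only the script and config paths come from the commands above.

```python
import shlex

# Property name -> guide training config, as listed above.
GUIDE_CONFIGS = {
    "BA": "configs/training_dock_guide.yml",
    "QED": "configs/training_dock_guide_qed.yml",
    "SA": "configs/training_dock_guide_sa.yml",
}


def guide_training_command(prop):
    """Return the argv list that trains the guide model for one property."""
    key = prop.upper()
    if key not in GUIDE_CONFIGS:
        raise ValueError(
            f"unknown property {prop!r}; expected one of {sorted(GUIDE_CONFIGS)}"
        )
    return ["python", "scripts/train_dock_guide.py", GUIDE_CONFIGS[key]]


if __name__ == "__main__":
    # Print the three commands; pass a list to subprocess.run(..., check=True) to launch it.
    for prop in GUIDE_CONFIGS:
        print(shlex.join(guide_training_command(prop)))
```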


Sampling

Sampling for pockets in the testset

BackBone

python scripts/sample_diffusion.py configs/sampling.yml --data_id {i} # Replace {i} with the index of the data. i should be between 0 and 99 for the testset.

A bash script runs inference for the entire test set in a loop:

bash scripts/batch_sample_diffusion.sh configs/sampling.yml backbone

The output is stored in experiments/backbone. If required, the variables BATCH_SIZE, NODE_ALL, NODE_THIS and START_IDX can be modified in the script file.
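The variable names NODE_ALL, NODE_THIS and START_IDX suggest that the batch script can split the 100 test-set indices across several machines. The script is not reproduced here, so the following is only a sketch of one common scheme, round-robin assignment, in which index i is handled by node i mod NODE_ALL, starting from START_IDX. Whether scripts/batch_sample_diffusion.sh uses exactly this rule is an assumption; check the script before relying on it. The function names are hypothetical.

```python
TOTAL_TASKS = 100  # test-set indices run from 0 to 99


def indices_for_node(node_all, node_this, start_idx=0, total=TOTAL_TASKS):
    """Round-robin split (ASSUMED scheme): index i goes to node i % node_all."""
    if node_all < 1 or not 0 <= node_this < node_all:
        raise ValueError("need node_all >= 1 and 0 <= node_this < node_all")
    return [i for i in range(start_idx, total) if i % node_all == node_this]


def sampling_commands(config, node_all=1, node_this=0, start_idx=0):
    """Build one backbone sampling command per index assigned to this node."""
    return [
        ["python", "scripts/sample_diffusion.py", config, "--data_id", str(i)]
        for i in indices_for_node(node_all, node_this, start_idx)
    ]


if __name__ == "__main__":
    # Node 1 of 4, resuming from index 10.
    print(indices_for_node(node_all=4, node_this=1, start_idx=10)[:5])
    # -> [13, 17, 21, 25, 29]
```

With a single node (node_all=1, node_this=0) every index from start_idx to 99 is assigned to that node, which corresponds to running the whole test set in one loop.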

BackBone + Gradient Guidance

python scripts/sample_multi_guided_diffusion.py [path-to-config.yml] --data_id {i} # Replace {i} with the index of the data. i should be between 0 and 99 for the testset.

To run inference on all 100 targets in the test set:

bash scripts/batch_sample_multi_guided_diffusion.sh [path-to-config.yml] [output-dir-name]

When run using the bash file, the outputs are stored in experiments_multi/[output-dir-name]. The config files are available in configs/noise_guide_multi.

  • Single-objective guidance
    • BA: sampling_guided_ba_1.yml
    • QED: sampling_guided_qed_1.yml
    • SA: sampling_guided_sa_1.yml
  • Dual-objective guidance
    • QED + BA: sampling_guided_qed_0.5_ba_0.5.yml
    • SA + BA: sampling_guided_sa_0.5_ba_0.5.yml
    • QED + SA: sampling_guided_qed_0.5_sa_0.5.yml
  • Multi-objective guidance (our main model)
    • QED + SA + BA: sampling_guided_qed_0.33_sa_0.33_ba_0.34.yml

For example, to run the multi-objective setting (i.e., our model):

bash scripts/batch_sample_multi_guided_diffusion.sh configs/noise_guide_multi/sampling_guided_qed_0.33_sa_0.33_ba_0.34.yml qed_0.33_sa_0.33_ba_0.34
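The config file names appear to encode the guidance weight given to each objective, for example 0.33 for QED, 0.33 for SA and 0.34 for BA in the multi-objective setting. Assuming that reading is right, a short parser makes the naming scheme explicit and checks that the weights in a name sum to 1. Both functions are hypothetical and look at the file name only; they do not read the YAML contents.

```python
import math
import re
from pathlib import Path

_PREFIX = "sampling_guided_"
_PAIR = re.compile(r"(qed|sa|ba)_(\d+(?:\.\d+)?)")


def weights_from_config_name(path):
    """Parse 'sampling_guided_<obj>_<w>[_<obj>_<w>...].yml' into {obj: weight}."""
    stem = Path(path).stem  # drops the directory and the '.yml' suffix
    if not stem.startswith(_PREFIX):
        raise ValueError(f"unexpected config name: {path}")
    weights = {obj: float(w) for obj, w in _PAIR.findall(stem[len(_PREFIX):])}
    if not weights:
        raise ValueError(f"no objective weights found in: {path}")
    return weights


def weights_sum_to_one(weights, tol=1e-9):
    """True if the weights add up to 1 within floating-point tolerance."""
    return math.isclose(sum(weights.values()), 1.0, abs_tol=tol)


if __name__ == "__main__":
    name = "configs/noise_guide_multi/sampling_guided_qed_0.33_sa_0.33_ba_0.34.yml"
    w = weights_from_config_name(name)
    print(w, weights_sum_to_one(w))
```

Under this reading, the single-objective configs carry a weight of 1 for one property and the dual-objective configs split the weight evenly between two.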

Evaluation

Evaluating Guide models

python scripts/eval_dock_guide.py --ckpt_path [path-to-checkpoint.pt]

Evaluation from sampling results

python scripts/evaluate_diffusion.py {OUTPUT_DIR} --docking_mode vina_score --protein_root data/test_set

The docking mode can be chosen from {qvina, vina_score, vina_dock, none}.

NOTE: It will take some time to prepare pdbqt and pqr files when you run the evaluation code with vina_score/vina_dock docking mode for the first time.
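Since the docking mode is restricted to four values, it can be worth validating it before launching a long evaluation. The sketch below assembles the evaluation command shown above. The function name is illustrative; only the script path, flags and mode names are taken from this README.

```python
DOCKING_MODES = ("qvina", "vina_score", "vina_dock", "none")


def evaluation_command(output_dir, docking_mode="vina_score", protein_root="data/test_set"):
    """Return the argv list for evaluating one directory of sampling results."""
    if docking_mode not in DOCKING_MODES:
        raise ValueError(
            f"docking_mode must be one of {DOCKING_MODES}, got {docking_mode!r}"
        )
    return [
        "python", "scripts/evaluate_diffusion.py", output_dir,
        "--docking_mode", docking_mode,
        "--protein_root", protein_root,
    ]


if __name__ == "__main__":
    # The output directory here is just the backbone example from the Sampling section.
    print(" ".join(evaluation_command("experiments/backbone", "vina_dock")))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root.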


Results

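| Methods | Vina Score Avg. (↓) | Vina Score Med. (↓) | Vina Min Avg. (↓) | Vina Min Med. (↓) | Vina Dock Avg. (↓) | Vina Dock Med. (↓) | High Affinity Avg. (↑) | High Affinity Med. (↑) | QED Avg. (↑) | QED Med. (↑) | SA Avg. (↑) | SA Med. (↑) | Diversity Avg. (↑) | Diversity Med. (↑) | Hit Rate % (↑) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reference | -6.36 | -6.46 | -6.71 | -6.49 | -7.45 | -7.26 | - | - | 0.48 | 0.47 | 0.73 | 0.74 | - | - | 21 |
| liGAN | - | - | - | - | -6.33 | -6.20 | 21.1% | 11.1% | 0.39 | 0.39 | 0.59 | 0.57 | 0.66 | 0.67 | 13.2 |
| AR | -5.75 | -5.64 | -6.18 | -5.88 | -6.75 | -6.62 | 37.9% | 31.0% | 0.51 | 0.50 | 0.63 | 0.63 | 0.70 | 0.70 | 12.9 |
| Pocket2Mol | -5.14 | -4.70 | -6.42 | -5.82 | -7.15 | -6.79 | 48.4% | 51.0% | 0.56 | 0.57 | 0.74 | 0.75 | 0.69 | 0.71 | 24.3 |
| TargetDiff | -5.47 | -6.30 | -6.64 | -6.83 | -7.80 | -7.91 | 58.1% | 59.1% | 0.48 | 0.48 | 0.58 | 0.58 | 0.72 | 0.71 | 20.5 |
| DecompDiff | -4.85 | -6.03 | -6.76 | -7.09 | -8.48 | -8.50 | 64.8% | 78.6% | 0.44 | 0.41 | 0.59 | 0.59 | 0.63 | 0.62 | 24.9 |
| TAGMol | -7.02 | -7.77 | -7.95 | -8.07 | -8.59 | -8.69 | 69.8% | 76.4% | 0.55 | 0.56 | 0.56 | 0.56 | 0.69 | 0.70 | 27.7 |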

Due to space constraints, we only share the eval_results folder generated by the evaluation script. It can be found at the same link as the other resources, inside the results directory.


Citation

@article{dorna2024tagmol,
  title={TAGMol: Target-Aware Gradient-guided Molecule Generation},
  author={Vineeth Dorna and D. Subhalingam and Keshav Kolluru and Shreshth Tuli and Mrityunjay Singh and Saurabh Singal and N. M. Anoop Krishnan and Sayan Ranu},
  journal={arXiv preprint arXiv:2406.01650},
  year={2024}
}

Acknowledgements

This codebase was built on top of TargetDiff.