Predicting rare drug-drug interaction events with dual-granular structure-adaptive and pair variational representation
This is a meta-learning-based predictor of drug-drug interaction events (DDIEs). Until the article is published, this project contains only the data; the reproduction code is provided in the submission document.
Installation
Tested on Ubuntu 16.04, CentOS 7, and Windows 10 with Python 3.7 on one NVIDIA RTX 4080Ti GPU.
After downloading the code and data, execute the following command to install all dependencies. This may take some time.
pip install -r requirements.txt
The repositories for Independent, RareDDIE and ZetaDDIE provide code for reproducing our results.
- Run tester_struc_drugbank.py and tester_struc_mdf.py to reproduce the reported results.
- Run trainer_structure_acc_fp_neigh_VAE_GAN_struc.py to train the model on the standard dataset1.
- Run trainer_structure_acc_fp_neigh(dataset2)_VAE_GAN_struc.py to train the model on the standard dataset2.
Users can preprocess their own datasets for use with our models. A step-by-step example is provided in the toy example directory.
Users should first prepare their dataset, including interaction event data, drug data, and SMILES representations of drugs. The expected formats follow those in toy.data, druglist.csv, and drug_smiles.csv.
- Generate interaction event input files
  - Run 1construct_task.py to generate event input files:
    - train_tasks.json: Common events for training
    - dev_tasks.json: Common events for validation
    - test_tasks.json: Fewer events for testing
    - test2_tasks.json: Rare events for testing
- Generate DDIE relationship input files
  - Run 2data_(get_e1rel_e2_and_rel2candidates).py to produce:
    - e1rel_e2.json: Drug-drug interaction event relationships
    - rel2candidates.json: Relationship of candidates
- Integrate the background graph
  - Replace the default path_graph with a prebuilt background graph.
  - Add dti_entity.csv and dti_rel.csv to define entities and relationships in the graph.
  - Run 3add_entity_and_rel.py to incorporate these into the dataset.
- Generate drug feature representations
  - Copy the prepared SMILES file to fp/data/.
  - Run save_features.py to generate the feature file morgan_toy_dataset.npz in the features directory.
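The first two preprocessing steps can be pictured with a minimal sketch. This is not the code in 1construct_task.py or 2data_(get_e1rel_e2_and_rel2candidates).py: the (drug1, event, drug2) triple layout, the frequency thresholds, the "drug1|event" key format, and all function names are assumptions made for illustration. The formats the models actually expect should be taken from the toy example.

```python
from collections import defaultdict

# Hypothetical thresholds: the real cut-offs are not stated in this README.
COMMON_MIN = 50  # events with at least this many drug pairs count as "common"
FEWER_MIN = 10   # events with FEWER_MIN..COMMON_MIN-1 pairs count as "fewer"


def group_by_event(triples):
    """Group (drug1, event, drug2) triples into {event: [[drug1, event, drug2], ...]}."""
    tasks = defaultdict(list)
    for d1, event, d2 in triples:
        tasks[event].append([d1, event, d2])
    return dict(tasks)


def split_by_frequency(tasks, common_min=COMMON_MIN, fewer_min=FEWER_MIN):
    """Split events into common / fewer / rare buckets by number of drug pairs."""
    common, fewer, rare = {}, {}, {}
    for event, pairs in tasks.items():
        if len(pairs) >= common_min:
            common[event] = pairs
        elif len(pairs) >= fewer_min:
            fewer[event] = pairs
        else:
            rare[event] = pairs
    return common, fewer, rare


def build_indexes(triples):
    """Build lookup tables in the spirit of e1rel_e2.json and rel2candidates.json.

    Assumed layout: e1rel_e2 maps "drug1|event" to the second drugs observed
    with it, and rel2candidates maps each event to every drug seen as the
    second drug of that event.
    """
    e1rel_e2 = defaultdict(set)
    rel2candidates = defaultdict(set)
    for d1, event, d2 in triples:
        e1rel_e2[f"{d1}|{event}"].add(d2)
        rel2candidates[event].add(d2)
    # Sorted lists keep the output deterministic and JSON-serialisable.
    return (
        {key: sorted(values) for key, values in e1rel_e2.items()},
        {key: sorted(values) for key, values in rel2candidates.items()},
    )


# Toy data with small thresholds so that both buckets are populated.
toy_triples = [
    ("D1", "E_common", "D2"),
    ("D1", "E_common", "D3"),
    ("D2", "E_common", "D3"),
    ("D1", "E_rare", "D4"),
]
common, fewer, rare = split_by_frequency(
    group_by_event(toy_triples), common_min=3, fewer_min=2
)
e1rel_e2, rel2candidates = build_indexes(toy_triples)
```

Each resulting dictionary could then be written with json.dump under the file names listed in the steps. How common events are divided between training and validation is not covered by this sketch.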
A training example for RareDDIE is provided in the toy example directory.
Run
python trainer_structure_acc_fp_neigh_VAE_struc.py --dataset toy_dataset --few 10 --train_few 10 --batch_size 256
- If users have pre-trained background graph embeddings (e.g., DRKG_TransE_entity.npy and DRKG_TransE_relation.npy), they should construct ent2embids and relation2embids files to map all dataset and background entities/relations to feature indices.
- For entities or relations without pretrained features, set the corresponding index to -1.
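A minimal sketch of such a mapping follows. It assumes that the pretrained embeddings come with a name-to-row lookup and that the mapping is stored as JSON. The function name build_embids, the JSON format, and the toy identifiers are hypothetical; the file format the trainer actually reads should be checked against the toy example.

```python
import json


def build_embids(names, pretrained_index):
    """Map each entity (or relation) name to its row in the pretrained matrix.

    names: all dataset and background-graph entities or relations.
    pretrained_index: {name: row} for the pretrained .npy matrix (assumed available).
    Names without a pretrained vector are mapped to -1.
    """
    return {name: pretrained_index.get(name, -1) for name in names}


# Toy data; the identifiers below are made up for illustration.
pretrained_entities = {"Compound::DB00001": 0, "Compound::DB00002": 1}
dataset_entities = ["Compound::DB00001", "Compound::DB00002", "Compound::DB99999"]

ent2embids = build_embids(dataset_entities, pretrained_entities)
ent2embids_json = json.dumps(ent2embids, indent=2)  # JSON is an assumed storage format
```

The same helper can be applied to relation names to obtain relation2embids.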
Run
python trainer_structure_acc_fp_neigh_VAE_struc.py --dataset toy_dataset --few 10 --train_few 10 --batch_size 256 --random_embed False
To run ZetaDDIE, simply replace the preprocessed dataset directory with the appropriate data.
A test example for RareDDIE is provided in the toy example directory.
Run
python tester_struc_dataset.py
Users can also leverage a model trained on their own standard dataset to directly predict on an independent dataset, enabling cross-domain prediction. We provide code both for reproducing our results and for running on a user's own dataset.
- Prepare the independent dataset following the same preprocessing steps described earlier. Copy the processed dataset into the Independent directory (e.g., twoside).
- Copy the trained standard dataset and trained model into the Independent directory (e.g., dataset1 and models).
- Execute the prediction script to evaluate the model’s cross-domain performance.
Run
python tester_cross_domain.py
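The copying steps that precede this command can be scripted with a small helper. This is a sketch under assumptions: the function name stage_independent and its arguments are hypothetical, and tester_cross_domain.py may expect files beyond the three directories mentioned above.

```python
import shutil
from pathlib import Path


def stage_independent(independent_dir, processed_dataset, trained_dataset, models_dir):
    """Copy the inputs of the cross-domain test into independent_dir.

    Each source directory is copied under its own name, for example
    Independent/twoside, Independent/dataset1 and Independent/models.
    Returns the list of staged target paths.
    """
    target_root = Path(independent_dir)
    target_root.mkdir(parents=True, exist_ok=True)
    staged = []
    for source in (processed_dataset, trained_dataset, models_dir):
        source = Path(source)
        if not source.is_dir():
            raise FileNotFoundError(f"missing input directory: {source}")
        target = target_root / source.name
        if target.resolve() == source.resolve():
            staged.append(target)  # already in place, nothing to copy
            continue
        if target.exists():
            shutil.rmtree(str(target))  # replace a stale copy from an earlier run
        shutil.copytree(str(source), str(target))
        staged.append(target)
    return staged
```

With the example names used above, a call might look like stage_independent("Independent", "twoside", "dataset1", "models"), run from the directory that holds the processed and trained data.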
All datasets are processed from these works [1-5] and databases [6-7].
1. Lin S, et al. MDF-SA-DDI: predicting drug–drug interaction events based on multi-source drug fusion, multi-source feature fusion and transformer self-attention mechanism. Brief Bioinform 23, bbab421 (2022).
2. Nyamabo AK, Yu H, Liu Z, Shi J-Y. Drug–drug interaction prediction with learnable size-adaptive molecular substructures. Brief Bioinform 23, bbab441 (2022).
3. Preuer K, Lewis RP, Hochreiter S, Bender A, Bulusu KC, Klambauer G. DeepSynergy: predicting anti-cancer drug synergy with Deep Learning. Bioinformatics 34, 1538-1546 (2018).
4. Nair NU, et al. A landscape of response to drug combinations in non-small cell lung cancer. Nature Communications 14, 3830 (2023).
5. Ma T, Lin X, Song B, Philip SY, Zeng X. Kg-mtl: knowledge graph enhanced multi-task learning for molecular interaction. IEEE Transactions on Knowledge and Data Engineering 35, 7068-7081 (2022).
6. Wishart DS, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. NAR 46, D1074-D1082 (2018).
7. Tatonetti NP, Ye PP, Daneshjou R, Altman RB. Data-driven prediction of drug effects and interactions. Sci Transl Med 4, 125ra131-125ra131 (2012).