Comprehensively-Curated Dataset of CYP450 Interactions: Enhancing Predictive Models for Drug Metabolism
This repository contains source code demonstrating machine learning and deep learning with this dataset.
- DT.py is for Decision Tree.
- GCN_hyperopt_CV.py is for Graph Convolutional Network (GCN).
- RF.py is for Random Forest.
- SVC.py is for Support Vector Machine classification (SVC).
- SVR.py is for Support Vector Regression.
Yu-Hao Ni¹, Yu-Wen Su¹, Shaung-Chen Yang², Jia-Cheng Hong¹, Tien-Chueh Kuo¹,⁵, and Yufeng Jane Tseng¹,³,⁴,⁵,*
¹ Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 10617, Taiwan
² School of Medicine, National Taiwan University, Taipei 10051, Taiwan
³ Department of Computer Science and Information Engineering, National Taiwan University, Taipei 10617, Taiwan
⁴ School of Pharmacy, College of Medicine, National Taiwan University, Taipei 10002, Taiwan
⁵ The Metabolomics Core Laboratory, Centers of Genomic and Precision Medicine, National Taiwan University, Taipei 10617, Taiwan
* Corresponding author: Yufeng Jane Tseng (yjtseng@csie.ntu.edu.tw)
We collected and curated a detailed dataset of substrates and non-substrates for six principal cytochrome P450 (CYP450) isozymes, which together are responsible for 90% of Phase I drug metabolism in humans. These isozymes (CYP1A2, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4) play critical roles in the detoxification and metabolic processing of therapeutic compounds. The dataset includes interactions with approximately 2000 compounds per enzyme, giving broad coverage of each isozyme. Using conventional machine learning techniques alongside Graph Convolutional Networks (GCN), we developed robust models to predict these drug-enzyme interactions. The dataset is intended to support pharmacokinetic modeling, drug development, and toxicological studies by providing a resource for predicting metabolic pathways, and in turn for assessing drug safety and efficacy.
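As a rough illustration of the substrate/non-substrate labelling per isozyme described above, the sketch below groups labelled compounds by isozyme and counts the two classes. The file layout is an assumption: the column names `smiles`, `isozyme`, and `label`, the 1/0 encoding, and the helper names `load_by_isozyme` and `class_counts` are hypothetical and may not match the actual data files or the scripts in this repository. The sample rows are toy placeholders, not records taken from the dataset.

```python
import csv
import io
from collections import Counter, defaultdict

# The six isozymes named in the dataset description.
ISOZYMES = ("CYP1A2", "CYP2C9", "CYP2C19", "CYP2D6", "CYP2E1", "CYP3A4")

# ASSUMPTION: one row per (compound, isozyme) pair, label 1 = substrate,
# 0 = non-substrate. Toy rows for illustration only, not real dataset records.
SAMPLE_CSV = """smiles,isozyme,label
CN1C=NC2=C1C(=O)N(C(=O)N2C)C,CYP1A2,1
CCO,CYP2E1,1
CCO,CYP2D6,0
CC(C)Cc1ccc(cc1)C(C)C(=O)O,CYP2C9,1
"""


def load_by_isozyme(handle):
    """Group (smiles, label) pairs by isozyme, skipping unknown isozymes."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        if row["isozyme"] not in ISOZYMES:
            continue
        grouped[row["isozyme"]].append((row["smiles"], int(row["label"])))
    return grouped


def class_counts(grouped):
    """Map each isozyme to a Counter of its labels (1 and 0)."""
    return {
        iso: Counter(label for _, label in rows)
        for iso, rows in grouped.items()
    }


grouped = load_by_isozyme(io.StringIO(SAMPLE_CSV))
counts = class_counts(grouped)
for iso in ISOZYMES:
    c = counts.get(iso, Counter())
    print(f"{iso}: substrates={c[1]} non-substrates={c[0]}")
```

Keeping one labelled list per isozyme makes it straightforward to check class balance before training a separate binary classifier for each enzyme. To read a real file, replace the `io.StringIO` object with an open file handle.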