CheMixNet is a general-purpose neural network architecture that combines two neural architectures (illustrated below). The first is a fully connected multi-layer perceptron trained on molecular fingerprints. The second is trained on SMILES sequences and can take one of three forms:
- CNN
- RNN (LSTM or GRU)
- CNN-RNN (CNN followed by LSTM or GRU)
The proposed architecture performs better than other state-of-the-art architectures such as SMILES2vec, Chemception, and ConvGraph.
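The two-branch design described above can be sketched in Keras roughly as follows. This is a minimal illustration, not the code in chemixnet_util.py: the function name `build_mixed_model`, the layer sizes, the embedding layer, the default input dimensions, and the use of concatenation to merge the branches are all assumptions made for the example.

```python
from tensorflow.keras import layers, Model


def build_mixed_model(max_len=100, vocab_size=40, fp_bits=1024,
                      smiles_branch="cnn-rnn"):
    """Hypothetical two-branch model: SMILES sequence branch + fingerprint MLP."""
    if smiles_branch not in ("cnn", "rnn", "cnn-rnn"):
        raise ValueError("smiles_branch must be 'cnn', 'rnn' or 'cnn-rnn'")

    # Branch 1: integer-encoded SMILES characters (sizes are illustrative).
    smiles_in = layers.Input(shape=(max_len,), name="smiles")
    x = layers.Embedding(vocab_size, 32)(smiles_in)
    if smiles_branch in ("cnn", "cnn-rnn"):
        x = layers.Conv1D(64, kernel_size=3, activation="relu")(x)
    if smiles_branch == "cnn":
        x = layers.GlobalMaxPooling1D()(x)
    else:
        # "rnn" and "cnn-rnn" both end in a recurrent layer; GRU also works here.
        x = layers.LSTM(64)(x)

    # Branch 2: fully connected network on the fingerprint bit vector.
    fp_in = layers.Input(shape=(fp_bits,), name="fingerprint")
    y = layers.Dense(128, activation="relu")(fp_in)
    y = layers.Dense(64, activation="relu")(y)

    # Merge the two learned representations (assumed: simple concatenation).
    merged = layers.Concatenate()([x, y])
    z = layers.Dense(64, activation="relu")(merged)
    out = layers.Dense(1, name="property")(z)  # single regression output
    return Model(inputs=[smiles_in, fp_in], outputs=out)


model = build_mixed_model()
model.compile(optimizer="adam", loss="mse")
```

For a classification dataset, the final layer would presumably use a sigmoid activation with a cross-entropy loss instead of the linear output and mean squared error shown here.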
Requirements:
- Keras 2.0 or higher
- Tensorflow 1.7
- RDKit 2017.09.1
- Scikit-Learn 0.19.1
- Numpy 1.14
- Pandas 0.22
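
To compare an existing environment against the versions listed above, a small check like the following may help. It is only a sketch: the distribution names in `PINNED` are assumptions (for example, RDKit installed through conda or an older pip package may be registered under a different name), and `installed_versions` is a hypothetical helper, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the requirements list; the distribution names are assumptions.
PINNED = {
    "keras": "2.0",
    "tensorflow": "1.7",
    "rdkit": "2017.09.1",
    "scikit-learn": "0.19.1",
    "numpy": "1.14",
    "pandas": "0.22",
}


def installed_versions(packages=PINNED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, have in installed_versions().items():
        print(f"{name}: listed {PINNED[name]}, installed {have or 'not found'}")
```
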
Files:
- chemixnet_util.py : Core module containing CheMixNet architecture
- run_eval.py : Module for evaluating CheMixNet across datasets
- dl_util.py : Utility file for running basic deep learning modules
- rdkit_util.py : Utility file for RDKit functions (RDKit is a molecular informatics Python library)
- ml_util.py : Utility file for running basic machine learning modules
- eval_esol.ipynb : Notebook for evaluating the results of CheMixNet on the ESOL dataset
- eval_hiv.ipynb : Notebook for evaluating the results of CheMixNet on the HIV dataset
- eval_freesolv.ipynb : Notebook for evaluating the results of CheMixNet on the FreeSolv datasets (Experimental and Calculated)
- eval_tox.ipynb : Notebook for evaluating the results of CheMixNet on the Tox21 dataset
- fp_fc.py : Module for fully connected network developed on fingerprints
- smiles_nn.py : Module for CNN or RNN network on SMILES
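
Before a CNN or RNN can consume SMILES strings, they have to be turned into fixed-size numeric arrays. The sketch below shows one common approach, character-level one-hot encoding with padding. It is an illustration only and may differ from what smiles_nn.py does: `build_vocab` and `one_hot_smiles` are hypothetical names, and the encoding splits multi-character atoms such as `Cl` into separate characters.

```python
import numpy as np


def build_vocab(smiles_list):
    """Map each character seen in the data to an index; 0 is reserved for padding."""
    chars = sorted(set("".join(smiles_list)))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def one_hot_smiles(smiles_list, vocab, max_len):
    """Return an array of shape (n_molecules, max_len, len(vocab) + 1)."""
    encoded = np.zeros((len(smiles_list), max_len, len(vocab) + 1), dtype=np.float32)
    for i, smi in enumerate(smiles_list):
        truncated = smi[:max_len]  # strings longer than max_len are cut off
        for j, ch in enumerate(truncated):
            encoded[i, j, vocab[ch]] = 1.0
        encoded[i, len(truncated):, 0] = 1.0  # mark remaining positions as padding
    return encoded


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles)
x = one_hot_smiles(smiles, vocab, max_len=10)
print(x.shape)  # (3, 10, 8)
```

In practice the vocabulary would be built once from the training set and reused for validation and test data, so that every split shares the same character indices.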
The code was developed by the CUCIS group at the Electrical and Computer Engineering Department at Northwestern University.
- Arindam Paul (arindam.paul@eecs.northwestern.edu)
- Dipendra Jha (dipendra@u.northwestern.edu)
- Reda Al-Bahrani (reda@u.northwestern.edu)
- Ankit Agrawal (ankitag@eecs.northwestern.edu)
- Wei-keng Liao (wkliao@eecs.northwestern.edu)
- Alok Choudhary (choudhar@eecs.northwestern.edu)
If you use this code or data, please cite:
Arindam Paul, Dipendra Jha, Reda Al-Bahrani, Wei-keng Liao, Alok Choudhary, Ankit Agrawal. CheMixNet: Mixed DNN Architectures for Predicting Chemical Properties using Multiple Molecular Representations. Thirty-second Conference on Neural Information Processing Systems (NIPS), 2018. arXiv: https://arxiv.org/abs/1811.08283
email: arindam.paul@eecs.northwestern.edu or ankitag@eecs.northwestern.edu
Copyright (C) 2019, Northwestern University.
See COPYRIGHT notice in top-level directory.
This work was performed under the following financial assistance awards 70NANB14H012 and 70NANB19H005 from U.S. Department of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Materials Design (CHiMaD). Partial support is also acknowledged from DOE awards DE-SC0014330, DE-SC0019358.