This project adopts the technique from the paper "Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design" to optimise the pIC50 value of molecules.
Outputs of some of the experiments are in the folder "past outputs"
To run the main program on the same data used for the best outputs (in the folder "past outputs/7July/clean_good_manual/"):
python Main.py
It is also possible to run the program on a custom set of lead molecules and/or fragments.
python Main.py fragment_molecules.smi lead_file.smi
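A .smi file is plain text with one SMILES string per line. The snippet below is a hypothetical sketch that writes a tiny lead file in that layout (the molecule choices are illustrative, not taken from the repository's data):

```python
# Hypothetical example: a .smi file holds one SMILES string per line.
lead_smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"]

with open("lead_file.smi", "w") as f:
    f.write("\n".join(lead_smiles) + "\n")
```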
Molecules that are generated during the process can be viewed by running:
python viewing_outputs.py -epoch epoch
where epoch is the number of the epoch to be viewed.
New molecules can also be generated from a saved generation model. For this, run:
python viewing_outputs.py -gen 1
In the run above, actions are sampled from the discrete distribution output by the actor (stochastic output). To take the maximum-probability action instead:
python viewing_outputs.py -gen 1 -stoch 0
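The difference between the two modes can be sketched as follows. This is a simplified stand-in (the function name is hypothetical; the actual actor is a Keras network defined in Modules/models.py):

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_action(action_probs, stochastic=True):
    """Pick an action from the actor's discrete output distribution.

    stochastic=True  -> sample from the distribution (the default behaviour)
    stochastic=False -> take the maximum-probability action (-stoch 0)
    """
    action_probs = np.asarray(action_probs, dtype=float)
    if stochastic:
        return int(rng.choice(len(action_probs), p=action_probs))
    return int(np.argmax(action_probs))

probs = [0.1, 0.6, 0.3]
choose_action(probs, stochastic=False)  # always action 1
```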
Also, remember to set the appropriate file_path in the code. Please note that the best generation model has NOT been saved, but an equivalent model is present and can very well be used.
Either way, the output is as follows:
- Displays two columns of molecules as a PNG file. The first column contains the original lead molecules, while the second column contains the modified molecules.
- Displays a histogram of the pIC50 distributions of the lead molecules and the final output.
- Saves two CSV files: one containing a table of all the changed molecules, and one containing a table of all the molecules that were turned from inactive to active. These files are saved in the folder "past outputs".
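The two CSV files can be pictured with a small pandas sketch. The column names and the pIC50 activity threshold below are assumptions for illustration, not values taken from the code:

```python
import pandas as pd

ACTIVE_THRESHOLD = 6.0  # hypothetical pIC50 cutoff for "active"

# Hypothetical results table: lead molecules and their modified versions.
results = pd.DataFrame({
    "lead_smiles":     ["CCO", "c1ccccc1O", "CCN"],
    "modified_smiles": ["CCOC", "c1ccccc1OC", "CCNC"],
    "lead_pic50":      [5.2, 6.4, 4.9],
    "modified_pic50":  [6.8, 6.9, 5.1],
})

# All molecules that were changed by the agent.
changed = results[results["lead_smiles"] != results["modified_smiles"]]

# Molecules that crossed the activity threshold: inactive lead, active output.
activated = results[(results["lead_pic50"] < ACTIVE_THRESHOLD)
                    & (results["modified_pic50"] >= ACTIVE_THRESHOLD)]

changed.to_csv("changed_molecules.csv", index=False)
activated.to_csv("inactive_to_active.csv", index=False)
```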
Any global parameter can be changed by editing the file "Modules/global_parameters.py".
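Such a parameters file typically follows the module-level-constants pattern. A hypothetical sketch (the names below are illustrative, not the actual parameter names in Modules/global_parameters.py):

```python
# Hypothetical sketch of a global-parameters module: plain module-level
# constants that the rest of the code imports and reads.
EPOCHS = 100          # number of training epochs
BATCH_SIZE = 64       # molecules per training batch
LEARNING_RATE = 1e-4  # optimiser step size
```

Other modules would then use it via, e.g., `from Modules import global_parameters as gp` and read `gp.EPOCHS`.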
- Main.py: The main file. This has to be run for training.
- viewing_outputs.py: File to view outputs as described above.
- Show_Epoch.py: Reads and decodes generated molecules, used by viewing_outputs.py
- FMPO-Visualising the outputs.ipynb: Jupyter notebook used for testing parts of the code, as well as viewing outputs
- Files inside "Modules":
- build_encoding.py: Contains functions involved in building and saving encodings
- file_reader.py: Contains functions involved in reading .smi and .csv input files
- global_parameters.py: All global parameters can be set here
- models.py: Contains the architectures of the actor and critic networks
- mol_utils.py: Utility functions for handling molecules (e.g. breaking molecules into fragments)
- rewards.py: The predictive model is deployed here. Contains all functions pertaining to generating the rewards.
- similarity.py: Contains functions to calculate similarity coefficients: Tanimoto and Levenshtein (edit) distance
- training.py: Calculates the initial distribution and trains the actor and critic networks.
- tree.py: Implements the tree class along with btl (the "build tree from list" function)
- Padel.txt: Contains the output of the PaDEL descriptor software
- descriptors.csv: Stores the initial descriptors
- uneval_desc.csv: Rows of descriptors.csv that contain NaN values are re-evaluated and stored here
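The two coefficients computed in similarity.py can be sketched in pure Python. This is a simplified stand-in: the actual module likely works on RDKit fingerprints, whereas the Tanimoto function below takes plain sets of fingerprint bits:

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of fingerprint bits:
    |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def edit_distance(s, t):
    """Levenshtein (edit) distance via dynamic programming,
    keeping only the previous row of the DP table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (cs != ct)))    # substitution
        prev = cur
    return prev[-1]

tanimoto({1, 2, 3}, {2, 3, 4})  # 0.5
edit_distance("CCO", "CCN")     # 1
```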
The following Python libraries are required to run it:
- rdkit
- numpy
- sklearn
- keras
- pandas
- bisect (part of the Python standard library)
- Levenshtein
- A backend for Keras, such as Theano, TensorFlow or CNTK
- xgboost