We present a tool that addresses the challenge of retrosynthesis by using an evolutionary algorithm to search for multi-step retrosynthetic routes. By incorporating a multi-branch encoding strategy and a general genetic operator, our approach significantly reduces search time while generating accurate and feasible routes, outperforming existing methods such as Monte Carlo tree search.
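The following minimal toy genetic algorithm in Python makes the idea concrete. It is not AlphaRetro's implementation. The flat integer encoding (gene i picks one of the top-$k$ single-step candidates at expansion step i), the `evolve` function, and the fitness function are all hypothetical. The operators are the textbook ones (tournament selection, one-point crossover, point mutation), not the paper's multi-branch encoding and general genetic operator.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical encoding: gene i selects one of the top-k single-step
# candidates at expansion step i (a simplification, not AlphaRetro's encoding).
Genome = List[int]


def evolve(
    fitness: Callable[[Genome], float],
    genome_len: int,
    k: int,
    pop_size: int = 20,
    generations: int = 50,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[Genome, float]:
    """Toy GA: binary tournament selection, one-point crossover, point mutation."""
    rng = random.Random(seed)
    population = [[rng.randrange(k) for _ in range(genome_len)] for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament() -> Genome:
        # Keep the fitter of two randomly drawn individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children: List[Genome] = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # One-point crossover (a cut point needs at least two genes).
            cut = rng.randrange(1, genome_len) if genome_len > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Point mutation: resample each gene with probability mutation_rate.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        # Remember the best genome seen so far across all generations.
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best, fitness(best)
```

For instance, with a hypothetical per-step score table `scores = [[0.5, 0.3, 0.2]] * 4`, calling `evolve(lambda g: sum(scores[i][x] for i, x in enumerate(g)), genome_len=4, k=3)` searches for the combination of candidate ranks with the highest total score. A real route fitness would also need to reward routes whose leaves are purchasable building blocks.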
The following commands are run on a Linux system.
git clone https://github.com/ilog-ecnu/AlphaRetro
cd AlphaRetro
conda env create -f env_single.yml
conda activate single_step
conda env create -f env_multi.yml
conda activate alpha_retro
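Training runs in `single_step` and the multi-step search runs in `alpha_retro`, so it is easy to launch a script in the wrong environment. A small guard such as the hypothetical `require_conda_env` below (not part of the repository) can catch this early. It relies on the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the name of the active environment.

```python
import os


def require_conda_env(expected: str) -> None:
    """Raise RuntimeError unless the named conda environment `expected` is active.

    Reads CONDA_DEFAULT_ENV, which `conda activate <name>` sets to <name>.
    """
    active = os.environ.get("CONDA_DEFAULT_ENV")
    if active != expected:
        raise RuntimeError(
            f"expected conda env {expected!r}, but {active!r} is active; "
            f"run `conda activate {expected}` first"
        )
```

Calling `require_conda_env("single_step")` at the top of a training script stops it with an actionable message if another environment is active.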
USPTO_50K: Google Drive USPTO_50K
Pistachio: Nextmove Pistachio
Building block dataset: Enamine Building Block
All pre-trained models can be downloaded from Google Drive: model
Prepare the dataset in the same format as data/t5_data/50k_example.csv, then pass its path to the main function to start training.
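Before training on your own data, it can help to confirm that your CSV has the same columns as the example file. The sketch below assumes that the first row of `50k_example.csv` is a header row; the helper names are hypothetical and not part of the repository.

```python
import csv
from typing import List


def read_header(path: str) -> List[str]:
    """Return the column names (the first row) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def check_columns(example_path: str, dataset_path: str) -> None:
    """Raise ValueError unless dataset_path has exactly the example's columns, in order."""
    expected = read_header(example_path)
    actual = read_header(dataset_path)
    if actual != expected:
        raise ValueError(f"column mismatch: expected {expected}, got {actual}")
```

Usage, with a hypothetical dataset file: `check_columns("data/t5_data/50k_example.csv", "my_dataset.csv")`.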
cd single_step/t5
conda activate single_step
python train.py
Input/Output Format:
Input: A SMILES string (e.g., NC(=O)c1cn(Cc2c(F)cccc2F)nn1)
Output:
- top_k_precursors: List of the top-$k$ predicted precursor SMILES, one string per candidate, with reactants separated by "." (e.g., [Fc1cccc(F)c1CBr.[N-]=[N+]=[N-], CS(=O)(=O)OCc1c(F)cccc1F.[N-]=[N+]=[N-], ..., Fc1cccc(F)c1CBr.[N-]=[N+]=N])
- scores: List of associated confidence scores for each prediction (e.g., [0.18, 0.12, ..., 0.08])
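The two output lists are aligned by index. The following is a minimal post-processing sketch; the `rank_predictions` helper is hypothetical, not the repository's API. It pairs each candidate with its score, sorts by score, and splits each candidate on ".", the SMILES separator between disconnected molecules, into individual reactants.

```python
from typing import List, Optional, Tuple


def rank_predictions(
    precursors: List[str], scores: List[float], top_n: Optional[int] = None
) -> List[Tuple[List[str], float]]:
    """Pair precursor SMILES with scores, sort highest first, split into reactants."""
    if len(precursors) != len(scores):
        raise ValueError("precursors and scores must have the same length")
    ranked = sorted(zip(precursors, scores), key=lambda pair: pair[1], reverse=True)
    # ranked[:None] keeps every candidate when top_n is not given.
    return [(smiles.split("."), score) for smiles, score in ranked[:top_n]]
```

Take the three candidates and the three scores shown in the examples above, pair them by position, and set `top_n=2`. The first entry returned is `(["Fc1cccc(F)c1CBr", "[N-]=[N+]=[N-]"], 0.18)`.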
After being trained or downloaded, the single-step model and the reaction-type model are served by serve.py on the server side and accessed through the client.
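This README does not document the protocol between serve.py and the client. The sketch below is purely illustrative. It assumes serve.py exposes an HTTP endpoint that accepts and returns JSON. The URL, port, route, and the `smiles`/`k` field names are all hypothetical, so check serve.py for the real interface.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical endpoint; the real host, port, and route are defined by serve.py.
DEFAULT_URL = "http://localhost:8000/predict"


def build_request(smiles: str, k: int, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a JSON POST request asking the server for the top-k precursors."""
    body = json.dumps({"smiles": smiles, "k": k}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(smiles: str, k: int = 10, url: str = DEFAULT_URL, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON reply (e.g. top_k_precursors and scores)."""
    with urllib.request.urlopen(build_request(smiles, k, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server.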
The client can then be used to search for a retrosynthetic route for a given target molecule:
conda activate alpha_retro
python -u multi-step.py > multi-step.log 2>&1
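AlphaRetro itself searches with an evolutionary algorithm. The simpler depth-first search below only sketches how the pieces fit together: a single-step model proposes precursor sets, and a route counts as solved once every leaf is in the building-block set (e.g. Enamine). `find_route`, the `SingleStep` type, and the stub model in the usage note are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

# A single-step model: maps a product SMILES to ranked precursor sets.
SingleStep = Callable[[str], List[List[str]]]


def find_route(
    target: str,
    single_step: SingleStep,
    building_blocks: Set[str],
    max_depth: int = 5,
) -> Optional[Dict[str, List[str]]]:
    """Depth-first search for a route whose leaves are all building blocks.

    Returns a mapping product -> chosen precursors, or None if none is found.
    """

    def solve(mol: str, depth: int, route: Dict[str, List[str]]) -> bool:
        if mol in building_blocks:
            return True  # purchasable leaf: nothing left to expand
        if depth == max_depth:
            return False  # depth limit also prevents infinite cycles
        for precursors in single_step(mol):
            trial = dict(route)  # tentative copy, discarded if this branch fails
            trial[mol] = precursors
            if all(solve(p, depth + 1, trial) for p in precursors):
                route.update(trial)
                return True
        return False

    route: Dict[str, List[str]] = {}
    return route if solve(target, 0, route) else None
```

Consider a stub model `table = {"T": [["X"], ["A", "B"]], "B": [["C"]]}`, wrapped as `lambda m: table.get(m, [])`, with building blocks `{"A", "C"}`. For this setup, `find_route("T", ...)` returns `{"T": ["A", "B"], "B": ["C"]}`.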
If you find this repository helpful, please give it a star.