CReM - chemically reasonable mutations

CReM is an open-source Python framework to generate chemical structures using a fragment-based approach.

The underlying idea is similar to context-aware matched molecular pairs: fragments that occur in an identical context are considered interchangeable. One can therefore build a database of interchangeable fragments and use it to generate chemically valid structures.
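The interchangeability idea can be illustrated with a toy lookup table. This is only a conceptual sketch: the context labels, fragment strings and function names below are hypothetical and are not part of the crem API or its database schema.

```python
from collections import defaultdict

# Hypothetical toy data: (context, fragment) pairs as they might be
# extracted from known molecules. The labels are illustrative only.
observed_pairs = [
    ("aromatic-C", "OC"),
    ("aromatic-C", "Cl"),
    ("aromatic-C", "N"),
    ("carbonyl-C", "O"),
    ("carbonyl-C", "N"),
    ("aromatic-C", "Cl"),  # a repeated pair raises the frequency
]


def build_fragment_db(pairs):
    """Map each context to the fragments seen in it, with their counts."""
    db = defaultdict(lambda: defaultdict(int))
    for context, fragment in pairs:
        db[context][fragment] += 1
    return db


def replacements(db, context, fragment, min_freq=1):
    """Return fragments seen in the same context, excluding the original."""
    return sorted(
        frag
        for frag, count in db.get(context, {}).items()
        if frag != fragment and count >= min_freq
    )


db = build_fragment_db(observed_pairs)
print(replacements(db, "aromatic-C", "OC"))              # ['Cl', 'N']
print(replacements(db, "aromatic-C", "OC", min_freq=2))  # ['Cl']
```

The frequency threshold in the sketch mirrors the idea of only accepting replacements that were observed often enough in the source data.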

Features:

Generation of a custom fragment database
Three modes of structure generation: MUTATE, GROW, LINK
Context radius to consider for replacement
Fragment size to replace and the size of a replacing fragment
Protection of atoms from modification (e.g. scaffold protection)
Replacement only with fragments that occur in the fragment database with at least a certain minimal frequency
Random selection of replacements up to a specified number

Installation

Several command-line utilities for creating fragment databases will be installed, and the crem module will become importable in Python for structure generation.

From the PyPI package

pip install crem

Manually from repository

git clone https://github.com/DrrDom/crem
cd crem
python3 setup.py sdist bdist_wheel
pip install dist/crem-0.1-py3-none-any.whl

Uninstall

pip uninstall crem

Dependencies

crem requires rdkit>=2017.09. To run the guacamol test, guacamol must also be installed.

Generation of a fragment database

Fragmentation of input structures:

fragmentation -i input.smi -o frags.txt -c 32 -v

Convert fragments to a standardized representation of a core and a context of a given radius:

frag_to_env -i frags.txt -o r3.txt -r 3 -c 32 -v

Remove duplicate lines from the output file and count the frequency of occurrence of fragment-context pairs. sort and uniq are bash utilities, but since Windows 10 can run Linux tools (for example via the Windows Subsystem for Linux), executing them should not be a big issue for Windows users.

sort r3.txt | uniq -c > r3_c.txt
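Where the bash utilities are not available, the same counting step can be approximated in Python. This is a minimal sketch, not part of crem: the function names are hypothetical, the output imitates the layout of GNU `uniq -c` (right-aligned count, a space, then the line), and it is an assumption that the database import step accepts this layout exactly as it accepts real `uniq -c` output.

```python
from collections import Counter


def count_unique_lines(lines):
    """Return 'count line' strings for sorted unique lines, like `sort | uniq -c`."""
    counts = Counter(line.rstrip("\n") for line in lines)
    # GNU uniq prints the count right-aligned in a 7-character field plus a space.
    return [f"{n:7d} {line}" for line, n in sorted(counts.items())]


def count_file(in_path, out_path):
    """Read in_path, count duplicate lines and write the result to out_path."""
    with open(in_path) as src:
        rows = count_unique_lines(src)
    with open(out_path, "w") as dst:
        dst.writelines(row + "\n" for row in rows)


# Small in-memory demonstration; for real data one would call
# count_file("r3.txt", "r3_c.txt") instead.
for row in count_unique_lines(["b\n", "a\n", "b\n"]):
    print(row)
```

The line order may differ from that of `sort` under some locales, because Python sorts by code point, but the counts are the same.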

Create a database and import the file into a database table

env_to_db -i r3_c.txt -o fragments.db -r 3 -c -v

The last three steps should be repeated for each radius. All tables can be stored in the same database.
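Repeating the steps per radius is easy to script. The sketch below only assembles the command lines shown above for a given radius; the helper name, the file naming pattern `r<radius>.txt` and the choice of radii 1 to 3 are assumptions made for illustration.

```python
def commands_for_radius(radius, frags="frags.txt", db="fragments.db", ncpu=32):
    """Build the three per-radius shell commands, reusing the flags shown above."""
    env = f"r{radius}.txt"        # output of frag_to_env
    counted = f"r{radius}_c.txt"  # output of sort | uniq -c
    return [
        f"frag_to_env -i {frags} -o {env} -r {radius} -c {ncpu} -v",
        f"sort {env} | uniq -c > {counted}",
        f"env_to_db -i {counted} -o {db} -r {radius} -c -v",
    ]


# Print the commands for a few example radii (the radii are illustrative).
for radius in (1, 2, 3):
    for cmd in commands_for_radius(radius):
        print(cmd)
```

Each printed line could be pasted into a shell, or executed from Python with `subprocess.run(cmd, shell=True, check=True)`.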

Structure generation

Import necessary functions from the main module

from crem.crem import mutate_mol, grow_mol, link_mols
from rdkit import Chem

Create a molecule and mutate it. Only one heavy atom will be substituted (max_size=1). The default radius is 3.

m = Chem.MolFromSmiles('c1cc(OC)ccc1C')  # 4-methoxytoluene
mols = list(mutate_mol(m, db_name='replacements.db', max_size=1))

output example

['CCc1ccc(C)cc1',
 'CC#Cc1ccc(C)cc1',
 'C=C(C)c1ccc(C)cc1',
 'CCCc1ccc(C)cc1',
 'CC=Cc1ccc(C)cc1',
 'CCCCc1ccc(C)cc1',
 'CCCOc1ccc(C)cc1',
 'CNCCc1ccc(C)cc1',
 'COCCc1ccc(C)cc1',
 ...
 'Cc1ccc(C(C)(C)C)cc1']

Add hydrogens to the molecule to mutate hydrogens as well

mols = list(mutate_mol(Chem.AddHs(m), db_name='replacements.db', max_size=1))

output

['CCc1ccc(C)cc1',
 'CC#Cc1ccc(C)cc1',
 'C=C(C)c1ccc(C)cc1',
 'CCCc1ccc(C)cc1',
 'Cc1ccc(C(C)C)cc1',
 'CC=Cc1ccc(C)cc1',
 ...
 'COc1ccc(C)cc1C',
 'C=Cc1cc(C)ccc1OC',
 'COc1ccc(C)cc1Cl',
 'COc1ccc(C)cc1CCl']

Grow the molecule. Only hydrogens will be replaced. Hydrogens should not be added explicitly.

mols = list(grow_mol(m, db_name='replacements_sc2.db'))

output

['COc1ccc(C)c(Br)c1',
 'COc1ccc(C)c(C)c1',
 'COc1ccc(C)c(Cl)c1',
 'COc1ccc(C)c(OC)c1',
 'COc1ccc(C)c(N)c1',
 ...
 'COc1ccc(CCN)cc1']

Create a second molecule and link it to the first one

m2 = Chem.MolFromSmiles('NCC(=O)O')  # glycine
mols = list(link_mols(m, m2, db_name='replacements.db'))

output

['Cc1ccc(OCC(=O)NCC(=O)O)cc1',
 'Cc1ccc(OCCOC(=O)CN)cc1',
 'COc1ccc(CC(=N)NCC(=O)O)cc1',
 'COc1ccc(CC(=O)NCC(=O)O)cc1',
 'COc1ccc(CC(=S)NCC(=O)O)cc1',
 'COc1ccc(CCOC(=O)CN)cc1']

You can vary the size of a linker and specify the distance between the two attachment points in a linking fragment. These functions accept many other arguments; see their docstrings for details.

Multiprocessing

All functions have an ncores argument and can make multiple replacements in one molecule in parallel. If you want to process several molecules in parallel, you have to write your own code. However, the functions described above are generators, and their results cannot be pickled, so they cannot be used with the multiprocessing module. Therefore, three complementary functions, mutate_mol2, grow_mol2 and link_mols2, were created. They return a list of results, which can be pickled, so they can be used with multiprocessing.Pool or other tools.
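The difference between the two families of functions comes down to pickling: a generator object cannot be pickled, while a plain list can. The sketch below demonstrates this with hypothetical stand-in functions (`toy_mutate`, `toy_mutate2`); they only imitate the generator and list-returning styles and do not call crem.

```python
import pickle


def toy_mutate(smiles):
    """Hypothetical stand-in for a generator-style function."""
    for suffix in ("C", "N", "O"):
        yield smiles + suffix


def toy_mutate2(smiles):
    """Hypothetical list-returning counterpart of toy_mutate."""
    return list(toy_mutate(smiles))


# A generator object cannot be pickled, so a worker process
# could not send it back to the parent process.
try:
    pickle.dumps(toy_mutate("CC"))
    generator_picklable = True
except TypeError:
    generator_picklable = False

# A list of strings survives a pickle round trip unchanged.
restored = pickle.loads(pickle.dumps(toy_mutate2("CC")))

print(generator_picklable)  # False
print(restored)             # ['CCC', 'CCN', 'CCO']
```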

Example:

from multiprocessing import Pool
from functools import partial
from crem.crem import mutate_mol2
from rdkit import Chem

p = Pool(2)
input_smi = ['c1ccccc1N', 'NCC(=O)OC', 'NCCCO']
input_mols = [Chem.MolFromSmiles(s) for s in input_smi]

res = list(p.imap(partial(mutate_mol2, db_name='replacements.db', max_size=1), input_mols))

res will be a list of lists containing the SMILES of the generated molecules, one inner list per input molecule.
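A nested result of this shape is often flattened before further processing. The sketch below uses a hypothetical, hand-written `res` with made-up SMILES in place of real output. It removes duplicates by exact string comparison, which is only meaningful if the SMILES are written in a consistent canonical form.

```python
from itertools import chain

# Hypothetical stand-in for `res`: one inner list per input molecule.
res = [
    ["Cc1ccccc1N", "Nc1ccccc1O"],
    ["COC(=O)CO"],
    ["Nc1ccccc1O", "OCCCO"],
]

# Number of generated structures per input molecule.
per_input_counts = [len(smiles_list) for smiles_list in res]

# Flatten and drop duplicates while keeping first-seen order.
unique_smiles = list(dict.fromkeys(chain.from_iterable(res)))

print(per_input_counts)  # [2, 1, 2]
print(unique_smiles)     # ['Cc1ccccc1N', 'Nc1ccccc1O', 'COC(=O)CO', 'OCCCO']
```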

Precompiled fragment databases

Links to download precompiled fragment databases will be published at http://www.qsar4u.com/pages/crem.php

Benchmarks

Guacamol

task	SMILES LSTM*	SMILES GA*	Graph GA*	Graph MCTS*	CReM
Celecoxib rediscovery	1.000	0.732	1.000	0.355	1.000
Troglitazone rediscovery	1.000	0.515	1.000	0.311	1.000
Thiothixene rediscovery	1.000	0.598	1.000	0.311	1.000
Aripiprazole similarity	1.000	0.834	1.000	0.380	1.000
Albuterol similarity	1.000	0.907	1.000	0.749	1.000
Mestranol similarity	1.000	0.79	1.000	0.402	1.000
C11H24	0.993	0.829	0.971	0.410	0.966
C9H10N2O2PF2Cl	0.879	0.889	0.982	0.631	0.940
Median molecules 1	0.438	0.334	0.406	0.225	0.371
Median molecules 2	0.422	0.38	0.432	0.170	0.434
Osimertinib MPO	0.907	0.886	0.953	0.784	0.995
Fexofenadine MPO	0.959	0.931	0.998	0.695	1.000
Ranolazine MPO	0.855	0.881	0.92	0.616	0.969
Perindopril MPO	0.808	0.661	0.792	0.385	0.815
Amlodipine MPO	0.894	0.722	0.894	0.533	0.902
Sitagliptin MPO	0.545	0.689	0.891	0.458	0.763
Zaleplon MPO	0.669	0.413	0.754	0.488	0.770
Valsartan SMARTS	0.978	0.552	0.990	0.04	0.994
Deco Hop	0.996	0.970	1.000	0.590	1.000
Scaffold Hop	0.998	0.885	1.000	0.478	1.000
total score	17.341	14.398	17.983	9.011	17.919
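The total score row is consistent with a plain sum of the 20 per-task scores in each column. A quick check for the CReM column, with the values copied from the table above:

```python
# CReM column of the benchmark table, in row order.
crem_scores = [
    1.000, 1.000, 1.000, 1.000, 1.000, 1.000,  # rediscovery and similarity tasks
    0.966, 0.940,                              # isomer tasks
    0.371, 0.434,                              # median molecules
    0.995, 1.000, 0.969, 0.815, 0.902, 0.763, 0.770,  # MPO tasks
    0.994, 1.000, 1.000,                       # Valsartan SMARTS, Deco Hop, Scaffold Hop
]

total = round(sum(crem_scores), 3)
print(len(crem_scores), total)  # 20 17.919
```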
