This package was designed as a tool to quickly compute thousands of sets of atomic or residue molecular contacts. Contacts can be evaluated within a single body or across two bodies. The library scales well thanks to native Python multithreading. The module also evaluates docking poses by applying triplets of Euler angles and translation vectors to initial unbound conformations.
Installation should be as simple as pip install ccmap. Alternatively, you can clone this repository and run python setup.py install from the root folder.
The current release was successfully installed through pip on the following interpreter/platform combinations:
- python3.8/OSX 10.14.6
- python3.8/Ubuntu LTS
From there you can load the package and display its help.
import ccmap
help(ccmap)
Four functions are available:
- cmap: computes the contacts within a single body or between two bodies
- lcmap: computes the contacts of a list of single-body or two-body systems
- zmap: computes the contacts between a receptor and a ligand molecule after applying transformations to the ligand coordinates
- lzmap: computes many sets of contacts between a receptor and a ligand molecule, one for each applied ligand transformation
All module functions take molecular coordinates as dictionaries, where keys are atom descriptors and values are lists:
- 'x' : list of float x coordinates
- 'y' : list of float y coordinates
- 'z' : list of float z coordinates
- 'seqRes' : list of strings
- 'chainID' : list of one-letter strings
- 'resName' : list of strings
- 'name' : list of strings
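For illustration, the sketch below builds such a dictionary by hand for a made-up three-atom, two-residue fragment and checks that every descriptor is present and that all lists hold one entry per atom. The check_body helper is hypothetical and not part of ccmap; it only spells out the layout described above.

```python
# Descriptors expected in a coordinate dictionary, as listed above.
REQUIRED_KEYS = ("x", "y", "z", "seqRes", "chainID", "resName", "name")


def check_body(body):
    """Hypothetical helper (not part of ccmap): verify that every descriptor
    is present and that all lists hold one entry per atom."""
    missing = [k for k in REQUIRED_KEYS if k not in body]
    if missing:
        raise KeyError(f"missing descriptors: {missing}")
    sizes = {len(body[k]) for k in REQUIRED_KEYS}
    if len(sizes) != 1:
        raise ValueError(f"descriptor lists differ in length: {sizes}")
    return sizes.pop()  # number of atoms


# Made-up coordinates for a three-atom, two-residue fragment of chain A.
body = {
    "x": [0.0, 1.5, 4.0],
    "y": [0.0, 0.0, 0.5],
    "z": [0.0, 0.0, 0.0],
    "seqRes": ["1", "1", "2"],
    "chainID": ["A", "A", "A"],
    "resName": ["GLY", "GLY", "ALA"],
    "name": ["N", "CA", "N"],
}
print(check_body(body))  # 3
```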
The contact distance threshold is expressed in Angstroms and its default value is 4.5. It can be redefined with the named parameter d.
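To make the role of d concrete, here is a naive, brute-force reference for atomic contacts. It assumes that two atoms are in contact when their distance is at most d; whether ccmap uses a strict or non-strict comparison is not stated here, and its C implementation is certainly not this O(n*m) loop. The atomic_contacts helper is hypothetical.

```python
import math


def atomic_contacts(body1, body2=None, d=4.5):
    """Naive reference (hypothetical, not ccmap's algorithm): return the
    (i, j) index pairs of atoms lying within d Angstroms of each other."""
    xyz1 = list(zip(body1["x"], body1["y"], body1["z"]))
    if body2 is None:
        # Single body: test each unordered pair of distinct atoms once.
        return [(i, j)
                for i in range(len(xyz1))
                for j in range(i + 1, len(xyz1))
                if math.dist(xyz1[i], xyz1[j]) <= d]
    xyz2 = list(zip(body2["x"], body2["y"], body2["z"]))
    # Two bodies: i indexes atoms of body1, j indexes atoms of body2.
    return [(i, j)
            for i, a in enumerate(xyz1)
            for j, b in enumerate(xyz2)
            if math.dist(a, b) <= d]


body = {"x": [0.0, 1.5, 4.0], "y": [0.0, 0.0, 0.5], "z": [0.0, 0.0, 0.0]}
print(atomic_contacts(body))         # [(0, 1), (0, 2), (1, 2)]
print(atomic_contacts(body, d=3.0))  # [(0, 1), (1, 2)]
```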
With encode=True, contacts are returned as integers, each one encoding the positions of one pair of atoms/residues in contact. Positions can be decoded with this simple formula:
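def K2IJ(k, sizeBody1, sizeBody2):
    # For a single body, the contact matrix is square: use its own size as row width
    nCol = sizeBody2 if sizeBody2 else sizeBody1
    # Row-major decoding of k = i * nCol + j
    return k // nCol, k % nCol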
With encode=False, contacts are returned as a JSON object string.
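The encoding is a row-major flattening, so the inverse of K2IJ is k = i * nCol + j. The sketch below pairs K2IJ with a hypothetical IJ2K helper, derived from the formula above rather than taken from ccmap, to check the round trip on two bodies of 120 and 85 residues.

```python
def K2IJ(k, sizeBody1, sizeBody2):
    # Decoding formula from the text; a single body uses its own size as nCol.
    nCol = sizeBody2 if sizeBody2 else sizeBody1
    return k // nCol, k % nCol


def IJ2K(i, j, sizeBody1, sizeBody2):
    # Hypothetical inverse (not part of ccmap): row-major flattening.
    nCol = sizeBody2 if sizeBody2 else sizeBody1
    return i * nCol + j


# Residue 7 of a 120-residue body against residue 42 of an 85-residue body.
k = IJ2K(7, 42, 120, 85)
print(k, K2IJ(k, 120, 85))  # 637 (7, 42)
```

For a single body, pass 0 (or None) as sizeBody2 so that both indices refer to the same body.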
With atomic=True, contacts are computed at the atomic level. By default, atomic is False and contacts are computed at the residue level.
With apply=True, the dictionaries of coordinates passed as arguments are modified according to the Euler angles and translation parameters. This is useful to generate a single docking conformation. This argument is only available for the zmap function.
When working with protein docking data, unbound conformations are often centered on the origin of the coordinate system. Specify the translation vector for each body with the offsetRec and offsetLig named arguments. These arguments are only available for the zmap and lzmap functions.
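As an illustration of what such an offset can be, the sketch below computes the translation that moves the geometric center of a body to the origin, i.e. the negated centroid, and applies it. That offsetRec and offsetLig follow this sign convention is an assumption; the centering_offset and translate helpers are hypothetical.

```python
def centering_offset(body):
    """Hypothetical helper: translation vector that moves the geometric
    center of a body to the origin (assumed offset convention)."""
    n = len(body["x"])
    return [-sum(body[axis]) / n for axis in ("x", "y", "z")]


def translate(body, vector):
    """Return a copy of body with vector added to every atom."""
    moved = dict(body)
    for axis, t in zip(("x", "y", "z"), vector):
        moved[axis] = [c + t for c in body[axis]]
    return moved


body = {"x": [1.0, 3.0], "y": [2.0, 2.0], "z": [-1.0, 1.0]}
offset = centering_offset(body)
centered = translate(body, offset)
print(centered["x"], centered["y"], centered["z"])  # [-1.0, 1.0] [0.0, 0.0] [-1.0, 1.0]
```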
We usually work with molecules in the PDB format. We can use the pyproteinsExt package to handle the boilerplate.
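# Assumption: the PDB parser is exposed by the pyproteinsExt.structure.coordinates module
import pyproteinsExt.structure.coordinates as PDB
parser = PDB.Parser()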
pdbREC = parser.load(file="dummy_A.pdb")
pdbDictREC = pdbREC.atomDictorize
pdbDictREC.keys()
# dict_keys(['x', 'y', 'z', 'seqRes', 'chainID', 'resName', 'name'])
By convention, the following examples use two molecules named REC(eptor) and LIG(and).
pdbLIG = parser.load(file="dummy_B.pdb")
pdbDictLIG = pdbLIG.atomDictorize
pdbDictLIG.keys()
# dict_keys(['x', 'y', 'z', 'seqRes', 'chainID', 'resName', 'name'])
Setting a contact distance of 6.0 Angstroms and recovering residue-residue contacts as a list of integers.
ccmap.cmap(pdbDictLIG, d=6.0, encode=True)
Using the default contact distance and recovering atomic contact maps as a JSON object string. The first positional argument of lcmap is a list of bodies to process independently.
import json
json.loads(ccmap.lcmap([pdbDictLIG, pdbDictREC], atomic=True))
The second positional argument of cmap is optional and defines the second body.
ccmap.cmap(pdbDictREC, pdbDictLIG, d=6.0, encode=True)
The second positional argument of lcmap is an optional list of second bodies. The first two arguments must have the same length, since the i-th element of the first list is processed with the i-th element of the second.
ccmap.lcmap([pdbDictREC_1, ..., pdbDictREC_n], [pdbDictLIG_1, ..., pdbDictLIG_n], d=6.0, encode=True)
To evaluate a single docking pose, use the zmap function, whose third and fourth positional arguments respectively specify:
- Euler angles triplet
- translation vector
ccmap.zmap(pdbDictREC, pdbDictLIG, (e1, e2, e3), (t1, t2, t3))
Transformations are always applied to the coordinates provided as the second argument, here pdbDictLIG.
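The Euler-angle convention used by zmap is not specified here. To illustrate what applying a transformation to the ligand means, the sketch below assumes a ZYZ rotation (angles in radians) about the origin, followed by the translation. The transform helper and the ZYZ choice are assumptions and may not match ccmap's internals.

```python
import math


def rz(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def ry(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transform(body, euler, translation):
    """Hypothetical sketch: rotate by ZYZ Euler angles (radians) about the
    origin, then translate. Returns a new dictionary; body is untouched."""
    e1, e2, e3 = euler
    r = matmul(matmul(rz(e1), ry(e2)), rz(e3))
    xs, ys, zs = [], [], []
    for p in zip(body["x"], body["y"], body["z"]):
        x, y, z = (sum(r[i][k] * p[k] for k in range(3)) + translation[i]
                   for i in range(3))
        xs.append(x)
        ys.append(y)
        zs.append(z)
    moved = dict(body)
    moved["x"], moved["y"], moved["z"] = xs, ys, zs
    return moved


lig = {"x": [1.0], "y": [0.0], "z": [0.0]}
pose = transform(lig, (math.pi / 2, 0.0, 0.0), (0.0, 0.0, 5.0))
print([round(pose[a][0], 6) for a in "xyz"])  # [0.0, 1.0, 5.0]
```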
To evaluate many docking poses, use the lzmap function. Its arguments are similar, except that the Euler angles and translation vectors must be supplied as lists.
ccmap.lzmap(pdbDictREC, pdbDictLIG, [(e1, e2, e3),], [(t1, t2, t3),])
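lzmap takes two parallel lists, one Euler triplet and one translation per pose. The sketch below draws random poses with the standard random module; the random_poses helper, the radian unit and the sampling ranges are illustrative assumptions, not ccmap requirements.

```python
import math
import random


def random_poses(n, max_shift=10.0, seed=None):
    """Hypothetical helper: build n Euler-angle triplets (radians) and n
    translation vectors as two parallel lists of tuples."""
    rng = random.Random(seed)
    eulers = [tuple(rng.uniform(0.0, 2.0 * math.pi) for _ in range(3))
              for _ in range(n)]
    translations = [tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
                    for _ in range(n)]
    return eulers, translations


eulers, translations = random_poses(1000, seed=42)
# The i-th triplet is meant to go with the i-th translation, e.g.
# ccmap.lzmap(pdbDictREC, pdbDictLIG, eulers, translations)
```

Note that drawing each Euler angle uniformly does not sample rotations uniformly; a real docking protocol would typically use a dedicated rotation-sampling scheme.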
The conformations obtained by coordinate transformation can be mapped back to PDB files. Here, the offset vectors [u1, u2, u3] and [v1, v2, v3] respectively center pdbDictREC and pdbDictLIG, and one transformation, defined by the Euler angles [e1, e2, e3] and the translation vector [t1, t2, t3], is applied to pdbDictLIG. With apply=True, the resulting two-body conformation is written back into the provided pdbDictREC and pdbDictLIG. These updated coordinates are then used to update the original PDB objects before writing them to file.
# Perform computation & alter provided dictionaries
ccmap.zmap(pdbDictREC, pdbDictLIG,
           [e1, e2, e3], [t1, t2, t3],
           offsetRec=[u1, u2, u3],
           offsetLig=[v1, v2, v3],
           apply=True)
# Update PDB containers from previous examples
pdbREC.setCoordinateFromDictorize(pdbDictREC)
pdbLIG.setCoordinateFromDictorize(pdbDictLIG)
# Dump to coordinate files
with open("new_receptor.pdb", "w") as fp:
    fp.write(str(pdbREC))
with open("new_ligand.pdb", "w") as fp:
    fp.write(str(pdbLIG))
The C implementation allows the ccmap functions to release the Python Global Interpreter Lock (GIL). Hence, true multithreading can be achieved, and performance scales decently with the number of workers. For this benchmark, up to 50,000 docking poses were generated and processed for three coordinate sets of increasing size: 1974 atoms (1GL1), 3424 atoms (1F34) and 10677 atoms (2VIS).
A simple example of a multithreaded implementation can be found in the provided script. The tests folder allows the above benchmark to be reproduced.
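The provided script is not reproduced here. As a rough, hypothetical sketch of the same idea, poses can be split into contiguous chunks and dispatched to a ThreadPoolExecutor; since ccmap releases the GIL, such threads may actually run in parallel. The compute argument stands for a call such as ccmap.lzmap, assumed to return one result per pose as a list; a dummy function is used so the sketch runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(seq, n_chunks):
    """Split seq into at most n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(seq), n_chunks)
    out, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            out.append(seq[start:end])
        start = end
    return out


def threaded_map(compute, eulers, translations, workers=4):
    """Run compute(euler_chunk, translation_chunk) on each chunk in a thread
    pool and concatenate the per-chunk result lists in input order."""
    pairs = zip(chunk(eulers, workers), chunk(translations, workers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(compute, e, t) for e, t in pairs]
        results = []
        for f in futures:
            results.extend(f.result())
    return results


# Stand-in for e.g. lambda e, t: ccmap.lzmap(pdbDictREC, pdbDictLIG, e, t)
def dummy(eulers, translations):
    return [sum(e) + sum(t) for e, t in zip(eulers, translations)]


eulers = [(float(i), 0.0, 0.0) for i in range(10)]
translations = [(0.0, 0.0, 1.0)] * 10
print(threaded_map(dummy, eulers, translations, workers=3))
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

Collecting futures in submission order keeps results aligned with the input poses, whatever order the threads finish in.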
A C executable can be generated with the provided makefile. It relies on the same low-level functions, with the following limitations:
- One computation per executable call
- No multithreading