Skip to content

Measures of roughness for molecular property landscapes

License

Notifications You must be signed in to change notification settings

coleygroup/rogi

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Measures of roughness for molecular property landscapes

This package implements the roughness index (ROGI) presented in "Roughness of Molecular Property Landscapes and Its Impact on Modellability", as well as the SARI, MODI, and RMODI indices.

Installation

rogi can be installed with pip:

pip install rogi

Note that rdkit is a dependency but needs to be installed separately with conda.

Requirements

  • numpy
  • scipy>=1.4
  • fastcluster
  • pandas
  • scikit-learn>=1
  • rdkit >= 2021 to be installed with conda

Usage

Note that ROGI and SARI are classes, while MODI and RMODI are functions.

ROGI

If SMILES are used as input, Morgan fingerprints (length 2048, radius 2) are computed and a distance matrix calculated with the Tanimoto metric:

from rogi import RoughnessIndex

ri = RoughnessIndex(Y=Y, smiles=smiles)
ri.compute_index()
>>> 0.42

With precomputed fingerprints:

ri = RoughnessIndex(Y=Y, fps=fingerprints)
ri.compute_index()

With descriptors you can pass a 2D array or a pandas.DataFrame where each row is a different molecule, and each column a different descriptor:

ri = RoughnessIndex(Y=Y, X=descriptors, metric='euclidean')
ri.compute_index()

You can also precompute a distance matrix using any chosen representation and metric:

ri = RoughnessIndex(Y=Y, X=descriptors, metric='precomputed')
ri.compute_index()

SARI

You can provide SMILES as input, and compute the SARI score without considering a reference set of datasets as follows:

from rogi import SARI

sari = SARI(pKi=pKi, smiles=smiles, fingerprints='maccs')
sari.compute_sari()
>>> 0.42

To standardize the raw continuous and discontinuous scores based on a reference set of datasets, you can compute the raw scores first and then provide SARI with their average and standard deviation:

raw_conts = []
raw_discs = []

for smiles, pKi in zip(datasets, affinities):
    sari = SARI(pKi=pKi, smiles=smiles, fingerprints='maccs')
    raw_cont, raw_disc = sari.compute_raw_scores()
    raw_conts.append(raw_cont)
    raw_discs.append(raw_disc)

mean_raw_cont = np.mean(raw_conts)
std_raw_cont = np.std(raw_conts)
mean_raw_disc = np.mean(raw_discs)
std_raw_disc = np.std(raw_discs)
                         
sari = SARI(pKi=my_pKi, smiles=my_smiles, fingerprints='maccs')
sari.compute_sari(mean_raw_cont=mean_raw_cont, std_raw_cont=std_raw_cont,
                  mean_raw_disc=mean_raw_disc, std_raw_disc=std_raw_disc)
>>> 0.42

You can also pass a precomputed similarity matrix:

sari = SARI(pKi=pKi, sim_matrix=precomputed_similarity_matrix)

RMODI

RMODI is a function and takes a distance matrix in square form, and a list of float, as input.

from rogi import RMODI
RMODI(Dx=square_dist_matrix, Y=Y)
>>> 0.42

The delta values used by default is 0.625, but can be changed with the delta argument:

from rogi import RMODI
RMODI(Dx=square_dist_matrix, Y=Y, delta=0.5)
>>> 0.21

MODI

MODI is a function and takes a distance matrix in square form, and a list of binary labels (0 and 1), as input.

from rogi import MODI
MODI(Dx=square_dist_matrix, Y=Y)
>>> 0.42

Citation

If you make use of the rogi package in scientific publications, please cite the following article:

@misc{rogi,
      title={Roughness of molecular property landscapes and its impact on modellability}, 
      author={Matteo Aldeghi and David E. Graff and Nathan Frey and Joseph A. Morrone and 
              Edward O. Pyzer-Knapp and Kirk E. Jordan and Connor W. Coley},
      year={2022},
      eprint={2207.09250},
      archivePrefix={arXiv},
      primaryClass={q-bio.QM}
      }

If you use SARI, please also cite:

@article{sari,
         title={SAR Index: Quantifying the Nature of Structure−Activity Relationships},
         author={Peltason, Lisa and Bajorath, J\"urgen},
         journal={J. Med. Chem.},
         publisher={American Chemical Society},
         volume={50},
         number={23},
         pages={5571--5578},
         year={2007}
         }

If you use MODI, please also cite:

@article{modi,
         title={Data Set Modelability by QSAR},
         author={"Golbraikh, Alexander and Muratov, Eugene and Fourches, Denis and
                 Tropsha, Alexander"}
         journal={J. Chem. Inf. Model.},
         publisher={American Chemical Society},
         volume={54},
         number={1},
         pages={1--4},
         year={2014}
         }

If you use RMODI, please also cite:

@article{rmodi,
         title={Regression Modelability Index: A New Index for Prediction of the
                Modelability of Data Sets in the Development of QSAR
                Regression Models},
         author={Luque Ruiz, Irene and G\'omez-Nieto, Miguel \'Angel},
         journal={J. Chem. Inf. Model.},
         publisher={American Chemical Society},
         volume={58},
         number={10},
         pages={2069--2084},
         year={2018}
         }

About

Measures of roughness for molecular property landscapes

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages