This script builds an interactive molecular similarity network that visualizes the Tanimoto similarity between the molecules of a dataset.
The script relies on the following libraries:
- pandas: a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.
- NumPy: the fundamental package for array computing with Python.
- RDKit: an open-source toolkit for cheminformatics.
- NetworkX: a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
- Matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python.
- pyvis: a Python library for interactive network visualizations.
The libraries were used in a Miniconda3 environment with Python 3.6.13.
Miniconda3: Installation
- pandas: `conda install -c anaconda pandas`
- NumPy: `conda install -c anaconda numpy`
- RDKit: `conda install -c rdkit rdkit`
- NetworkX: `conda install -c anaconda networkx`
- Matplotlib: `conda install -c conda-forge matplotlib`
- pyvis: `conda install -c conda-forge pyvis`
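Before running the script, it can help to confirm that every package is importable from the active environment. The following is a minimal sketch that is not part of this repository; `check_packages` and `REQUIRED` are hypothetical names, and the sketch assumes the standard import names (`rdkit` for RDKit, `networkx` for NetworkX, and so on). It uses only the standard library, so it also runs under Python 3.6.

```python
import importlib.util
import sys

# Standard import names of the packages listed above (assumed, not read from the repo).
REQUIRED = ["pandas", "numpy", "rdkit", "networkx", "matplotlib", "pyvis"]


def check_packages(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report the interpreter version and one OK/MISSING line per package.
    print("Python", sys.version.split()[0])
    for name, found in check_packages(REQUIRED).items():
        print("{:<12} {}".format(name, "OK" if found else "MISSING"))
```

Any package reported as MISSING can be installed with the corresponding conda command.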
Download the code and unzip it in the desired directory. To run the script, use the following command:
```
python similarityNetwork.py
```
The dataset must follow the layout 'smiles' 'molecule_name', with one molecule per line, as in the example file dataset_ds.smi.
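Because each line of the input file is expected to hold a SMILES string followed by a molecule name, a small reader makes that layout explicit. This is a hypothetical sketch (`parse_smi_lines` and `read_smi` are not functions of the repository), assuming whitespace-separated columns and no header row; the script itself may read the file differently.

```python
from typing import Iterable, List, Tuple


def parse_smi_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'smiles molecule_name' lines into (smiles, name) pairs.

    Assumes whitespace-separated columns and no header line;
    blank lines are skipped.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        # Split only on the first whitespace run so names may contain spaces.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(
                "line {}: expected 'smiles molecule_name', got {!r}".format(line_no, line)
            )
        records.append((parts[0], parts[1]))
    return records


def read_smi(path: str) -> List[Tuple[str, str]]:
    """Read a .smi file such as dataset_ds.smi (hypothetical helper)."""
    with open(path) as handle:
        return parse_smi_lines(handle)
```

Names that contain spaces are kept whole, because only the first whitespace run is treated as the separator; a line with no name raises a `ValueError` that points at the offending line.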
The similarity threshold (a Tanimoto coefficient of 0.3 by default) can be changed on line 40 of similarityNetwork.py:
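```python
from rdkit import DataStructs

# Excerpt: runs inside the loop over molecule pairs (i, j).
Tc = DataStructs.TanimotoSimilarity(fps[i], fps[j])
if Tc >= 0.3:  # edit 0.3 to change the similarity threshold
    g.add_edge(smiles[i], smiles[j], length=1000)
```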
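For binary fingerprints, the Tanimoto coefficient is Tc = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. To show what the comparison and the threshold do, the sketch below re-implements the coefficient on plain Python sets of on-bit indices. This is an illustrative stand-in, not RDKit's implementation; `tanimoto` and `similarity_edges` are hypothetical names, and the value returned for two empty fingerprints is a convention chosen here.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def similarity_edges(
    fps: Dict[str, Set[int]], threshold: float = 0.3
) -> List[Tuple[str, str, float]]:
    """Return (node_i, node_j, Tc) for every unordered pair with Tc >= threshold."""
    edges = []
    for (name_i, fp_i), (name_j, fp_j) in combinations(fps.items(), 2):
        tc = tanimoto(fp_i, fp_j)
        if tc >= threshold:
            edges.append((name_i, name_j, tc))
    return edges
```

Raising the threshold keeps only the most similar pairs and thins the network, while lowering it adds edges. Comparing `len(similarity_edges(fps, t))` for a few values of `t` is one way to pick a value before editing line 40.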
This script was developed using the following articles and code as references:
- Draw molecular network on Jupyter notebook with rdkit and cytoscape.js (code)
- Molecular similarity network with visualised structures (code)
- Author: Brenda Ferrari (brendaferrari)
- Co-author: Camilo Lima (limacamilo)
Social preview original photo by Brenda Ferrari (brendaferrari)