Py-CoMSIA: Pythonic CoMSIA 3D QSAR

comsia.py is a Python implementation of Comparative Molecular Similarity Indices Analysis (CoMSIA), a 3D Quantitative Structure-Activity Relationship (QSAR) method. This tool allows you to analyze molecular fields and predict biological activities based on molecular structures.

Citing Py-CoMSIA

If you use Py-CoMSIA in your work, please cite our publication:

Haga, C. L., Le, C. N., Yang, X. D., & Phinney, D. G. (2025). Py-CoMSIA: An Open-Source Implementation of Comparative Molecular Similarity Indices Analysis in Python. Pharmaceuticals, 18(3), 440. https://doi.org/10.3390/ph18030440

https://www.mdpi.com/1424-8247/18/3/440

Features

  • CoMSIA Field Calculation: Calculates steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor fields.
  • PLS Regression: Utilizes Partial Least Squares (PLS) regression for building QSAR models.
  • Flexible Input: Supports both CSV files (SMILES and activity data) and pre-aligned SDF files.
  • Grid-Based Analysis: Configurable grid resolution and padding for field calculations.
  • Field Selection: Allows you to select specific fields for analysis.
  • Visualization: Optionally visualizes the CoMSIA fields and PLS results.
  • Prediction: Predicts activities for new compounds based on the trained model.
  • Column Filtering: Optionally filters out columns with low variance.
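
To make the field calculation and the grid options more concrete, here is a minimal standard-library sketch of a CoMSIA-style similarity field. It assumes the Gaussian distance dependence of the original CoMSIA formulation with an attenuation factor of 0.3 and a probe property of 1. The function names, the toy coordinates, and the per-atom property values are illustrative assumptions and are not taken from comsia.py.

```python
import math
from itertools import product

ALPHA = 0.3  # Gaussian attenuation factor (assumed; a common CoMSIA choice)


def axis_points(lo, hi, resolution):
    """Evenly spaced points starting at lo and reaching at least hi."""
    n = math.ceil((hi - lo) / resolution) + 1
    return [lo + i * resolution for i in range(n)]


def build_grid(coords, resolution=1.0, padding=3.0):
    """Grid points covering the atoms' bounding box, extended by `padding`."""
    axes = []
    for dim in range(3):
        values = [c[dim] for c in coords]
        axes.append(axis_points(min(values) - padding, max(values) + padding, resolution))
    return list(product(*axes))


def similarity_field(coords, props, grid, probe=1.0, alpha=ALPHA):
    """Similarity index per grid point: -sum_i probe * w_i * exp(-alpha * r_i^2)."""
    field = []
    for point in grid:
        total = 0.0
        for atom, w in zip(coords, props):
            # Squared distance between the grid point and the atom.
            r2 = sum((p - a) ** 2 for p, a in zip(point, atom))
            total += probe * w * math.exp(-alpha * r2)
        field.append(-total)
    return field


# Toy example: two "atoms" with made-up partial charges (hypothetical data).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
charges = [0.4, -0.4]
grid = build_grid(coords)  # defaults mirror --grid_resolution and --grid_padding
electrostatic = similarity_field(coords, charges, grid)
print(len(grid), "grid points")
```

Because the Gaussian term has no singularity at zero distance, the field can be evaluated at every grid point, including points that coincide with atoms. A finer resolution or larger padding increases the number of grid points, and therefore the number of descriptor columns, cubically.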

Installation

  1. Clone the repository:

    git clone https://github.com/clhaga/pycomsia
    cd pycomsia
  2. Install dependencies:

    pip install -r requirements.txt 

Usage

python comsia.py --train_file <train_file> [options]

Arguments

--train_file (required): Path to the training data. Can be a CSV file with SMILES and activity data or an SDF file containing pre-aligned molecules and activity data.

--predict_file: Path to the input CSV or SDF file for prediction.

--sdf_activity: Name of the SDF property field that holds the activity data. Required if using an SDF file.

--grid_resolution: Resolution of the grid used for field calculation. (default: 1.0)

--grid_padding: Padding of the grid used for field calculation. (default: 3.0)

--fields: Fields to use for analysis. Options: steric, electrostatic, hydrophobic, donor, acceptor, all. (default: all)

--num_components: Number of components for PLS analysis. (default: 12)

--column_filter: Threshold for column filtering, used to filter out columns with low variance. (default: 0.0)

--disable_visualization: Disable visualization. (default: False)
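
The documented flags and defaults can be summarized as an argparse sketch. This is not the parser used in comsia.py; it only mirrors the options listed above. Two details are assumptions: that several fields are passed as space-separated values, and that an SDF training file is recognized by its .sdf extension. The file name train.sdf and the property name pIC50 are hypothetical.

```python
import argparse

FIELD_CHOICES = ["steric", "electrostatic", "hydrophobic", "donor", "acceptor", "all"]


def build_parser():
    """Build a parser that mirrors the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="comsia.py")
    parser.add_argument("--train_file", required=True,
                        help="CSV (SMILES + activity) or pre-aligned SDF file")
    parser.add_argument("--predict_file", default=None,
                        help="CSV or SDF file with compounds to predict")
    parser.add_argument("--sdf_activity", default=None,
                        help="SDF property holding the activity data")
    parser.add_argument("--grid_resolution", type=float, default=1.0)
    parser.add_argument("--grid_padding", type=float, default=3.0)
    # Assumption: multiple fields are given as space-separated values.
    parser.add_argument("--fields", nargs="+", choices=FIELD_CHOICES, default=["all"])
    parser.add_argument("--num_components", type=int, default=12)
    parser.add_argument("--column_filter", type=float, default=0.0)
    parser.add_argument("--disable_visualization", action="store_true")
    return parser


def parse_args(argv):
    """Parse argv and enforce that SDF training input names its activity property."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: SDF input is detected by file extension.
    if args.train_file.lower().endswith(".sdf") and args.sdf_activity is None:
        parser.error("--sdf_activity is required when --train_file is an SDF file")
    return args


# Hypothetical invocation: file and property names are placeholders.
args = parse_args(["--train_file", "train.sdf", "--sdf_activity", "pIC50",
                   "--fields", "steric", "electrostatic"])
print(args.fields, args.grid_resolution, args.num_components)
```

Using action="store_true" gives --disable_visualization the documented default of False without requiring a value on the command line.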

Data Format

CSV: One molecule per row, with one column for the SMILES strings and one column for the activity data.

SDF: Molecules should be pre-aligned. The SDF file must contain a property field corresponding to the activity data. Use --sdf_activity to specify the property name.
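
Before passing a property name to --sdf_activity, it can help to check which property fields an SDF file actually contains. The following standard-library sketch relies only on the SDF text layout (data items introduced by a "> <NAME>" header after "M  END", records terminated by "$$$$"). The helper names, the sample records, and the property name pIC50 are hypothetical, and the structure blocks in the sample are omitted for brevity.

```python
import re

# A data item header looks like "> <NAME>", optionally with extra tokens.
DATA_HEADER = re.compile(r"^>.*<([^<>]+)>")


def read_sdf_properties(text):
    """Return one {property_name: value} dict per molecule record in SDF text."""
    records, current = [], {}
    name, in_data = None, False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "$$$$":                 # end of a molecule record
            records.append({k: "\n".join(v) for k, v in current.items()})
            current, name, in_data = {}, None, False
        elif line.startswith("M  END"):        # data items follow the structure block
            in_data = True
        elif not in_data:
            continue                           # skip header and atom/bond lines
        else:
            match = DATA_HEADER.match(line)
            if match:                          # start of a new data item
                name = match.group(1)
                current[name] = []
            elif name is not None:
                if stripped:
                    current[name].append(line)
                else:                          # blank line ends the data item
                    name = None
    return records


def missing_activity(records, activity):
    """Indices of records that lack the given activity property."""
    return [i for i, rec in enumerate(records) if activity not in rec]


# Abbreviated sample: two records, the second one has no activity value.
sample = "\n".join([
    "mol_1", "", "", "M  END",
    "> <pIC50>", "6.2", "",
    "> <Name>", "example", "",
    "$$$$",
    "mol_2", "", "", "M  END",
    "> <Name>", "other", "",
    "$$$$",
])
records = read_sdf_properties(sample)
print(sorted(records[0]))
print(missing_activity(records, "pIC50"))
```

A non-empty result from missing_activity indicates records that would need an activity value, or removal, before the file is used for training.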

Tests from publication

To run the examples from the publication, execute the following:

python comsiatest.py

Molecule Imager

Creates a PNG file of the molecules in an SDF file, labeled with IUPAC names (if available).

python moleculeimager.py SDF_file_name.sdf
