MonoSeqCP

Project description

MonoSeqCP is a multimodal, monomer-level, transformer-based framework for predicting membrane permeability of cyclic peptides. The model integrates multiple monomer-level representations, including physicochemical descriptors, fingerprint-based features, and connectivity information, and explicitly accounts for cyclic rotational invariance.
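
To make the idea of cyclic rotational invariance concrete: a cyclic peptide has no natural first residue, so every rotation of its monomer sequence describes the same ring. The sketch below is not taken from the MonoSeqCP code. The function names and the choice of a lexicographically smallest "canonical" rotation are assumptions for illustration only, and the model may handle invariance differently.

```python
from typing import List, Sequence, Tuple


def cyclic_rotations(monomers: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every rotation of a cyclic monomer sequence."""
    seq = tuple(monomers)
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_rotation(monomers: Sequence[str]) -> Tuple[str, ...]:
    """Pick one representative rotation (the lexicographically smallest)."""
    rotations = cyclic_rotations(monomers)
    return min(rotations) if rotations else ()


def same_ring(a: Sequence[str], b: Sequence[str]) -> bool:
    """Two sequences describe the same ring if their canonical rotations match."""
    return canonical_rotation(a) == canonical_rotation(b)


if __name__ == "__main__":
    # Hypothetical three-residue ring written with two different start positions.
    print(same_ring(["Leu", "Ala", "Pro"], ["Pro", "Leu", "Ala"]))
```

A rotation-invariant predictor should return the same permeability for every sequence that `same_ring` groups together. Note that this sketch treats the reversed reading direction as a different ring.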

This repository contains the code and analysis pipelines used in an ongoing research project at Molecular AI, AstraZeneca, Gothenburg, Sweden.


Repository structure

  • 'scripts/' – Command-line scripts for feature generation, training, evaluation, and saliency analysis
  • 'notebooks/' – Notebooks for data preprocessing and splitting, result visualization, and saliency analysis
  • 'data/' – Dataset location (not included in the repository)
  • 'results/' – Generated outputs (not included)
  • 'results/plots/' – Generated figures (not included)

Installation

Clone and install (pip)

git clone https://github.com/MolecularAI/MonoSeqCP
cd MonoSeqCP

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

Install PyTorch

PyTorch is installed separately to support different CPU/GPU configurations.

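CPU-compatible install (works everywhere)

The default PyPI wheels run on CPU on every platform; on Linux they also bundle CUDA support. For a strictly CPU-only build, append --index-url https://download.pytorch.org/whl/cpu to the command below.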

pip install torch torchvision torchaudio

GPU / HPC note

On many HPC systems, CUDA is provided via environment modules. Load the CUDA module recommended for your cluster, then install PyTorch. The module name below is an example; use the one available on your system.

module load CUDA/12.1.1
pip install torch torchvision torchaudio

Verify installation:

python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.version.cuda)"

Exact environment used

An exact snapshot of the author environment is provided in requirements-author-freeze.txt. This file is primarily for reference and may be system- or HPC-specific.
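
If you want to see how your environment differs from that snapshot, a small comparison script can help. The sketch below assumes the file uses pip-freeze style `name==version` lines (an assumption; other line formats are skipped). The helper names are hypothetical and not part of the repository.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(text):
    """Collect name -> version from 'name==version' lines; skip everything else."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line or line.startswith("-"):
            continue  # options, direct URLs, blank lines
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.split(";", 1)[0].strip()
    return pins


def compare_with_installed(pins):
    """Map each pinned package to (pinned_version, installed_version_or_None)."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (pinned, installed)
    return report


if __name__ == "__main__":
    freeze = Path("requirements-author-freeze.txt")
    if freeze.exists():
        report = compare_with_installed(parse_pins(freeze.read_text()))
        for name, (pinned, installed) in sorted(report.items()):
            if pinned != installed:
                print(f"{name}: snapshot {pinned}, installed {installed}")
```

Run it from the repository root with the virtual environment activated. Differences are expected on other systems and do not necessarily indicate a problem.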

Author torch/CUDA setup:

torch 2.4.0 (CUDA 12.1)

Installed using:

pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 \
  --index-url https://download.pytorch.org/whl/cu121

Data

Due to licensing restrictions, the datasets used in this project are not included in the repository.

The required datasets can be downloaded from the CycPeptMPDB database and from an external benchmark repository. Detailed download instructions, including which subsets to select, are provided in data/README.md.


Quickstart / How to reproduce

All scripts are executed from the repository root.

High-level workflow:

  1. Preprocess data and generate dataset splits using:

    • 'notebooks/dataset.ipynb'
    • 'notebooks/dataset_bench.ipynb'
  2. Generate input features using 'scripts/input_features2.py'

  3. Train models using 'scripts/model_training2.py'

  4. Evaluate trained models using 'scripts/eval1.py'

Optional:

  1. Generate plots using 'notebooks/plots.ipynb'
  2. Perform saliency analysis using 'scripts/saliency.py'
  3. Analyse saliency results using 'notebooks/saliency.ipynb'
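
Before starting the workflow, it can be useful to confirm that the notebooks and scripts for the required steps are present and that you are in the repository root. The following is a minimal sketch; the file list is copied from the steps above, while the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Files named in the required workflow steps of this README.
WORKFLOW_FILES = [
    "notebooks/dataset.ipynb",
    "notebooks/dataset_bench.ipynb",
    "scripts/input_features2.py",
    "scripts/model_training2.py",
    "scripts/eval1.py",
]


def missing_files(root=".", files=WORKFLOW_FILES):
    """Return the workflow files that do not exist under root."""
    root = Path(root)
    return [name for name in files if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing (are you in the repository root?):")
        for name in missing:
            print("  " + name)
    else:
        print("All workflow files found.")
```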

Exact command-line arguments for the scripts are described in the script docstrings, together with instructions on how to change specific variable values.
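
One way to read those docstrings without executing a script is to parse the file with Python's standard `ast` module. This is a sketch under the assumption that each script carries a module-level docstring; the helper names are hypothetical.

```python
import ast
from pathlib import Path


def docstring_from_source(source):
    """Return the module-level docstring of Python source, or None if absent."""
    return ast.get_docstring(ast.parse(source))


def script_docstring(path):
    """Read a script from disk and return its module-level docstring."""
    return docstring_from_source(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Script names as listed in the workflow of this README.
    for name in ("input_features2.py", "model_training2.py", "eval1.py", "saliency.py"):
        path = Path("scripts") / name
        if path.is_file():
            print(f"=== {path} ===")
            print(script_docstring(path) or "(no module docstring found)")
```

Because the file is only parsed, not imported, this works even before PyTorch and the other dependencies are installed.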


Citation

If you use this code, please cite the repository using the information provided in 'CITATION.cff'.

If you use data from the CycPeptMPDB database, please also cite:

Li J, Yanagisawa K, Sugita M, Fujie T, Ohue M, Akiyama Y.
CycPeptMPDB: A Comprehensive Database of Membrane Permeability of Cyclic Peptides.
Journal of Chemical Information and Modeling, 63(7):2240–2250, 2023.
https://doi.org/10.1021/acs.jcim.2c01573

If you use the benchmark splits from the external repository, please also cite:

Liu W, Li J, Verma CS, Lee HK. Code for systematic benchmarking of 13 AI methods for cyclic peptide permeability. GitHub, https://github.com/Gobliu/BenchmarkCycPeptMP
