MonoSeqCP

Project description

MonoSeqCP is a multimodal, monomer-level, transformer-based framework for predicting membrane permeability of cyclic peptides. The model integrates multiple monomer-level representations, including physicochemical descriptors, fingerprint-based features, and connectivity information, and explicitly accounts for cyclic rotational invariance.
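To make "cyclic rotational invariance" concrete: a cyclic peptide has no fixed start position, so any rotation of its monomer sequence describes the same ring. The sketch below only illustrates that property by mapping every rotation to one canonical representative. It is not the mechanism MonoSeqCP uses (this README does not describe it), and the function names and monomer labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def cyclic_rotations(monomers: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every rotation of a cyclic monomer sequence."""
    seq = tuple(monomers)
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_rotation(monomers: Sequence[str]) -> Tuple[str, ...]:
    """Pick one representative: the lexicographically smallest rotation."""
    rotations = cyclic_rotations(monomers)
    return min(rotations) if rotations else ()


# Hypothetical monomer labels, used only for illustration.
peptide = ["Ala", "meLeu", "Pro", "Phe"]
rotated = ["Pro", "Phe", "Ala", "meLeu"]

# Both orderings describe the same ring, so their canonical forms are compared.
print(canonical_rotation(peptide) == canonical_rotation(rotated))
```

A model that is invariant to rotation should give the same prediction for `peptide` and `rotated`. Canonicalization is one simple way to get that behaviour; rotation-based augmentation or invariant positional encodings are alternatives.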

This repository contains the code and analysis pipelines used in an ongoing research project at Molecular AI, AstraZeneca, Gothenburg, Sweden.

Repository structure

'scripts/' – Command-line scripts for feature generation, training, evaluation, and saliency analysis
'notebooks/' – Notebooks for data preprocessing and splitting, result visualization, and saliency analysis
'data/' – Dataset location (not included in the repository)
'results/' – Generated outputs (not included)
'results/plots/' – Generated figures (not included)
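Because some of these directories are not shipped with the repository, it can help to check the layout before running anything. The following is a minimal sketch, assuming the directory names listed above. The helper names are hypothetical, and the scripts may already create their own output directories.

```python
from pathlib import Path
from typing import List

# Directory names come from the list above.
EXPECTED_DIRS = ["scripts", "notebooks", "data", "results", "results/plots"]


def missing_dirs(root: Path) -> List[str]:
    """Return the expected directories that do not exist under root."""
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


def create_output_dirs(root: Path) -> None:
    """Create results/ and results/plots/ if absent (no effect if present)."""
    (root / "results" / "plots").mkdir(parents=True, exist_ok=True)


# Report what is missing relative to the current working directory.
print(missing_dirs(Path(".")))
```

Run it from the repository root. Call `create_output_dirs(Path("."))` only if you want the output folders created up front.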

Installation

Clone and install (pip)

git clone https://github.com/MolecularAI/MonoSeqCP
cd MonoSeqCP

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

Install PyTorch

PyTorch is installed separately to support different CPU/GPU configurations.

CPU-only (works everywhere)

pip install torch torchvision torchaudio

GPU / HPC note

On many HPC systems, CUDA is provided via environment modules. Load the CUDA module recommended for your cluster (the module name below is an example), then install PyTorch.

module load CUDA/12.1.1
pip install torch torchvision torchaudio

Verify installation:

python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.version.cuda)"
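The same check can be done from Python, for example at the top of a notebook, together with a device choice. This is a sketch rather than code from the repository. `describe_torch` is a hypothetical helper, and it uses only `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, which also appear in the one-liner above.

```python
import importlib.util


def describe_torch() -> dict:
    """Summarise the local PyTorch install without failing if it is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    cuda_ok = torch.cuda.is_available()
    return {
        "installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": cuda_ok,
        "device": "cuda" if cuda_ok else "cpu",
    }


print(describe_torch())
```

If `cuda_build` is set but `cuda_available` is false, the wheel was built with CUDA support but no usable GPU or driver was found at runtime.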

Exact environment used

An exact snapshot of the author's environment is provided in requirements-author-freeze.txt. This file is primarily for reference and may be system- or HPC-specific.

Author torch/CUDA setup:

torch 2.4.0 (CUDA 12.1)

Installed using:

pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 \
  --index-url https://download.pytorch.org/whl/cu121

Data

Due to licensing restrictions, the datasets used in this project are not included in the repository.

The required datasets can be downloaded from the CycPeptMPDB database and from an external benchmark repository. Detailed download instructions, including which subsets to select, are provided in data/README.md.

Quickstart / How to reproduce

All scripts are executed from the repository root.

High-level workflow:

1. Preprocess data and generate dataset splits using:
   - 'notebooks/dataset.ipynb'
   - 'notebooks/dataset_bench.ipynb'
2. Generate input features using 'scripts/input_features2.py'
3. Train models using 'scripts/model_training2.py'
4. Evaluate trained models using 'scripts/eval1.py'

Optional:

- Generate plots using 'notebooks/plots.ipynb'
- Perform saliency analysis using 'scripts/saliency.py'
- Analyse saliency results using 'notebooks/saliency.ipynb'

The exact command-line arguments for each script are described in its docstring, together with instructions on how to change specific variable values.
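To read those docstrings without opening each file or running any code, something like the following can be used from the repository root. It is a sketch that assumes the documentation lives in each script's module-level docstring. `module_docstring` is a hypothetical helper, and the script paths are the ones named in the workflow.

```python
import ast
from pathlib import Path
from typing import Optional

# Script paths as named in the workflow, relative to the repository root.
SCRIPTS = [
    "scripts/input_features2.py",
    "scripts/model_training2.py",
    "scripts/eval1.py",
    "scripts/saliency.py",
]


def module_docstring(path: Path) -> Optional[str]:
    """Return the module-level docstring of a Python file without executing it."""
    if not path.is_file():
        return None
    tree = ast.parse(path.read_text(encoding="utf-8"))
    return ast.get_docstring(tree)


for name in SCRIPTS:
    doc = module_docstring(Path(name))
    print(f"=== {name} ===")
    print(doc if doc else "(file not found or no module docstring)")
```

Parsing with `ast` avoids importing the scripts, so this works even before PyTorch or the datasets are in place.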

Citation

If you use this code, please cite the repository using the information provided in 'CITATION.cff'.

If you use data from the CycPeptMPDB database, please also cite:

Li J, Yanagisawa K, Sugita M, Fujie T, Ohue M, Akiyama Y.
CycPeptMPDB: A Comprehensive Database of Membrane Permeability of Cyclic Peptides.
Journal of Chemical Information and Modeling, 63(7):2240–2250, 2023.
https://doi.org/10.1021/acs.jcim.2c01573

If you use the benchmark splits from the external repository, please also cite:

Liu W, Li J, Verma CS, Lee HK. Code for systematic benchmarking of 13 AI methods for cyclic peptide permeability. GitHub, https://github.com/Gobliu/BenchmarkCycPeptMP
