GlycanAnalysisPipeline

Glycan conformation analysis pipeline for the GlycoShape Database.

For every glycan simulated, a directory is created and named after the glycan in GLYCAM condensed format. This directory contains a multiframe PDB of the concatenated MD simulation replicas and a single-frame MOL2 file. The GAP pipeline is then run on these directories to create two subdirectories, "output" and "clusters", which contain the PCA and GMM outputs and the representative cluster structures, respectively.
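Before running the pipeline, it can help to confirm that each glycan directory holds both input files. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes the inputs can be recognised by their `.pdb` and `.mol2` extensions, since the exact file names are not stated here.

```python
from pathlib import Path


def check_glycan_dir(glycan_dir):
    """Return a list of problems found in one glycan directory (empty if none)."""
    glycan_dir = Path(glycan_dir)
    problems = []
    # Assumption: inputs are identified by extension only; the exact file
    # names expected by the pipeline are not documented in this README.
    if not any(glycan_dir.glob("*.pdb")):
        problems.append("no multiframe PDB file found")
    if not any(glycan_dir.glob("*.mol2")):
        problems.append("no MOL2 file found")
    return problems


def check_data_dir(data_dir):
    """Map each glycan subdirectory name to its list of problems."""
    return {
        sub.name: check_glycan_dir(sub)
        for sub in sorted(Path(data_dir).iterdir())
        if sub.is_dir()
    }
```

Calling `check_data_dir` on the folder that holds the glycan directories returns a dictionary keyed by glycan name; an empty list means both inputs were found.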

Installation

conda create -n GAP python=3.10
conda activate GAP
pip install -r requirements.txt

Modify config.py to set the data_dir variable to the folder that holds the simulation data. Each glycan needs its own subfolder, named with the GLYCAM name of the glycan, containing the multiframe PDB and the MOL2 file for that molecule.
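As an illustration only, the relevant line in config.py might look like the sketch below. The path is a placeholder, the layout comments use illustrative file names, and any other settings in the real config.py are not shown.

```python
# config.py (sketch). Only data_dir is named in this README; any other
# settings in the real config.py are not shown here.

# Placeholder path, an assumption for illustration: point this at the folder
# that holds one subfolder per glycan.
data_dir = "/path/to/simulations"

# Expected layout under data_dir (file names are illustrative):
#   /path/to/simulations/
#       DManpa1-2DManpa1-OH/      <- GLYCAM name of the glycan
#           <multiframe>.pdb
#           <single frame>.mol2
```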

Running

python main.py && python recluster.py && python plot_dist.py && python save_frames.py

This produces "clusters" and "output" folders in each glycan directory, containing the files required by the database and Re-Glyco.
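The `&&` chain stops at the first script that fails. Where a shell is not convenient, the same behaviour can be sketched in Python as below. This is an assumption-laden example rather than part of the repository: `run_pipeline` and its `runner` parameter are hypothetical names, and it assumes the scripts are run from the repository root.

```python
import subprocess
import sys

# Script order taken from the command above.
STEPS = ["main.py", "recluster.py", "plot_dist.py", "save_frames.py"]


def run_pipeline(steps=STEPS, python=sys.executable, runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    Returns the list of steps that completed successfully.
    """
    completed = []
    for script in steps:
        # runner defaults to subprocess.run; it is a parameter only so the
        # control flow can be exercised without launching real processes.
        result = runner([python, script])
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}",
                  file=sys.stderr)
            break
        completed.append(script)
    return completed
```

Calling `run_pipeline()` returns the scripts that finished; if the result is shorter than `STEPS`, the first missing entry is the step that failed.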

Note

The DB script then takes the structural information from these directories, coupled with APIs and other packages, to create the information necessary for the GlycoShape Database (GDB). For this, the DB directory contains subdirectories titled with the name of each glycan in IUPAC condensed format. Within these subdirectories are JSON files with the relevant nomenclature, chemical, and biological data of the glycan, and an SVG file of the glycan 2D structure in SNFG format. Each subdirectory also contains further subdirectories holding the representative cluster structures in different naming formats, specifically CHARMM, GLYCAM, and PDB.
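To inspect one database entry, such as a directory named Man(a1-2)Man, a short script can list what it contains. The sketch below is hypothetical and not part of the repository: it assumes only that the metadata files end in `.json` and the 2D structure in `.svg`, and makes no assumption about their names or keys.

```python
import json
from pathlib import Path


def summarise_entry(entry_dir):
    """List the JSON files, SVG files, and subdirectories of one glycan entry."""
    entry_dir = Path(entry_dir)
    return {
        "name": entry_dir.name,
        "json_files": sorted(p.name for p in entry_dir.glob("*.json")),
        "svg_files": sorted(p.name for p in entry_dir.glob("*.svg")),
        "subdirs": sorted(p.name for p in entry_dir.iterdir() if p.is_dir()),
    }


def load_json_files(entry_dir):
    """Parse every JSON file in the entry, keyed by file name."""
    return {
        p.name: json.loads(p.read_text())
        for p in sorted(Path(entry_dir).glob("*.json"))
    }
```

The `subdirs` list is where the per-format cluster structures would be expected to appear.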

The final output database follows the layout of dummy_database/. Re-Glyco uses this directory format to build glycoproteins. The code for Re-Glyco is here

Citation

All of the data provided is freely available for academic use under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0) licence terms. Please contact us at elisa.fadda@soton.ac.uk for a commercial licence. If you use this resource, please cite the following paper:

Callum M Ives and Ojas Singh et al. Restoring Protein Glycosylation with GlycoShape. Nat Methods (2024).
