Richard Shipman edited this page Mar 24, 2025 · 3 revisions

Welcome to the Glycopeptide Sequence Finder Wiki

Welcome to the Glycopeptide Sequence Finder Wiki! This wiki is a central resource for exploring predicted glycoproteomes across a range of species. It aims to provide clear, detailed, and accessible documentation for researchers working with glycoproteomics data in less-studied organisms.

About This Wiki

This wiki serves as a comprehensive guide to:

  • Species Biology: Learn about the biological context and unique characteristics of each species.
  • Glycobiology Insights: Discover detailed information on glycosylation patterns and glycoprotein profiles.
  • Practical Use Cases: Explore how glycopeptide data can be applied in areas such as biomarker discovery, drug target identification, and comparative studies.
  • Sample Types & Preparation: Find protocols and best practices for preparing different sample types, including blood, tissue, and cultured cells.
  • Datasets: Access and review the datasets that underpin our predictions.
  • References: Consult key literature, tools, and resources that support the data and methodologies presented.

Navigation & Subpages

  • Species Pages: Each species has its own dedicated page with detailed sections on biology, glycobiology, sample preparation, and more.
  • Extraction Guides:
  • Linkage Types:

How to Navigate

  1. Browse the Wiki: Use the sidebar to navigate between species pages and other sections.
  2. Review Protocols: Check out the extraction guides and datasets to understand the experimental setups and data structures.
  3. Deep Dive into Glycobiology: Explore our detailed guide on glycoprotein linkage types to enhance your analysis and understanding.
  4. Contribute: We welcome contributions and feedback! Please refer to the contribution guidelines in our GitHub repository if you have suggestions or improvements.

Repository Reference

For more technical details and the latest updates on the Glycopeptide Sequence Finder, please visit our GitHub Repository.

We hope this resource supports your research and deepens your understanding of glycopeptide sequences. Thank you for exploring it!

— Richard Shipman

Test Proteomes

Test proteome FASTA files from UniProt are available in the test_proteomes folder. Below is a list of species gathered. Only Swiss-Prot reviewed proteins were downloaded, and not every sequence available for a species is included.

I used these test proteomes to generate a zoo of glycopeptides under constrained conditions so that the output fits in a GitHub repo. To build the full zoo, remove the constraints in the batch processing script.
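For readers who want to fetch a comparable reviewed-only proteome themselves, the following is a minimal sketch, not the script used to build the test_proteomes folder. It assumes the public UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/stream` with the `taxonomy_id` and `reviewed` query fields. The helper names `reviewed_fasta_url` and `count_fasta_records` are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode

# Assumed UniProtKB streaming endpoint (not taken from this wiki).
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def reviewed_fasta_url(taxon_id: int) -> str:
    """Build a query URL for reviewed (Swiss-Prot) entries of one taxon, as FASTA."""
    query = f"(taxonomy_id:{taxon_id}) AND (reviewed:true)"
    return f"{UNIPROT_STREAM}?{urlencode({'query': query, 'format': 'fasta'})}"


def count_fasta_records(fasta_text: str) -> int:
    """Count sequences in FASTA text; every record starts with a '>' header line."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


if __name__ == "__main__":
    # 30538 is the taxon ID listed for alpaca in the species table.
    print(reviewed_fasta_url(30538))
```

Opening the printed URL in a browser, or passing it to a download tool, should return FASTA text. Running `count_fasta_records` on that text is a quick way to compare a fresh download against a file in the test_proteomes folder.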

Species template: template_species.md

Common Name Scientific Name Taxon ID
Alpaca Vicugna pacos 30538
Amoeba Naegleria gruberi 5762
Anemone Nematostella vectensis 45351
Ant Camponotus floridanus 104421
Apple Malus domestica 3750
Arabidopsis Arabidopsis thaliana 3702
Aspergillus fumigatus Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293) 330879
Aspergillus nidulans Emericella nidulans (strain FGSC A4 / ATCC 38163 / CBS 112.46 / NRRL 194 / M139) 227321
Avocado Persea americana 3435
Banana Musa acuminata 4641
Barley Hordeum vulgare 4513
Bat Myotis lucifugus 59463
Black Cherry Prunus serotina 23207
Black Truffle Tuber melanosporum (strain Mel28) 656061
Blood Fluke Schistosoma mansoni 6183
Brine Shrimp Artemia franciscana 6661
Brown Alga Ectocarpus siliculosus 2880
Bushbaby Otolemur garnettii 30611
Camel Camelus bactrianus 9837
Candida albicans (Yeast, human pathogen) Candida albicans (strain SC5314 / ATCC MYA-2876) 237561
Cat Felis catus 9685
C. elegans Caenorhabditis elegans 6239
Chameleon Anolis carolinensis 28377
Charcoal Rot Macrophomina phaseolina (strain MS6) 1126212
Chicken Gallus gallus 9031
Chimpanzee Pan troglodytes 9598
Chinchilla Chinchilla lanigera 34839
C. jejuni Campylobacter jejuni 197
Coffee Coffea arabica 13443
Cow Bos taurus 9913
Crocodile Crocodylus porosus 8502
Cryptococcus Cryptococcus neoformans var. neoformans serotype D (strain JEC21 / ATCC MYA-565) 214684
Cytomegalovirus Human cytomegalovirus (strain Merlin) 295027
Corn Smut Mycosarcoma maydis 5270
Date Palm Phoenix dactylifera 42345
Debaryomyces hansenii (yeast) Debaryomyces hansenii (strain ATCC 36239 / CBS 767 / BCRC 21394 / JCM 1990 / NBRC 0083 / IGC 2968) 284592
Deer Tick Ixodes scapularis 6945
Diatom Thalassiosira pseudonana 35128
Dictyostelium Dictyostelium discoideum 44689
Dog Canis lupus familiaris 9615
Donkey Equus asinus 9793
Duck Cairina moschata 8855
Dugbe Virus Dugbe virus (isolate ArD44313) 766194
Ebola Zaire ebolavirus (strain Mayinga-76) 128952
Elephant Loxodonta africana (African Elephant) 9785
Fall Armyworm Spodoptera frugiperda (Fall Armyworm) 7108
Ferret Mustela putorius furo 9669
Fission Yeast Schizosaccharomyces japonicus (strain yFS275 / FY16936) 402676
Frog Xenopus laevis 8355
Fruit Fly Drosophila melanogaster 7227
Goat Capra hircus 9925
Gorilla Gorilla gorilla gorilla 9595
Grape Vitis vinifera 29760
Green Alga Chlamydomonas reinhardtii 3055
Guinea Pig Cavia porcellus 10141
Hamster Mesocricetus auratus 10036
Hemp Cannabis sativa 3483
HHV-1 Human herpesvirus 1 (strain 17) 10299
HIV-1 Human immunodeficiency virus type 1 group N (isolate YBF30) 388818
HIV-2 Human immunodeficiency virus type 2 subtype A (isolate BEN) 11714
Honeybee Apis mellifera 7460
Horse Equus caballus 9796
HRSV S-2 Human respiratory syncytial virus A (strain S-2) 410078
Human Homo sapiens 9606
Influenza B Influenza B virus (strain B/Lee/1940) 518987
Influenza C Influenza C virus (strain C/Ann Arbor/1/1950) 11553
JEV Japanese encephalitis virus (strain M28) 2555554
Kidney Bean Phaseolus vulgaris 3885
Kluyveromyces lactis (lactate processing yeast) Kluyveromyces lactis (strain ATCC 8585 / CBS 2359 / DSM 70799 / NBRC 1267 / NRRL Y-1140 / WM37) 284590
LASV Lassa virus (strain Mouse/Sierra Leone/Josiah/1976) 11622
LCMV Lymphocytic choriomeningitis virus (strain Armstrong) 11624
Lemur Microcebus murinus 30608
Macaque (Rhesus monkey) Macaca mulatta 9544
Maize Zea mays 4577
Measles virus Measles virus (strain Ichinose-B95a) 645098
Monkey (cynomolgus, crab-eating) Macaca fascicularis 9541
Mosquito (African malaria) Anopheles gambiae 7165
Mouse Mus musculus 10090
Naked Mole Rat Heterocephalus glaber 10181
Nematode (roundworm) Caenorhabditis briggsae 6238
Norovirus Norovirus (strain Human/NoV/United States/Norwalk/1968/GI) 524364
Octopus Octopus vulgaris 6645
Olive Olea europaea 4146
Opossum Monodelphis domestica 13616
Orange Citrus sinensis 2711
Orangutan Pongo abelii 9601
Oyster Magallana gigas 29159
Paramecium Paramecium tetraurelia 5888
Peach Prunus persica 3760
Penicillium Penicillium rubens (strain ATCC 28089 / DSM 1075 / NRRL 1951 / Wisconsin 54-1255) 500485
Pig (Domestic) Sus scrofa domesticus 9823
Platypus Ornithorhynchus anatinus 9258
Poplar Leaf Rust Fungus Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 747676
Potato Solanum tuberosum 4113
Psilocybe mushroom Psilocybe cubensis 181762
Pufferfish Takifugu rubripes 31033
Rabbit Oryctolagus cuniculus 9986
Rat Rattus norvegicus 10116
Red Alga Cyanidioschyzon merolae (strain NIES-3377 / 10D) 280699
Rice Oryza sativa subsp. japonica 39947
Rice Blast Fungus Pyricularia oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958) 242507
Rice Fish (Japanese) Oryzias latipes 8090
RVA Rotavirus A (isolate RVA/Monkey/South Africa/SA11-H96/1958/G3P5B[2]) 450149
RVB Rotavirus B (isolate RVB/Human/China/ADRV/1982) 10942
RVC Rotavirus C (isolate RVC/Human/United Kingdom/Bristol/1989) 31567
SARS-CoV SARS-CoV (Severe Acute Respiratory Syndrome Coronavirus) 694009
SFTSV SFTS phlebovirus (isolate SFTSV/Human/China/HB29/2010) 992212
Shark Callorhinchus milii 7868
Sheep Ovis aries 9940
Silk Moth Bombyx mori 7091
Silveira (Coccidioides Silveira strain) Coccidioides posadasii (strain RMSCC 757 / Silveira) 443226
Snake (Eastern Brown) Pseudonaja textilis 8673
Softshell Turtle Pelodiscus sinensis 13735
Spike Moss (lycophyte) Selaginella moellendorffii 88036
Sponge Amphimedon queenslandica 400682
Sorghum Sorghum bicolor 4558
Squirrel Ictidomys tridecemlineatus 43179
Starfish Patiria pectinifera 7594
Strawberry Fragaria ananassa 3747
Sugarcane Saccharum officinarum 4547
Sunflower Helianthus annuus 4232
Sycamore Platanus occidentalis 4403
Tea plant Camellia sinensis 4442
Tobacco Nicotiana tabacum 4097
Tilapia Oreochromis niloticus 8128
Tomato Solanum lycopersicum 4081
Trout (Brown) Salmo trutta 8032
Turkey Meleagris gallopavo 9103
Urchin Strongylocentrotus purpuratus 7668
VZV Varicella-zoster virus (strain Dumas) 10338
Wasp (parasitoid) Nasonia vitripennis 7425
Watermelon Citrullus lanatus 3654
Wheat Triticum aestivum 4565
Whisk fern Psilotum nudum 3240
Wild Rice (North America) Oryza nivara 4536
WNV West Nile virus 11082
XMAn v2 Missense Homo sapiens - Unknown Mutation Analysis (Human missense peptide library) Download at: https://github.com/lazarlab/XMAn-v2 9606
XMAn v2 Nonsense Homo sapiens - Unknown Mutation Analysis (Human nonsense peptide library) Download at: https://github.com/lazarlab/XMAn-v2 9606
Yak Bos mutus grunniens 30521
Yeast (Budding, Baker's) Saccharomyces cerevisiae (strain ATCC 204508 / S288c) 559292
Yeast (Fission) Schizosaccharomyces pombe (strain 972 / ATCC 24843) 284812
Zebra Finch Taeniopygia guttata 59729
Zebrafish Danio rerio 7955
Zebu Bos indicus 9915
Zika Zika virus 64320
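
The species list above can be sanity-checked programmatically. The sketch below is illustrative only and is not part of the repository. It assumes each row ends with its taxon ID as the last whitespace-separated token, and it groups rows by that ID so that entries sharing a taxon ID are easy to spot and review. The function names `parse_row` and `group_by_taxon` are hypothetical.

```python
from collections import defaultdict


def parse_row(row: str) -> tuple[str, int]:
    """Split a row into (names, taxon_id); the ID is assumed to be the last token."""
    names, _, taxon = row.rstrip().rpartition(" ")
    return names, int(taxon)


def group_by_taxon(rows: list[str]) -> dict[int, list[str]]:
    """Map each taxon ID to the name strings of every row that carries it."""
    groups: dict[int, list[str]] = defaultdict(list)
    for row in rows:
        names, taxon_id = parse_row(row)
        groups[taxon_id].append(names)
    return dict(groups)


if __name__ == "__main__":
    # Three rows copied from the table above.
    rows = [
        "Alpaca Vicugna pacos 30538",
        "Human Homo sapiens 9606",
        "Zebrafish Danio rerio 7955",
    ]
    for taxon_id, names in group_by_taxon(rows).items():
        print(taxon_id, names)
```

Any taxon ID that maps to more than one name string is worth a manual look. Some sharing is intentional, such as peptide libraries derived from a species already in the list, while other cases may be copy-paste slips.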