Curate an initial peptide database #1

Open
jarvist opened this issue Sep 16, 2024 · 5 comments

jarvist (Contributor) commented Sep 16, 2024

No description provided.

jarvist closed this as completed Sep 16, 2024
jarvist self-assigned this Sep 16, 2024

jarvist (Contributor, Author) commented Sep 16, 2024

https://github.com/Frost-group/Nornour/blob/067016bd4aef46746e1a12cb2dad012116e996e2/0003-DRAMP-database/download.sh

# The DRAMP database offers perfectly formatted downloads of the data
# http://dramp.cpu-bioinfor.org/downloads/
#
# *Citation*:
# Shi G, Kang X, Dong F, Liu Y, Zhu N, Hu Y, Xu H, Lao X, Zheng H. DRAMP 3.0:
# an enhanced comprehensive data repository of antimicrobial peptides. Nucleic
# Acids Res. 2022 Jan 7;50(D1):D488-D496. PMID: 34390348

# (づ ᴗ _ ᴗ) づ ♡ - I love a good simple URL download
# wget "http://dramp.cpu-bioinfor.org/downloads/download.php?filename=download_data/DRAMP3.0_new/Antibacterial_amps.txt" -O Antibacterial_amps.txt

jarvist (Contributor, Author) commented Sep 16, 2024

NB: the data are still quite unclean! The characters '24..spacerO()hxwlUgimvfJyqtZ' are all turning up in the sequences.
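A minimal sketch of one way to handle this, assuming the sequences have already been extracted one per line: keep only rows made entirely of the 20 standard one-letter amino-acid codes, and drop the rest rather than stripping the odd characters out (stripping would silently splice fragments together). The function name `canonical_only` is hypothetical, and treating lowercase letters as invalid is an assumption; check what each source uses lowercase for before deciding.

```shell
# Hypothetical helper: read one peptide sequence per line on stdin and print
# only those made entirely of the 20 standard upper-case amino-acid codes.
# Spaces, tabs and Windows carriage returns are removed first; any other
# character (digits, '.', '(', lowercase letters, and also the rarer 'U' and
# 'O' codes, which this sketch treats as out of scope) drops the whole line.
canonical_only() {
  tr -d ' \t\r' | grep -E '^[ACDEFGHIKLMNPQRSTVWY]+$'
}
```

For example, `canonical_only < sequences.txt > clean.txt` (file names hypothetical), then comparing `wc -l` on the two files shows how much was discarded.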

jarvist (Contributor, Author) commented Sep 16, 2024

OK, this is now ready to use with an LSTM etc., such as this super slick JavaScript interface: https://cs.stanford.edu/people/karpathy/recurrentjs/
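Before training a character-level model, it is worth checking which characters actually remain, since every distinct character becomes a token in the model's vocabulary. A minimal sketch, assuming a plain-text file with one sequence per line (the function name `char_counts` is hypothetical):

```shell
# Hypothetical helper: count how often each character occurs in text with one
# sequence per line, printing "<char><TAB><count>" sorted by character.
# Newlines are not counted; carriage returns are removed first.
char_counts() {
  tr -d '\r' |
    awk '{
      for (i = 1; i <= length($0); i++) counts[substr($0, i, 1)]++
    }
    END {
      for (c in counts) print c "\t" counts[c]
    }' |
    LC_ALL=C sort
}
```

Piping the result into `wc -l` gives the vocabulary size, which should be at most 20 if only the standard amino-acid codes remain.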

jarvist reopened this Sep 16, 2024

jarvist (Contributor, Author) commented Sep 17, 2024

And the RW Lexicon dataset from this paper has now been added:

# Clark, S., Jowitt, T.A., Harris, L.K., Knight, C.G., Dobson, C.B., 2021. The
# lexicon of antimicrobial peptides: a complete set of arginine and tryptophan
# sequences. Commun Biol 4, 1–14. https://doi.org/10.1038/s42003-021-02137-7
# Computer readable data on Figshare:
# Clark, Sam (2021). The Lexicon of Antimicrobial Peptides: a Complete Set of
# Arginine and Tryptophan Sequences. figshare. Collection.
# https://doi.org/10.6084/m9.figshare.c.5104931.v1

KamDB closed this as completed Sep 18, 2024
KamDB reopened this Sep 18, 2024
jarvist added the data label Sep 18, 2024
KamDB closed this as completed Sep 23, 2024
KamDB reopened this Sep 23, 2024

KamDB (Contributor) commented Sep 23, 2024

DRAMP ^^^
Seems like a great start: 30260 entries.
http://dramp.cpu-bioinfor.org/downloads/
Sequence, activity and haemolytic activity are all in one file, for both antibacterial and anticancer peptides.
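Since everything sits in one file, a single column can be pulled out by its header name rather than by a hard-coded position. A minimal sketch, assuming the download is tab-separated with one header row and a column literally named `Sequence`; both are assumptions to check against the first line of the real file, and `column_by_name` is a hypothetical name.

```shell
# Hypothetical helper: print the column whose header matches $1 from the
# tab-separated file $2, skipping the header row itself.
# Trailing carriage returns (Windows line endings) are stripped from each line.
# Exits with status 1, and a message on stderr, if no header matches.
column_by_name() {
  awk -F '\t' -v name="$1" '
    { sub(/\r$/, "") }
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) { print "no column named " name | "cat 1>&2"; exit 1 }
      next
    }
    { print $col }
  ' "$2"
}
```

For example, `column_by_name Sequence Antibacterial_amps.txt | grep -c .` would count the non-empty sequences, which can then be compared with the entry count quoted on the download page.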
RW lexicon ^^^
Simple, but perhaps the most straightforward to handle and model initially.
https://pubmed.ncbi.nlm.nih.gov/34021253/
256 peptides, made of just arginine and tryptophan.
Sequence, activity and haemolytic activity all in one file.
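As a quick sanity check on the size: 256 is exactly 2^8, the number of distinct length-8 sequences over the two letters R and W. Whether the lexicon is precisely that set is an assumption here, to be checked against the paper. The sketch below (hypothetical function `rw_sequences`) simply enumerates every R/W sequence of a given length so it can be compared with the downloaded file.

```shell
# Hypothetical helper: print every sequence of length $1 (default 8) built only
# from arginine (R) and tryptophan (W), one per line, in alphabetical order.
# Sequence i is read off the binary digits of i, most significant first:
# 0 -> R, 1 -> W.
rw_sequences() {
  awk -v n="${1:-8}" 'BEGIN {
    total = 2 ^ n
    for (i = 0; i < total; i++) {
      s = ""
      x = i
      for (k = 0; k < n; k++) {
        s = (x % 2 ? "W" : "R") s
        x = int(x / 2)
      }
      print s
    }
  }'
}
```

`rw_sequences 8 | wc -l` counts 256 lines; sorting both this output and the dataset's sequence column and running `comm -3` on them would list any sequences that appear in only one of the two.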

APD3
https://aps.unmc.edu/home
https://academic.oup.com/nar/article/44/D1/D1087/2503090
The Antimicrobial Peptide Database (as of today, 4028 antibacterial peptides and 304 with anticancer activity, mostly natural).
The downloads section has a file of one-letter amino-acid sequences, but with no associated IC values; it seems you have to click through the database search manually to find potency (IC values are given for a range of different cell lines and differ from peptide to peptide).

dbAMP
35600 entries
https://awi.cuhk.edu.cn/~dbAMP/download2024.php
Has sequence data in one file. I can't find associated IC values in the files, but they can be searched for in the database.
https://awi.cuhk.edu.cn/~dbAMP/analyze.php
Has a list of various machine-learning algorithms for different aspects of antimicrobial peptide discovery.
HemoFinder (https://awi.cuhk.edu.cn/~dbAMP/HemoFinder.php) seems particularly useful: it can predict the haemolytic activity and half-life of peptides.

DBAASP
DBAASP lets users search for peptide activities against particular target species and get the results as a ranked list of activity values.
Gram-positive: 17049 entries; Gram-negative: 17810 entries; cancer: 3778 entries.
It also has a property and activity calculator for peptides: https://dbaasp.org/tools?page=property-calculation
Interestingly, it also has a calculator that predicts synergy between antibacterial peptides and conventional antibiotics: https://dbaasp.org/tools?page=synergy-prediction

InverPep
https://ciencias.medellin.unal.edu.co/gruposdeinvestigacion/prospeccionydisenobiomoleculas/InverPep/public/home_en
Specialised database of AMPs from invertebrates (774 entries). Not super useful, as it doesn't have any straightforward lists, but it can still be used for cross-referencing if needed.

CAMPR3
http://www.camp3.bicnirrh.res.in/index.php
Again, it doesn't seem to have a straightforward list, but it has a fairly large collection of AMPs (8164 AMP sequences) and, notably, a patent database (2083 entries).
It also has quite a few machine-learning tools for AMP prediction.

BaAMPs
http://baamps.it/
Interesting to consider > In the majority of chronic infections, microorganisms are rarely found as planktonic form. Rather, they gather in biofilm communities. A biofilm is constituted of single or multiple organism species, such as fungi, bacteria, and viruses, typically attached to biotic (e.g. tissues) or abiotic sites and encased in a self-secreted extracellular matrix. The treatment for biofilm infections is particularly challenging because bacteria in these conditions become refractory to antibiotic drugs.
237 peptides, but you have to search through the database; there seems to be no text file.

CancerPPD
http://crdd.osdd.net/raghava/cancerppd/index.php
3491 peptide entries
Cancer-specific. Has text files of one-letter amino-acid sequences, using both natural and unnatural amino acids, but with no associated IC values; these can be looked up in the database together with the corresponding cell lines.

Cybase
https://www.cybase.org.au/?page=assays
Specific focus on cyclic peptides.
Small range of peptides, but has sequence data as well as assay data covering antibacterial, anticancer and haemolytic activity.

Nice review of most of these antimicrobial peptide databases: https://academic.oup.com/database/article/doi/10.1093/database/baac011/6550847

Pore-forming peptides (antibacterial and anticancer)
https://doi.org/10.1021/acs.jmedchem.4c00912
52 sequences were run through MD to obtain interaction energies, and some were tested against various bacteria.


3 participants