OSDB is a pipeline for local database configuration and annotation.
Note 1: Please check links in dirDB_linksbefore run this pipeline. Make sure all links are valid.Note 2: It is common to meet403 forbiddenissues when you are running kegg-related scripts. Just terminate the current running scripts and re-run it again, and again, and again ... The scripts will not delete downloaded data. And finally you will successfully download them all :)last update: 2024/07/23
conda create -n OSDB -y
conda activate OSDB
#mamba is recommend
mamba install wget aria2 diamond=2.0.15 prodigal -y
#or
conda install wget aria2 diamond=2.0.15 prodigal -ygit clone https://github.com/jchenek/OSDB.gitStep 1: download general database
Will use aria2 to download all links (~24 h)
conda activate OSDB
sh /PATH/TO/one-step-DB-download-p1-general.sh /PATH/TO/YOUR/OSDB_dirStep 2: download nr database
Will use aria2 to download nr link (~24 h)
conda activate OSDB
sh /PATH/TO/one-step-DB-download-p2-nr.sh /PATH/TO/YOUR/OSDB_dirStep 3: download kegg database
Will use wget to download kegg items one-by-one (~4 weeks, if lucky)
To save time, only sequences with ko id will be downloaded.
conda activate OSDB
#data from this part is for enrichment analysis
sh /PATH/TO/DB_kegg/kegg_db_wget_p1_data_for_enricher.sh /PATH/TO/YOUR/OSDB_dir
#ko seqs will be download in this part
sh /PATH/TO/DB_kegg/kegg_db_wget_p2_for_ko_genes.sh
cd kegg_db_2
perl /PATH/TO/DB_kegg/kegg_db_wget_p2_1_for_ko_genes_download_seq.pl ko_genes.txt
#(opt) all seqs will be download in this part
sh /PATH/TO/DB_kegg/kegg_db_wget_p3_opt_for_all_genes.sh /PATH/TO/YOUR/OSDB_dir
perl /PATH/TO/DB_kegg/kegg_db_wget_p3_1_opt_for_all_genes_download_seq.pl all_genes.listStep 4: make database
Check scripts in dir 'DB_make'
Step 5: local annotation
Check scripts in dir 'annotation'
cazy: https://bcb.unl.edu/dbCAN2/download/
cog: https://ftp.ncbi.nih.gov/pub/COG/
tcdb: https://tcdb.org/download.php
merops (Peptidase Full-length): https://www.ebi.ac.uk/merops/download_list.shtml
swissprot: https://www.ncbi.nlm.nih.gov/home/download/
pfam-A: https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/
UniProtKB-TrEMBL: https://www.uniprot.org/help/downloads