Read-level alignment to gene clusters of interest, e.g. the bai operon or butyrate-producing genes. Please refer to sunbeam_database for details. Make a DIAMOND database from the FASTA file of your proteins of interest, and provide a text annotation file with the following columns: geneID, proteinID, ARO, taxon, weight.
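The five-column annotation layout described above can be sketched and sanity-checked as follows. This is a minimal illustration, not the extension's own tooling: the file name, the tab separator, the presence of a header row, and the example record are all assumptions made for the sketch.

```shell
# Hypothetical file name; substitute your own annotation file.
annotation=proteins_of_interest.txt

# Header row plus one made-up placeholder record.
# Tab-separated fields and a header row are assumptions, not documented requirements.
printf 'geneID\tproteinID\tARO\ttaxon\tweight\n' > "$annotation"
printf 'gene_1\tprotein_1\tNA\tExample taxon\t1\n' >> "$annotation"

# Collect the line numbers of any rows that do not have exactly five fields.
bad_rows=$(awk -F'\t' 'NF != 5 { print NR }' "$annotation")

if [ -z "$bad_rows" ]; then
    echo "annotation file OK"
else
    echo "rows with wrong column count: $bad_rows"
fi
```

Running the same awk check against a real annotation file is a quick way to catch rows with missing or extra fields before starting a run.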
To install, activate your conda environment (using the name of your environment) and use sunbeam extend:
conda activate sunbeamX.X.X
sunbeam extend https://github.com/sunbeam-labs/sbx_gene_clusters.git
Now take the UniRef50 database as an example. First download uniref50.fasta.gz into sunbeam_output/mapping/sbx_gene_family/database/:
mkdir -p sunbeam_output/mapping/sbx_gene_family/database/
wget ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz -P sunbeam_output/mapping/sbx_gene_family/database/
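Since the alignment relies on a DIAMOND database built from the protein FASTA file, the downloaded archive can be turned into one roughly as sketched here. The output prefix `uniref50` is a hypothetical choice, and whether genes_fp should point at the FASTA file or at the resulting .dmnd file is not stated here, so check the extension's config.yml before settling on a path.

```shell
# Directory used in the download step.
db_dir=sunbeam_output/mapping/sbx_gene_family/database
fasta_gz="$db_dir/uniref50.fasta.gz"
db_prefix="$db_dir/uniref50"   # hypothetical prefix; DIAMOND appends .dmnd

# Build the database only if diamond is installed and the download is present.
if command -v diamond >/dev/null 2>&1 && [ -s "$fasta_gz" ]; then
    # Decompress to a plain FASTA file, keeping the original archive.
    gunzip -c "$fasta_gz" > "$db_dir/uniref50.fasta"
    diamond makedb --in "$db_dir/uniref50.fasta" --db "$db_prefix"
else
    echo "diamond or $fasta_gz not found; skipping database build"
fi
```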
Be sure to update the config.yml with the proper path.
To generate alignments, run:
sunbeam run --profile /path/to/project all_gene_clusters
- threads: the number of threads to use for parallel processes
- genes_fp: the path to the downloaded database
- evalue:
- alnLen:
- mismatch:
For sunbeam versions <3, or if sunbeam extend isn't working, you can use git directly to install an extension:
git clone https://github.com/sunbeam-labs/sbx_gene_clusters.git extensions/sbx_gene_clusters
and then include it in the config for any given project with:
cat extensions/sbx_gene_clusters/config.yml >> /path/to/project/sunbeam_config.yml
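To confirm that the extension's options actually landed in the project config, a quick check along these lines may help. The helper name `check_config` is hypothetical, and the sketch only searches for the option names listed above; it does not validate their values.

```shell
# Hypothetical helper: print the lines of a config file that mention the
# extension's option names (genes_fp, evalue, alnLen, mismatch).
check_config() {
    grep -n -E 'genes_fp|evalue|alnLen|mismatch' "$1"
}

config=/path/to/project/sunbeam_config.yml   # placeholder path from the step above

if [ -f "$config" ]; then
    check_config "$config" || echo "no sbx_gene_clusters options found in $config"
else
    echo "$config does not exist"
fi
```

If nothing is printed for an existing config file other than the "no options found" message, the cat step above probably needs to be rerun against the right project path.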