From 4b58c0c3f7d5062fa356d5adcb402b57a2b7ef2e Mon Sep 17 00:00:00 2001
From: VishnuRaghuram94
Date: Wed, 17 Feb 2021 23:56:32 -0500
Subject: [PATCH] Included instructions for alternate mummer workflow

---
 README.md | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index fd8d433..aa9bfa7 100644
--- a/README.md
+++ b/README.md
@@ -26,15 +26,15 @@ Due to Usearch's license, it cannot be provided with the conda installation. Ple
 
 For example (Use the version appropriate for your operating system):
 
-    curl "https://www.drive5.com/downloads/usearch11.0.667_i86linux32.gz" --output usearch11.0.667_i86linux32.gz
+    curl "https://www.drive5.com/downloads/usearch11.0.667_i86linux32.gz" --output usearch11.0.667_i86linux32.gz #Downloads usearch binary
 
-    gunzip usearch11.0.667_i86linux32.gz
+    gunzip usearch11.0.667_i86linux32.gz #Decompresses usearch binary
 
-    chmod 755 usearch11.0.667_i86linux32
+    chmod 755 usearch11.0.667_i86linux32 #Changes permissions to executable
 
-    cp ./usearch11.0.667_i86linux32 $(dirname "$(which agrvate)")
+    cp ./usearch11.0.667_i86linux32 $(dirname "$(which agrvate)") #Copies usearch binary to the same directory as agrvate
 
-***NOTE:*** Currently, only the 32-bit version of usearch is free to use. This version is not supported by WSL or MacOS (post-Catalina). We understand this limitation and are working on an alternate workflow. But for now it is recommended to use AgrVATE on Linux machines or older versions MacOS.
+***NOTE:*** Currently, only the 32-bit version of usearch is free to use. This version is not supported by WSL or MacOS (post-Catalina). Therefore, it is recommended to use AgrVATE on Linux machines or older versions MacOS. If you are unable to run usearch, use the ```-m``` option to run MUMmer instead. However, please note that if there are large insertions/deletions in the agr-operon, MUMmer can split the alignment into 2 and the resulting extracted agr-operon will not be intact, in which case frameshift detection using snippy is not run.
 
 ---
 
@@ -48,6 +48,9 @@ Camacho, C., Coulouris, G., Avagyan, V. et al. BLAST+: architecture and applicat
 
 * __[Snippy](https://github.com/tseemann/snippy)__
 Seemann T (2015). Snippy: fast bacterial variant calling from NGS reads. https://github.com/tseemann/snippy
+
+* __[MUMmer](https://github.com/garviz/MUMmer)__
+S. Kurtz. et al (2004). Versatile and open software for comparing large genomes. Genome Biology, R12. https://doi.org/10.1186/gb-2004-5-2-r12
 
 * __[HMMER](http://hmmer.org/)__
 S.R. Eddy. Biological sequence analysis using profile hidden Markov models. http://hmmer.org/
@@ -73,7 +76,7 @@ Shen W, Le S, Li Y, Hu F (2016) SeqKit: A Cross-Platform and Ultrafast Toolkit f
     ├── gp2-operon_ref.gbk
     ├── gp3-operon_ref.gbk
     └── gp4-operon_ref.gbk
-
+    └── mummer_ref_operon.fna
 ```
 
 ---
@@ -84,6 +87,7 @@ Shen W, Le S, Li Y, Hu F (2016) SeqKit: A Cross-Platform and Ultrafast Toolkit f
 * FLAGS:
     * ```-i```   Input S. aureus genome in FASTA format [alternate: ```--input```]
     * ```-t```   Does agr typing only (skips agr operon extraction and frameshift detection) [alternate: ```--typing-only```]
+    * ```-m```   Uses MUMmer dnadiff instead of usearch [alternate: ```--mummer```]
     * ```-f```   Force overwrite existing results directory [alternate: ```--force```]
     * ```-d```   Path to agrvate_databases (Not required if installed using Conda) [alternate: ```--databases```]
     * ```-h```   Print this help message and exit [alternate: ```--help```]
@@ -102,7 +106,7 @@ Shen W, Le S, Li Y, Hu F (2016) SeqKit: A Cross-Platform and Ultrafast Toolkit f
 
 A new directory with suffix ```-results``` will be created where all the following files can be found
 
-***NOTE:*** There are 15 possible kmers for each agr group per genome. The analyses will continue even if only one kmer matches a given agr-group but it should be noted that < 5 kmers matching leads to a low confidence agr-group call. Col 3 in ```fasta-summary.tab``` shows the number of kmers matched
+***NOTE:*** **There are 15 possible kmers for each agr group per genome. The analyses will continue even if only one kmer matches a given agr-group but it should be noted that < 5 kmers matching leads to a low confidence agr-group call. Col 3 in ```fasta-summary.tab``` shows the number of kmers matched**
 
 * __**fasta-summary.tab:**__
 
@@ -157,6 +161,12 @@ A new directory with suffix ```-results``` will be created where all the follow
 
 * __**fasta-snippy/**__
     All output files of [Snippy](https://github.com/tseemann/snippy)
+
+* __**fasta-mummer_log.txt:**__
+    Standard output of [MUMmer dnadiff](https://github.com/garviz/MUMmer)
+
+* __**fasta-mummer/**__
+    All output files of [MUMmer dnadiff](https://github.com/garviz/MUMmer)
 
 ### TROUBLESHOOTING
 
@@ -176,7 +186,7 @@ The columns are ordered by how the processes are carried out. i.e col 1 is the f
     col 3: Databases check - If fail, the databases folder or the path to the databases was not valid.
     col 4: Outdir check - If fail, the results directory already exists and couldn't be overwritten. Use flag -f or --force.
     col 5: Agr typing - If fail, the Agr typing kmer search could not be performed. Check if blastn is installed correctly.
-    col 6: Usearch check - If fail, in-silico PCR was not performed by usearch. Check if usearch is installed correctly.
+    col 6: Operon check - If fail, in-silico PCR was not performed by usearch or agr operon search was not performed by mummer. Check if usearch/mummer is installed correctly.
     col 7: Snippy check - If fail, agr operon frameshift detection was not performed. Check if snippy is installed correctly.
 
 *If multiple assemblies are run, use this command from parent directory to output a consolidated report table for all samples*