


BTalamantesBecerra/omicR_linux_commandline


omicR_linux command line

omicR creates FASTA files, downloads genomes from NCBI using their RefSeq accession numbers, creates databases for BLAST+, runs BLAST+ and filters the results to obtain the best match per sequence. These scripts can be used to run BLAST alignments of short reads (e.g. DArTseq data) and of longer sequences from other platforms (e.g. Illumina, PacBio). You can use reference genomes from NCBI or any other genetic sequence that you would like to use as a reference.
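The README does not show how omicR downloads a genome from its RefSeq accession number. As an illustration only, the sketch below fetches a nucleotide record as FASTA through the public NCBI E-utilities `efetch` endpoint. This is an assumption, not omicR's own code. It also assumes a nucleotide accession such as `NC_000913.3` (used purely as an example); assembly accessions (`GCF_...`) are not stored in the `nuccore` database and may need a different route.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint; using it here is an assumption, not omicR's own code.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build an efetch URL that returns one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def download_fasta(accession: str, out_path: str) -> None:
    """Download one RefSeq nucleotide record and save it to out_path (needs network access)."""
    with urlopen(efetch_url(accession)) as response:
        text = response.read().decode("utf-8")
    # A FASTA record starts with '>'; anything else is most likely an error message.
    if not text.startswith(">"):
        raise ValueError(f"unexpected response for {accession!r}: {text[:80]!r}")
    with open(out_path, "w") as handle:
        handle.write(text)


if __name__ == "__main__":
    # Only the URL is printed here; call download_fasta() to actually fetch the record.
    print(efetch_url("NC_000913.3"))
```

Running it as a script only prints the request URL. Calling `download_fasta("NC_000913.3", "reference.fasta")` performs the download. NCBI limits E-utilities clients without an API key to about three requests per second, so downloads of many genomes should be paced.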

Introduction

omicR creates FASTA files, downloads genomes from NCBI using their RefSeq accession numbers, creates databases for BLAST+, runs BLAST+ and filters the results to obtain the best match per sequence.

These scripts can be used to run BLAST alignments of short reads (e.g. DArTseq data) and of longer sequences from other platforms (e.g. Illumina, PacBio). As a reference, you can use genomes from NCBI, genomes from your private collection, contigs, scaffolds or any other genetic sequence.
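The input layout that omicR turns into FASTA files is not described here, so the sketch below assumes a hypothetical CSV table with an `id` column and a `sequence` column, one sequence per row. It writes each row as a FASTA record, uppercasing the sequence and wrapping it at a fixed width. It is not the omicR "create fasta files" script.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> int:
    """Write (identifier, sequence) pairs as FASTA and return the number of records written.

    Sequences are stripped and uppercased, and lines are wrapped at `width` characters.
    """
    count = 0
    for seq_id, seq in records:
        seq = seq.strip().upper()
        if not seq:
            continue  # skip empty sequences rather than writing headers with no data
        handle.write(f">{seq_id}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
        count += 1
    return count


def records_from_csv(handle: TextIO, id_col: str = "id", seq_col: str = "sequence") -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a CSV file with a header row.

    The column names are hypothetical defaults; change them to match your own table.
    """
    for row in csv.DictReader(handle):
        yield row[id_col], row[seq_col]


if __name__ == "__main__":
    table = io.StringIO("id,sequence\nread1,acgtacgt\nread2,TTGCA\n")
    out = io.StringIO()
    n = write_fasta(records_from_csv(table), out, width=4)
    print(n)
    print(out.getvalue(), end="")
```

With a real file, open the CSV and the output FASTA with `open(...)` and pass those handles instead of the in-memory `StringIO` objects.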

Requirements

• NCBI BLAST+ V4 or later (https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)

• Python 3 or later (https://www.python.org/downloads/)

• Biopython (https://biopython.org/)

• omicR

Add these programs to your PATH environment variable.
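Before running anything, it can help to confirm that the requirements are actually visible from your shell. The sketch below checks the PATH for `blastn` and `makeblastdb` (both part of NCBI BLAST+) and for a `python3` executable, and checks whether Biopython (imported as `Bio`) can be found. The executable names are assumptions: on some systems Python is installed as `python` instead of `python3`.

```python
import importlib.util
import shutil

# Executable names are assumptions; edit this tuple to match your installation.
REQUIRED_EXECUTABLES = ("blastn", "makeblastdb", "python3")


def check_requirements(executables=REQUIRED_EXECUTABLES) -> dict:
    """Return {name: True/False} for each executable on PATH and for Biopython."""
    status = {name: shutil.which(name) is not None for name in executables}
    # Biopython is installed as the package "Bio".
    status["biopython"] = importlib.util.find_spec("Bio") is not None
    return status


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name:12} {'found' if found else 'MISSING'}")
```

Any entry reported as MISSING either is not installed or is not on the PATH of the shell (or HPC job) that runs the scripts.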

If you are running omicR on an HPC system, you are probably already comfortable with the command line. In that case, I suggest using only two of the scripts: "create fasta files" and "filter". Downloading genomes, creating databases and running BLAST+ through omicR can take longer than running BLAST+ directly. The filtering script requires BLAST+ output produced with the following command line:

blastn -db [ ] -query [ ] -out [ ] -word_size [ ] -perc_identity [ ] -num_threads [ ] -outfmt '6 qseqid sacc stitle qseq sseq nident mismatch pident length evalue bitscore qstart qend sstart send gapopen gaps qlen slen'
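The README does not state how the filtering script chooses the "best match" for each sequence. The sketch below is not omicR's filter script. It reads the tab-separated output produced by the 19-column `-outfmt` string above and keeps one hit per `qseqid`, assuming that "best" means the highest bit score, with ties broken by the lowest e-value.

```python
import io
from typing import Dict, Iterable, Iterator, List, TextIO

# Column order copied from the -outfmt string in the blastn command line.
COLUMNS = ["qseqid", "sacc", "stitle", "qseq", "sseq", "nident", "mismatch",
           "pident", "length", "evalue", "bitscore", "qstart", "qend",
           "sstart", "send", "gapopen", "gaps", "qlen", "slen"]


def read_hits(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per tab-separated hit line (outfmt 6 with the columns above)."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")  # stitle may contain spaces, but never tabs
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        yield dict(zip(COLUMNS, fields))


def best_hit_per_query(hits: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep one hit per qseqid: highest bitscore, then lowest evalue (assumed criteria)."""
    best = {}
    for hit in hits:
        key = (float(hit["bitscore"]), -float(hit["evalue"]))
        current = best.get(hit["qseqid"])
        if current is None or key > current[0]:
            best[hit["qseqid"]] = (key, hit)
    # Queries come out in the order they first appeared in the BLAST output.
    return [hit for _, hit in best.values()]


def write_hits(hits: Iterable[Dict[str, str]], handle: TextIO) -> None:
    """Write hits back out as tab-separated lines in the original column order."""
    for hit in hits:
        handle.write("\t".join(hit[c] for c in COLUMNS) + "\n")


if __name__ == "__main__":
    def fake_line(q: str, sacc: str, evalue: str, bitscore: str) -> str:
        # Placeholder values for illustration only; not real BLAST results.
        row = dict.fromkeys(COLUMNS, "0")
        row.update(qseqid=q, sacc=sacc, evalue=evalue, bitscore=bitscore)
        return "\t".join(row[c] for c in COLUMNS)

    text = "\n".join([fake_line("q1", "subjA", "1e-20", "90.5"),
                      fake_line("q1", "subjB", "1e-30", "120.0"),
                      fake_line("q2", "subjC", "1e-10", "60.0")]) + "\n"
    for hit in best_hit_per_query(read_hits(io.StringIO(text))):
        print(hit["qseqid"], hit["sacc"], hit["bitscore"])
```

Keeping only the current best hit per query in a dictionary means the whole BLAST output never has to be sorted or held in memory at once. If your criteria differ (for example, percent identity first), only the `key` tuple needs to change.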

For detailed usage instructions, refer to the file "OmicR_User_guide.pdf" available in this repository.

If you use this script, please cite:

Berenice Talamantes-Becerra, Jason Carling, Arthur Georges. omicR: A tool to facilitate BLASTn alignments for sequence data, SoftwareX, Volume 14, 2021, 100702, ISSN 2352-7110, https://doi.org/10.1016/j.softx.2021.100702. Website: https://www.sciencedirect.com/science/article/pii/S2352711021000479
