This is a simple command-line utility for working with FASTA-formatted biological sequence collections, inspired by goalign and built with bashly.
The idea is to keep operations very simple (mainly using awk) so that they can be executed one sequence at a time, eliminating the need to load the whole file into memory. This allows the user to operate on very large FASTA files.
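To illustrate the per-sequence idea, here is a minimal sketch of a streaming awk pass that prints each sequence name with its length. It is not taken from the fastatools source: the function name seq_lengths and the tab-separated output are assumptions made for this example. Only the current record's name and running length are held in memory, so the cost does not grow with file size.

```shell
# Hypothetical helper, not part of fastatools: print "name<TAB>length"
# for every record of a FASTA stream read from standard input.
seq_lengths() {
  awk '
    /^>/ {
      # A new header: emit the previous record, if any, then reset.
      if (have) print name "\t" len
      name = substr($0, 2)   # drop the leading ">"
      len = 0
      have = 1
      next
    }
    # Sequence line: add its length to the current record.
    { len += length($0) }
    # Emit the last record once input is exhausted.
    END { if (have) print name "\t" len }
  '
}

# Example: two records, the first wrapped over two lines.
printf '>seq1\nACGT\nAC\n>seq2\nGGG\n' | seq_lengths
```

Because the function reads standard input and writes standard output, it composes with other tools in a pipeline in the same way the fastatools commands are meant to.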
This tool is distributed as a shell script, so downloading the fastatools script should be enough for it to work seamlessly on *NIX systems.
For now, you can refer to the generated help message (fastatools help or fastatools [command] help) or to the definition file.
This is designed to be pipeable, so the default IO is standard input and standard output. However, for all commands the -i or --input flag can be used to specify an input file, and for most commands the -o or --output flag can be used to specify an output file.
This is a short list of the available fastatools commands.
count: Get the number of sequences
names: Get names of sequences
length: Get lengths of sequences
freqs: Get character frequencies in sequences

select: Select sequences in FASTA file by name
subset: Select sequences in FASTA file by index
head: Print first n sequences
tail: Print last n sequences
subsite: Select specific sites in aligned sequences

upper: Transform sequences to uppercase
lower: Transform sequences to lowercase
pretty: Pretty print FASTA file, wrapping sequences to desired width
rc: Reverse complement sequences

rename: Rename sequences in FASTA file
addid: Add an identifier to each sequence name

split: Split a FASTA file into several FASTA files

completion: Generate BASH completion script (auto-generated by bashly)