Task: get_dnaa
This task downloads a set of genes from UniProt, by default searching for dnaA genes. It filters the sequences in three ways: the name must contain dnaA (or match any other regular expression supplied by the user), the sequence length must fall within the allowed range, and only one sequence is kept per species. The remaining amino acid sequences are reverse translated into nucleotide sequences (so that they can be used with promer when running fixstart). This is how the default set of dnaA genes used by Circlator is generated.
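To illustrate the reverse translation step, here is a minimal Python sketch. It is not Circlator's implementation: the codon table `CODON_FOR` and the function `reverse_translate` are hypothetical names, and the choice of one fixed codon per amino acid (with `NNN` for unrecognised residues) is an assumption, since this page does not say which codons Circlator picks.

```python
# Assumption: one fixed, valid codon per amino acid from the standard
# genetic code. Circlator's real codon choice may differ.
CODON_FOR = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
    "*": "TAA",
}


def reverse_translate(protein):
    """Return one nucleotide sequence that encodes the given protein.

    Residues missing from CODON_FOR (for example X) become NNN; this
    fallback is an illustrative choice, not documented behaviour.
    """
    return "".join(CODON_FOR.get(aa, "NNN") for aa in protein.upper())


if __name__ == "__main__":
    print(reverse_translate("MKV"))
```

Many different nucleotide sequences encode the same protein, so any such table yields just one of the possible answers.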
The general usage is
circlator get_dnaa [options] <outprefix>
There are the following options:
- --min_length INT: minimum length in amino acids. Default: 333.
- --max_length INT: maximum length in amino acids. Default: 500.
- --uniprot_search STRING: Uniprot search term. Default: dnaa.
- --name_re STRING: each sequence name must match this regular expression. Default: dnaa.
- --name_re_case_sensitive: do a case-sensitive match to the regular expression given by --name_re. Default is to ignore case.
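The way these options interact can be sketched in Python. This is an illustration under stated assumptions, not Circlator's code: the function `filter_sequences`, the `(name, species, sequence)` record layout, and the reason strings are hypothetical. It also assumes that the length bounds are inclusive, that the regular expression may match anywhere in the name, and that the first sequence seen for a species is the one kept.

```python
import re


def filter_sequences(records, min_length=333, max_length=500,
                     name_re="dnaa", case_sensitive=False):
    """Filter (name, species, sequence) tuples by name, length and species.

    Returns (kept, removed), where removed holds (reason, name) pairs.
    """
    # Mirror --name_re_case_sensitive: ignore case unless it is requested.
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(name_re, flags)

    kept, removed, seen_species = [], [], set()
    for name, species, seq in records:
        if not pattern.search(name):
            removed.append(("name does not match", name))
        elif not (min_length <= len(seq) <= max_length):
            # Assumption: both bounds are inclusive.
            removed.append(("length out of range", name))
        elif species in seen_species:
            # Assumption: the first sequence per species wins.
            removed.append(("species already seen", name))
        else:
            seen_species.add(species)
            kept.append((name, species, seq))
    return kept, removed
```

With the defaults, a record named "DnaA protein" passes the name check because case is ignored. Passing `case_sensitive=True` makes the pattern `dnaa` reject that name.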
The FASTA file of genes is called outprefix.nucleotides.fa. The amino acid FASTA file downloaded from UniProt is called outprefix.aa.fa. A log file called outprefix.log is written that has information on why sequences were removed. It is tab-delimited with two columns. The first column gives the reason for removing the sequence and the second column has the sequence name.
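Because the log is a two-column tab-delimited file, it is easy to summarise. The following Python sketch counts how many sequences were removed for each reason. The function name `count_removal_reasons` is hypothetical, and the code assumes the log has no header line, which this page does not state either way.

```python
from collections import Counter


def count_removal_reasons(log_path):
    """Count removed sequences per reason in a get_dnaa-style log file.

    Assumes two tab-separated columns (reason, sequence name) and no
    header line; lines without a tab are skipped.
    """
    counts = Counter()
    with open(log_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t", 1)
            if len(fields) == 2:
                reason, _name = fields
                counts[reason] += 1
    return counts
```

For example, `count_removal_reasons("outprefix.log").most_common()` lists the reasons from most to least frequent.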