The findmitoscaf subcommand
You can use this subcommand to search a FASTA file (assembled by MitoZ or by any other assembler) for mitogenome sequences.
```
$ mitoz findmitoscaf -h
usage: mitoz findmitoscaf [-h] --fastafile <file> [--fq1 <file>] [--fq2 <file>] --outprefix <STR>
[--workdir <STR>] [--thread_number <INT>] [--profiles_dir <STR>] [--slow_search]
[--filter_by_taxa] --requiring_taxa <STR> [--requiring_relax {0,1,2,3,4,5,6}]
[--min_abundance <float>] [--abundance_pattern <STR>] [--skip_read_mapping]
[--genetic_code <INT>]
[--clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges}]
Search for mitochondrial sequences from input fasta file.
optional arguments:
-h, --help show this help message and exit
--fastafile <file> Input fasta file. Gzip supported. [required]
--fq1 <file> Input fastq 1 file. use this option if the headers of your '--fastafile' does NOT have
abundance information BUT you WANT to filter sequence by their sequencing abundances
[optional]
--fq2 <file> Input fastq 2 file. use this option if the headers of your '--fastafile' does NOT have
abundance information BUT you WANT to filter sequence by their sequencing abundances
[optional]
--outprefix <STR> output prefix
--workdir <STR> workdir [./]
--thread_number <INT>
thread number [8]
--profiles_dir <STR> Directory cotaining 'CDS_HMM/', 'MT_database/' and 'rRNA_CM/'.
[/home/gmeng/.conda/envs/mybase/envs/mitozEnv.test3.6/lib/python3.8/site-
packages/mitoz/profiles]
--slow_search By default, we firstly use tiara to perform quick sequence classification (100 times
faster than usual!), however, it is valid only when your mitochondrial sequences are >=
3000 bp. If you have missing genes, set '--slow_search' to use the tradicitiona search
mode. [False]
--filter_by_taxa filter out non-requiring_taxa sequences by mito-PCGs annotation to do taxa
assignment.[True]
--requiring_taxa <STR>
filtering out non-requiring taxa sequences which may be contamination [required]
--requiring_relax {0,1,2,3,4,5,6}
The relaxing threshold for filtering non-target-requiring_taxa. The larger digital means
more relaxing. [0]
--min_abundance <float>
the minimum abundance of sequence required. Set this to any value <= 0 if you do NOT
want to filter sequences by abundance [10]
--abundance_pattern <STR>
the regular expression pattern to capture the abundance information in the header of
sequence ['abun\=([0-9]+\.*[0-9]*)']
--skip_read_mapping Skip read-mapping step, assuming we can extract the abundance from seqid line. [False]
--genetic_code <INT> which genetic code table to use? 'auto' means determined by '--clade' option. [auto]
--clade {Chordata,Arthropoda,Echinodermata,Annelida-segmented-worms,Bryozoa,Mollusca,Nematoda,Nemertea-ribbon-worms,Porifera-sponges}
which clade does your species belong to? [Arthropoda]
```
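The help text says the default tiara-based classification is only reliable when your mitochondrial sequences are at least 3000 bp long. Before choosing between the default mode and `--slow_search`, it may help to see how many input sequences fall below that length. The sketch below is plain Python and is not MitoZ's own code; the function names (`fasta_lengths`, `fasta_lengths_from_file`, `shorter_than`) are hypothetical, and the only value taken from MitoZ is the 3000 bp threshold quoted in the help text.

```python
import gzip
from typing import Dict, Iterable, List

# Threshold quoted in the '--slow_search' help text; not read from MitoZ itself.
TIARA_MIN_LENGTH = 3000


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence ID (first word after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    seq_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            lengths[seq_id] = 0
        elif seq_id is not None:
            # sequence lines may be wrapped, so accumulate their lengths
            lengths[seq_id] += len(line)
    return lengths


def fasta_lengths_from_file(path: str) -> Dict[str, int]:
    """Read a plain or gzip-compressed FASTA file (gzip detected by a '.gz' suffix)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return fasta_lengths(handle)


def shorter_than(lengths: Dict[str, int], min_length: int = TIARA_MIN_LENGTH) -> List[str]:
    """Return the IDs of sequences shorter than min_length, sorted for stable output."""
    return sorted(sid for sid, n in lengths.items() if n < min_length)
```

For example, `shorter_than(fasta_lengths_from_file("assembly.fa.gz"))` (file name hypothetical) lists the sequences that the fast mode may not classify reliably; if many candidate mitochondrial contigs appear there, `--slow_search` is likely the safer choice.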
About the input FASTA file:

- The sequence header lines of your input FASTA file should contain abundance information, for example `>Contig1 abun=38.2`. If your headers do carry abundance information but it does not match the default regular expression `abun\=([0-9]+\.*[0-9]*)`, you can change the value of `--abundance_pattern` so that MitoZ can extract the abundance from your header lines. For example, if your header lines look like `>Contig1 coverage:38.2`, you can set `--abundance_pattern 'coverage\:([0-9]+\.*[0-9]*)'`.
- If your sequence header lines do not contain abundance information, you can instead provide your reads with the `--fq1` and `--fq2` options, so that MitoZ can obtain abundances by mapping the reads back to the sequences. This is only needed if you want to filter by abundance.
- If you do not want to filter by abundance at all, set `--min_abundance 0`; in that case you do not need to set `--fq1` and `--fq2`.
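To make the roles of `--abundance_pattern` and `--min_abundance` concrete, here is a minimal sketch of header-based abundance filtering. It assumes the pattern is applied with a regular-expression search and that its first capture group holds the abundance, and it treats "minimum abundance" as "keep if abundance >= threshold"; these are assumptions, not a description of MitoZ's internals. The function names and the choice to raise an error on headers without abundance are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Default value of '--abundance_pattern', copied from the help text.
DEFAULT_PATTERN = r"abun\=([0-9]+\.*[0-9]*)"


def extract_abundance(header: str, pattern: str = DEFAULT_PATTERN) -> Optional[float]:
    """Return the value captured by the pattern's first group, or None if nothing matches."""
    match = re.search(pattern, header)
    return float(match.group(1)) if match else None


def filter_by_abundance(
    headers: Iterable[str],
    min_abundance: float = 10.0,  # default of '--min_abundance' in the help text
    pattern: str = DEFAULT_PATTERN,
) -> List[str]:
    """Return the IDs of sequences whose header abundance is >= min_abundance.

    A min_abundance <= 0 disables filtering, as described for '--min_abundance'.
    """
    kept: List[str] = []
    for header in headers:
        seq_id = header.lstrip(">").split()[0]
        if min_abundance <= 0:
            kept.append(seq_id)  # filtering disabled: keep everything
            continue
        abundance = extract_abundance(header, pattern)
        if abundance is None:
            # Hypothetical handling: without abundance in the header, reads would be needed.
            raise ValueError(
                f"no abundance found in {header!r}; adjust the pattern or supply reads"
            )
        if abundance >= min_abundance:
            kept.append(seq_id)
    return kept
```

With the default pattern, `>Contig1 abun=38.2` yields 38.2, while `>Contig1 coverage:38.2` yields nothing until the pattern is changed to `coverage\:([0-9]+\.*[0-9]*)`. The same idea applies to other header styles; for a SPAdes-style header such as `>NODE_1_length_5000_cov_38.2`, a pattern like `cov_([0-9]+\.*[0-9]*)` would capture 38.2.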