Woltka is deterministic. Given the same input files and parameters, it always produces the identical output files. There is no "seed" parameter.
The former. Woltka exhaustively captures all valid matches from the alignment file(s).
Woltka works best with two CPU cores: one for file decompression and the other for classification. This happens automatically. See here for details.
Not at the moment. But you can run multiple Woltka instances on different subsets of samples and merge results. See here for details.
Yes. All input files for Woltka (alignments and databases) can be supplied as compressed in gzip, bzip2 or xz formats. Woltka will automatically recognize and process them.
Not out-of-the-box. But you can use SAMtools to extract BAM/CRAM files and directly "pipe" them into Woltka, like this ("-" represents stdin):
samtools view input.bam | woltka classify -i - -o output.biom
I ran woltka classify -i input.fastq ... and got an error saying it cannot determine alignment file format. Why?
Woltka takes alignment files as input, NOT original sequencing data (FASTQ, FASTA, etc.). You need to align the sequencing data against a reference database yourself first, for example:
bowtie2 -x db -U input.fastq -S output.sam
Then feed the resulting alignment(s) into Woltka.
woltka classify -i output.sam ... -o output.tsv
See here for details.
I ran woltka classify -i S01.sam ... and the output feature table has a single column with an empty header. Why?
Woltka is designed to deal with multiple samples at once. If the input is a single alignment file, Woltka automatically treats it as a multiplexed file and attempts to extract individual samples from it. If the file is in fact not multiplexed, Woltka will leave the sample ID empty.
If you really want Woltka to process one sample at a time, you can do this (aside from manually adding the sample ID to the output table):
woltka classify --no-demux -i S01.sam ... -o S01.tsv
The --no-demux flag will tell Woltka not to try to demultiplex the alignment file. Instead, it will use the filename S01 as the only sample ID.
See here for details.
By default, cell values are feature frequencies, i.e., the numbers of sequencing reads assigned to individual features. They are therefore absolute abundances.
Woltka provides multiple normalization features. For example, if you want to get relative abundance (fraction) instead of absolute abundance, you can do:
woltka classify ... --frac
See here for details.
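To make the difference between absolute and relative abundance concrete, here is a minimal Python sketch of the arithmetic. It is not Woltka's code; the function name and the feature IDs (G1, G2, G3) are made up for illustration, and it assumes that "fraction" means each count divided by the sample's total.

```python
def to_fractions(counts):
    """Convert one sample's read counts into fractions of the sample total.

    Illustrative helper only (not part of Woltka).
    """
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful fractions; report zeros.
        return {feature: 0.0 for feature in counts}
    return {feature: n / total for feature, n in counts.items()}


# Hypothetical counts for a single sample.
sample = {"G1": 30, "G2": 50, "G3": 20}
fractions = to_fractions(sample)
print(fractions)
```

Within each sample the fractions sum to 1, which makes samples of different sequencing depth comparable.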
Yes. See here for methods. For example:
woltka classify ... -c coords.txt --sizes . --scale 1k --digits 3 -o rpk.biom
woltka normalize -i rpk.biom --scale 1M -o tpm.biom
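The two steps above correspond to the usual definitions of RPK (reads per kilobase) and TPM (transcripts per million). The Python sketch below works through that arithmetic on made-up numbers. It is an illustration of the standard formulas, not Woltka's implementation; the function names, feature IDs, and lengths are hypothetical.

```python
def rpk(counts, lengths_bp):
    """Reads per kilobase: count divided by feature length in kb."""
    return {f: n / (lengths_bp[f] / 1000) for f, n in counts.items()}


def tpm(rpk_values):
    """Rescale RPK values so that they sum to one million per sample."""
    total = sum(rpk_values.values())
    return {f: v / total * 1_000_000 for f, v in rpk_values.items()}


# Hypothetical read counts and gene lengths (bp) for one sample.
counts = {"orfA": 10, "orfB": 20}
lengths = {"orfA": 1000, "orfB": 4000}

rpk_values = rpk(counts, lengths)   # orfA: 10 / 1 kb, orfB: 20 / 4 kb
tpm_values = tpm(rpk_values)
print(rpk_values)
print(tpm_values)
```

Note that orfB has more reads than orfA but a lower RPK, because it is four times as long.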
Yes. You can do:
woltka normalize -i orf.tsv -z coords.txt -s 1k -d 3 -o orf.rpk.tsv
See here for details.
Yes. The merge command is for you. See here for details.
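Conceptually, merging feature tables means combining their samples and features into one table. The Python sketch below shows one way to do that on plain dictionaries. It is not Woltka's code: the function name and data are hypothetical, and summing the counts when the same sample and feature appear in more than one table is an assumption of this sketch.

```python
def merge_tables(tables):
    """Merge feature tables given as {sample: {feature: count}} dicts.

    Assumption of this sketch: counts for the same sample and feature
    that occur in several tables are added together.
    """
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            target = merged.setdefault(sample, {})
            for feature, n in counts.items():
                target[feature] = target.get(feature, 0) + n
    return merged


# Two hypothetical tables produced from different subsets of samples.
table1 = {"S01": {"G1": 5, "G2": 3}}
table2 = {"S02": {"G1": 2}, "S01": {"G2": 1}}

merged = merge_tables([table1, table2])
print(merged)
```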
Yes. Add --name-as-id to the command. See here for details.
By default, Woltka will keep all matches and divide the count by the number of matches, so that a read with k matches contributes 1/k to each of them. Woltka also lets the user choose among several alternative behaviors.
See here for details.
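One way to read the default behavior is that every read contributes a total of one count, split equally among the features it matches. The Python sketch below illustrates that reading. It is not Woltka's implementation; the function name, read IDs, and feature IDs are made up.

```python
from collections import defaultdict


def count_features(read_matches):
    """Give each read a total weight of 1, split equally among its matches."""
    counts = defaultdict(float)
    for features in read_matches.values():
        unique = set(features)
        if not unique:
            continue  # a read with no match contributes nothing
        share = 1 / len(unique)
        for feature in unique:
            counts[feature] += share
    return dict(counts)


# Hypothetical reads: r2 matches two features, so each receives 0.5.
matches = {"r1": ["G1"], "r2": ["G1", "G2"], "r3": ["G2"]}
counts = count_features(matches)
print(counts)
```

With this scheme the counts in a sample still add up to the number of assigned reads, even when some reads match several features.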
They will not be counted in the output feature table, unless you specify the --unassigned flag, in which case an extra feature unassigned will appear.
See here for details.
In this case, you will need to specify which field of a stratified feature should be collapsed, using parameter --field or -f followed by the field index (starting from 1). Otherwise, the program will try to collapse the entire feature.
See here for details.
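The idea of collapsing only one field can be sketched in Python as follows. This is an illustration, not Woltka's code. It assumes that the fields of a stratified feature are joined by "|", that the mapping is one-to-one, and that features whose field has no mapping are dropped. The function name and IDs are hypothetical.

```python
def collapse_field(feature, mapping, field, sep="|"):
    """Replace one field of a stratified feature using a mapping.

    `field` is 1-based, like the index described above. The "|"
    separator and the drop-if-unmapped rule are assumptions.
    """
    parts = feature.split(sep)
    if not 1 <= field <= len(parts):
        raise ValueError(f"feature {feature!r} has no field {field}")
    target = mapping.get(parts[field - 1])
    if target is None:
        return None  # no mapping for this field: drop the feature
    parts[field - 1] = target
    return sep.join(parts)


# Hypothetical mapping from a gene family to a pathway.
gene_to_pathway = {"K00001": "P1"}

collapsed = collapse_field("Ecoli|K00001", gene_to_pathway, field=2)
print(collapsed)
```

Here only the second field is translated, while the first field (the stratum) is kept unchanged.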