The coverage command calculates the coverage of a profile: for each pre-defined group of features, the percentage of its member features that are present in each sample.
woltka coverage -i input.biom -m mapping.txt -o output.biom
A typical use case is to assess how likely metabolic pathways are to be present in each organism or community. Because a pathway consists of multiple interconnected chemical reactions or functional genes, the presence of some of them in a sample (even at high abundance) does not necessarily suggest that the entire pathway is viable. Only when all or a large proportion of them are found can we be more confident that it is.
In this example, the input profile (sample) is a table of genes:
Feature ID | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
---|---|---|---|---|
plsC | 51 | 49 | 113 | 34 |
fruK | 83 | 128 | 160 | 41 |
panE | 0 | 53 | 0 | 39 |
leuA | 111 | 262 | 232 | 77 |
... |
The mapping file (sample) defines the member features (genes) of each feature group (pathway). Each line can have an arbitrary number of fields, and the field delimiter is <tab>:
Asparagine biosynthesis | asnB | aspC |
Biotin synthesis | bioA | bioB | bioD | bioF |
NAD biosynthesis II | hel | nudC | nadN | pnuE | nadR | nadM |
pyruvate decarboxylation | aceE | aceF | lpd |
... |
The output file (sample) is a table of coverage values (percentages) per sample per feature group (pathway):
Feature ID | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
---|---|---|---|---|
Biotin synthesis | 50.0 | 50.0 | 25.0 | 37.5 |
GDP-D-rhamnose biosynthesis | 20.0 | 80.0 | 20.0 | 80.0 |
L-glutamine degradation I | 100.0 | 100.0 | 50.0 | 0.0 |
Sucrose biosynthesis I | 20.0 | 20.0 | 20.0 | 20.0 |
... |
Note: The "coverage" computed by Woltka is not the same as that computed by HUMAnN2 (whether the pathway is present) or HUMAnN3 (how likely the pathway is to be present), although the usage and interpretation may be comparable.
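As a rough illustration of the calculation described above, the following Python sketch computes per-sample coverage from a gene table and a group mapping. It is not Woltka's implementation: the function name `coverage`, the nested-dictionary layout, the sample labels, and the two toy pathways (including `geneX` and `geneY`) are assumptions made for this example. Only the gene counts are borrowed from the first two samples of the input table shown earlier.

```python
def coverage(profile, mapping):
    """Return {group: {sample: percentage of member features present}}.

    profile: {feature: {sample: count}}
    mapping: {group: [member features]}
    A feature is considered present in a sample if its count is above zero.
    """
    samples = sorted({s for counts in profile.values() for s in counts})
    result = {}
    for group, members in mapping.items():
        members = set(members)  # ignore duplicated members
        if not members:
            continue  # skip empty groups to avoid dividing by zero
        result[group] = {}
        for sample in samples:
            present = sum(
                1 for m in members if profile.get(m, {}).get(sample, 0) > 0
            )
            result[group][sample] = 100.0 * present / len(members)
    return result


# Gene counts from the example profile (first two samples only).
profile = {
    "plsC": {"S1": 51, "S2": 49},
    "fruK": {"S1": 83, "S2": 128},
    "panE": {"S1": 0, "S2": 53},
    "leuA": {"S1": 111, "S2": 262},
}
# Hypothetical groups, invented for this sketch.
mapping = {
    "toy pathway A": ["plsC", "panE"],
    "toy pathway B": ["fruK", "leuA", "geneX", "geneY"],
}

cov = coverage(profile, mapping)
print(cov["toy pathway A"])  # {'S1': 50.0, 'S2': 100.0}
print(cov["toy pathway B"])  # {'S1': 50.0, 'S2': 50.0}
```

Note that the denominator here is the number of members listed in the mapping, so members that never appear in the profile (`geneX`, `geneY`) still lower the coverage.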
With the parameter `--threshold` or `-t` followed by a percentage (e.g., `80`), the output coverage table will display binary results: "1" for coverage greater than or equal to this threshold and "0" for coverage below it.
With the flag `--count` or `-c`, the program will report the number of member features of a group that are present in a sample, instead of the percentage. Note: This will override `--threshold`.
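The three reporting modes (percentage, thresholded, count) and their precedence can be sketched in a few lines of Python. The helper `report` and its arguments are hypothetical and only mirror the behavior described above. They are not part of Woltka's API.

```python
def report(present, total, threshold=None, count=False):
    """Value reported for one group in one sample.

    present: number of member features found in the sample
    total: number of member features in the group
    """
    if count:
        # Count mode takes precedence over any threshold.
        return present
    percentage = 100.0 * present / total
    if threshold is not None:
        # Binary result: 1 if coverage >= threshold, else 0.
        return 1 if percentage >= threshold else 0
    return percentage


print(report(4, 5))                            # 80.0
print(report(4, 5, threshold=80))              # 1
print(report(3, 5, threshold=80))              # 0
print(report(3, 5, threshold=80, count=True))  # 3
```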
One can supply a mapping of feature groups to their names by `--names` or `-n`, and these names will be appended to the coverage table as a metadata column ("Name").
The coverage command treats any feature count, even as low as 1, as evidence of the feature's presence. False positives may be introduced if the profile is noisy. Consider filtering the profile before running this command. Woltka provides a per-sample feature abundance filtering function, in addition to the multiple filtering functions implemented in the QIIME 2 plugin feature-table.
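To make the effect of pre-filtering concrete, here is a minimal Python sketch that zeroes out low counts in each sample before coverage is computed. The function `filter_profile`, the `min_count` cutoff, and the toy counts are assumptions for illustration. This does not reproduce Woltka's or QIIME 2's filtering functions.

```python
def filter_profile(profile, min_count):
    """Set counts below min_count to zero, independently in each sample.

    profile: {feature: {sample: count}}
    """
    return {
        feature: {
            sample: (count if count >= min_count else 0)
            for sample, count in counts.items()
        }
        for feature, counts in profile.items()
    }


# Hypothetical noisy profile: counts of 1 or 2 may be spurious hits.
profile = {
    "panE": {"S1": 2, "S2": 53},
    "leuA": {"S1": 111, "S2": 1},
}
filtered = filter_profile(profile, min_count=5)
print(filtered)
# {'panE': {'S1': 0, 'S2': 53}, 'leuA': {'S1': 111, 'S2': 0}}
```

With the unfiltered profile, both genes would count as present in both samples. After filtering, each sample retains only one of them, which would lower the coverage of any group containing both.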