* 2024.09 * addressing @charles-cowart comments * update based on @qiyunzhu recommendations
Showing 12 changed files with 102 additions and 14 deletions.
62 changes: 62 additions & 0 deletions
qiita_pet/support_files/doc/source/processingdata/woltka_pairedend.rst
Woltka and Bowtie2 using Read Pairing Schemes
=============================================

Benchmarks created by Qiyun Zhu (@qiyunzhu) on Aug 1, 2024.

Summary
-------

I tested alternative read pairing schemes in the analysis of shotgun metagenomic sequencing data. Sequencing reads were aligned against a reference microbial genome database as unpaired or paired, with or without singleton and/or discordant alignments suppressed. A series of synthetic datasets was used in the analysis.

The results reveal that treating reads as paired is always advantageous over treating them as unpaired. Suppressing singleton alignments further increases the accuracy of the results, at the cost of a lower mapping rate. Suppressing discordant alignments has no obvious impact on the results. Regardless of accuracy, the downstream community ecology analyses are not obviously impacted by the choice of parameters.

Therefore, I recommend the general adoption of paired alignments as a standard procedure. I also endorse suppressing singleton and discordant alignments, but further tests are needed to determine whether doing so reduces sensitivity with complex communities.

Alignment parameters
--------------------

Sequencing data were aligned using Bowtie2 v2.5.1 in the “very sensitive” mode against the WoL2 database. They were treated as either unpaired or paired-end:

- SE: Reads are treated as unpaired (Bowtie2 input: ``-U merged.fq``)
- PE: Reads are treated as paired (Bowtie2 input: ``-1 fwd.fq``, ``-2 rev.fq``)
- PE.NU: Reads are treated as paired, with the additional flags ``--no-exact-upfront --no-1mm-upfront``

Resulting alignment files (SAM format) were processed by Woltka v0.1.6 using default parameters to generate OGU tables.
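
The three schemes above can be sketched as Bowtie2 argument lists. This is only an illustration, not the exact commands used in the benchmark: the helper name ``bowtie2_command``, the index name ``WoL2``, and the output file name are assumptions, and ``--very-sensitive`` is assumed to be the preset meant by the “very sensitive” mode.

```python
import shlex


def bowtie2_command(scheme, index="WoL2", out_sam="out.sam"):
    """Build a Bowtie2 argument list for one read pairing scheme.

    The index and output names are hypothetical placeholders.
    """
    cmd = ["bowtie2", "--very-sensitive", "-x", index]
    if scheme == "SE":
        # Unpaired: forward and reverse reads merged into one file.
        cmd += ["-U", "merged.fq"]
    elif scheme in ("PE", "PE.NU"):
        # Paired: forward and reverse reads supplied separately.
        cmd += ["-1", "fwd.fq", "-2", "rev.fq"]
        if scheme == "PE.NU":
            # Skip the exact and 1-mismatch end-to-end attempts made
            # before the multiseed heuristic.
            cmd += ["--no-exact-upfront", "--no-1mm-upfront"]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    cmd += ["-S", out_sam]
    return cmd


for scheme in ("SE", "PE", "PE.NU"):
    print(scheme, shlex.join(bowtie2_command(scheme)))
```

Each list can be passed to ``subprocess.run`` as-is, which avoids shell quoting issues with file names.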

Synthetic data
--------------

Five synthetic datasets were generated, each with 25 samples consisting of randomly selected WoL2 genomes. CAMISIM was executed to simulate 500 Mbp of 150 bp paired-end Illumina sequencing reads (approx. 3.3 million reads) per sample. The five datasets differ in taxon count and distribution pattern. The result for one of the five datasets is displayed below. It consists of 400 taxa (more than the others) and is therefore presumably the most realistic. However, all five results largely shared the same pattern.
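
As a quick check of the read count quoted above, the arithmetic can be written out. This sketch assumes that "500 Mbp" means 500 million bases and that the 3.3 million figure counts individual reads rather than read pairs; neither is stated explicitly in the text.

```python
total_bases = 500_000_000  # 500 Mbp per sample (assumed to mean 5e8 bases)
read_length = 150          # bp per read

n_reads = total_bases / read_length  # individual reads per sample
n_pairs = n_reads / 2                # read pairs, if both mates are counted above

print(f"{n_reads / 1e6:.1f} million reads, {n_pairs / 1e6:.2f} million pairs")
```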

The results of the five Bowtie2 parameter sets were compared using nine metrics:

Three metrics that rely only on each result:

- Mapping rate (%)
- Number of taxa
- Entropy (i.e., Shannon index, but without subsampling)
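
A minimal sketch of these three per-result metrics follows. The function names and the toy counts are hypothetical, and the logarithm base (2) is an assumption, since the text does not specify one.

```python
import math


def mapping_rate(mapped_reads, total_reads):
    """Percentage of input reads that were aligned."""
    return 100.0 * mapped_reads / total_reads


def number_of_taxa(counts):
    """Number of taxa with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon_entropy(counts):
    """Shannon index of one sample, computed on raw counts (no subsampling).

    Base 2 is assumed here.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


sample = [25, 25, 25, 25, 0]  # toy per-taxon counts
print(mapping_rate(900, 1000), number_of_taxa(sample), shannon_entropy(sample))
```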

Six metrics that rely on comparing each result against the ground truth (higher is better):

- Presence/absence-based:

  - Precision (fraction of discovered taxa that are true)
  - Recall (sensitivity) (fraction of true taxa that are discovered)
  - F1 score (harmonic mean of precision and recall)

- Abundance-based:

  - Spearman correlation coefficient
  - Bray-Curtis similarity \*
  - Weighted UniFrac similarity \*

\* Note: Bray-Curtis and weighted UniFrac similarities were calculated after subsampling to a constant sum of taxon frequencies per sample.
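
The presence/absence metrics and the Bray-Curtis similarity can be sketched as below. This is an illustration under stated assumptions, not the benchmark's own code: the function names and taxon identifiers are hypothetical, profiles are plain ``dict`` objects mapping taxon to count, and both profiles are assumed to be already subsampled to the same total. Spearman correlation and weighted UniFrac are left out because they need ranking and a phylogenetic tree, respectively.

```python
def presence_absence_scores(observed, truth):
    """Return (precision, recall, F1) from two collections of taxon IDs."""
    observed, truth = set(observed), set(truth)
    true_positives = len(observed & truth)
    precision = true_positives / len(observed) if observed else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def bray_curtis_similarity(a, b):
    """1 minus the Bray-Curtis dissimilarity between two count profiles."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    return 1.0 - diff / total


# Toy profiles with hypothetical genome IDs, each summing to 100.
truth = {"G1": 50, "G2": 30, "G3": 20}
result = {"G1": 45, "G2": 35, "G4": 20}

print(presence_absence_scores(result, truth))
print(bray_curtis_similarity(result, truth))
```

In this toy case one of three discovered taxa is a false positive and one of three true taxa is missed, so precision, recall, and F1 all equal 2/3.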

.. figure:: woltka_synthetic.png
   :align: center

The results revealed:

#. PE outperforms SE in all metrics. Most importantly, it reduces the false positive rate (higher precision) while retaining the mapping rate. Meanwhile, the sensitivity (recall) of identifying true taxa is not obviously compromised (note the y-axis scale).
#. PE.NU: the two additional parameters had minimal effect on the result and made the alignment step faster. This suggests that the additional parameters are safe to use.

Therefore, I recommend adopting paired alignment in preference to unpaired alignment. I would also suggest no mixing (suppressing singleton alignments), as it improves accuracy, but the potential adverse effect of a lower mapping rate should be explored further before making a firm recommendation. Although it has no visible effect, no discordance (suppressing discordant alignments) may be added for logical coherence.
Binary file added (+111 KB): qiita_pet/support_files/doc/source/processingdata/woltka_synthetic.png