I was trying to understand which variants are included in the variants_long_table.csv file, and it seems that this differs depending on the sub-workflow used:
--platform nanopore: the long variants table includes only the variants that pass quality filters and are therefore injected onto the reference (i.e. those in the *.pass.unique.vcf.gz files).
--platform illumina: the variants table includes both the variants that pass quality filters and those that don't (incidentally, the FILTER column does not reflect the filtering that happens later in ivar consensus, which I believe retains only variants with AF > 0.75). That is, it includes the variants in variants/ivar/*.vcf.gz rather than those in variants/ivar/consensus/bcftools/*.filtered.vcf.gz.
This inconsistency can be a little confusing when analysing data from both platforms.
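To make the Illumina table easier to compare with the Nanopore one, a downstream script could flag which rows would plausibly survive consensus calling. The sketch below is a minimal illustration only. It assumes the CSV has columns named FILTER and AF (hypothetical names; check the header of your own file) and that a variant reaches the consensus only if it is PASS and has AF > 0.75, as suggested above. The example rows are made up.

```python
import csv
import io

# Assumed threshold, based on the belief that ivar consensus keeps only AF > 0.75.
MIN_CONSENSUS_AF = 0.75


def flag_consensus_variants(csv_text, min_af=MIN_CONSENSUS_AF):
    """Return the rows of a long variants table with an added IN_CONSENSUS column.

    Column names FILTER and AF are assumptions, not a documented schema.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        passed = row["FILTER"] == "PASS"
        af = float(row["AF"])
        # "yes" only when the variant passed filters AND exceeds the AF threshold
        row["IN_CONSENSUS"] = "yes" if passed and af > min_af else "no"
    return rows


# Made-up example rows for illustration
example = """SAMPLE,CHROM,POS,REF,ALT,FILTER,AF
s1,MN908947.3,241,C,T,PASS,0.98
s1,MN908947.3,3037,C,T,PASS,0.40
s1,MN908947.3,11288,TCTGGTTTT,T,FAIL,0.90
"""

for row in flag_consensus_variants(example):
    print(row["POS"], row["IN_CONSENSUS"])
```

Keeping the threshold as a parameter makes it easy to adjust if the actual consensus rule turns out to differ.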
On the other hand, I inferred (I hope correctly) that the # SNPs and # INDELs in the MultiQC "Variant calling metrics" section count only the filtered SNPs/indels. In this sense, the metrics seem consistent between the two platforms.
As for the # Missense variants, I assume this is again inconsistent between the two sub-workflows, because --platform illumina runs SnpEff on the unfiltered VCF, whereas --platform nanopore runs it on the filtered VCF.
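One way to check these assumptions is to recount SNPs, indels and missense variants from the long table, with and without the PASS filter, and compare the numbers with MultiQC. The sketch below assumes hypothetical FILTER, REF, ALT and EFFECT columns and uses made-up rows. It treats a single-base REF/ALT pair as a SNP, a length change as an indel, and any EFFECT containing the SnpEff term missense_variant as missense.

```python
import csv
import io


def summarise_variants(csv_text, pass_only=True):
    """Count SNPs, indels and missense variants in a long variants table.

    Column names FILTER, REF, ALT and EFFECT are assumptions, not a documented schema.
    """
    counts = {"snps": 0, "indels": 0, "missense": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if pass_only and row["FILTER"] != "PASS":
            continue  # skip variants that failed quality filters
        if len(row["REF"]) == 1 and len(row["ALT"]) == 1:
            counts["snps"] += 1
        elif len(row["REF"]) != len(row["ALT"]):
            counts["indels"] += 1
        # Multi-base substitutions (equal lengths > 1) are counted as neither.
        # SnpEff may join several terms with '&', so use a substring check.
        if "missense_variant" in row["EFFECT"]:
            counts["missense"] += 1
    return counts


# Made-up example rows for illustration
example = """SAMPLE,POS,REF,ALT,FILTER,AF,EFFECT
s1,241,C,T,PASS,0.98,upstream_gene_variant
s1,23403,A,G,PASS,0.99,missense_variant
s1,11288,TCTGGTTTT,T,PASS,0.90,disruptive_inframe_deletion
s1,3037,C,T,FAIL,0.40,synonymous_variant
s1,28881,G,A,FAIL,0.30,missense_variant
"""

print("PASS only:", summarise_variants(example))
print("all rows: ", summarise_variants(example, pass_only=False))
```

If the all-rows missense count matched MultiQC for Illumina samples while the PASS-only count matched for Nanopore samples, that would support the inconsistency described here.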
Maybe this could be made more explicit in the documentation, or the FILTER column in the long variants CSV could be modified to reflect whether each variant actually made it into the final consensus sequence.
Hi @tavareshugo! I have tried to extend the documentation regarding these files. Would you mind letting us know whether this issue is solved by the new dev changes? Thanks a lot for your feedback!