Clarify documentation on variants tables (inconsistent between platforms) #336

tavareshugo · 2022-10-28T14:14:21Z

Description of feature

I was trying to understand what variants are included in the variants_long_table.csv file, and it seems that this is different depending on the sub-workflow used:

--platform nanopore: the long variants table includes only the variants that pass the quality filters and are therefore applied to the reference (i.e. those in the *.pass.unique.vcf.gz files).
--platform illumina: the long variants table includes both the variants that pass the quality filters and those that don't, i.e. it contains the variants in variants/ivar/*.vcf.gz rather than those in variants/ivar/consensus/bcftools/*.filtered.vcf.gz. Incidentally, the FILTER column does not reflect the filtering that happens later in ivar consensus, which I believe retains only variants with AF > 0.75.

This can be a little confusing/inconsistent when analysing data from either platform.

On the other hand, I inferred (I hope correctly) that the # SNPs and # INDELs in the MultiQC "Variant calling metrics" section seem to be for the filtered SNPs/Indels only. In this sense, it seems consistent between the two platforms.
As for the # Missense variants, I assume again this is inconsistent between the two sub-workflows, because --platform illumina uses the unfiltered VCF with SnpEff, whereas --platform nanopore uses the filtered VCF.

Maybe this could be made more explicit in the documentation, or possibly modify the FILTER column in the long variants CSV to actually reflect whether the variants made it through to the final consensus sequence or not.


tavareshugo · 2022-10-28T15:39:17Z

Maybe something like this could be added at the end of the make_variants_long_table.py script (not tested):

if args.variant_caller in ("ivar", "bcftools"):
    # Use the element-wise "|" with parentheses; a plain "or" raises a ValueError on pandas Series.
    merged_tables.loc[(merged_tables["DP"] < 10) | (merged_tables["AF"] < 0.75), "FILTER"] = "FAIL"

And then mention in the documentation that variants with "FAIL" are not used in the final consensus FASTA.
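
In the meantime, the same rule could be applied to an existing variants_long_table.csv after the pipeline has run. The following is only a minimal sketch using the Python standard library: it assumes the table has DP, AF and FILTER columns, reuses the thresholds from the snippet above (DP < 10 or AF < 0.75, which I have not checked against what the pipeline actually applies), and the function name flag_rows and the toy rows are made up for illustration.

```python
import csv
import io

# Assumed thresholds, taken from the suggestion above; not verified against the pipeline.
MIN_DEPTH = 10
MIN_AF = 0.75


def flag_rows(rows, min_depth=MIN_DEPTH, min_af=MIN_AF):
    """Return copies of the rows with FILTER set to "FAIL" when DP or AF is below threshold."""
    flagged = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if int(row["DP"]) < min_depth or float(row["AF"]) < min_af:
            row["FILTER"] = "FAIL"
        flagged.append(row)
    return flagged


# Toy data for illustration only (not real pipeline output).
toy_csv = """SAMPLE,POS,DP,AF,FILTER
s1,241,120,0.98,PASS
s1,3037,8,0.99,PASS
s1,23403,300,0.40,PASS
"""

rows = list(csv.DictReader(io.StringIO(toy_csv)))
result = flag_rows(rows)
for row in result:
    print(row["POS"], row["DP"], row["AF"], row["FILTER"])
```

For a real file, io.StringIO(toy_csv) would be replaced by an open file handle, and the flagged rows could be written back out with csv.DictWriter.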

svarona · 2024-04-22T15:14:12Z

Hi @tavareshugo! I tried to extend the documentation regarding these files. Would you mind letting us know whether this issue is solved by the new changes in dev? Thanks a lot for your feedback!

tavareshugo · 2024-04-23T08:37:22Z

Yes, thank you @svarona, that seems much clearer now!

tavareshugo added the enhancement Improvement for existing functionality label Oct 28, 2022

drpatelh added this to the 2.6 milestone Mar 5, 2023

drpatelh modified the milestones: 2.6, 2.7 Mar 13, 2023

svarona self-assigned this Jan 24, 2024

svarona mentioned this issue Jan 24, 2024

Added option to add a custom annotation, clarified multiQC results and fixed issues #401

Merged

9 tasks

svarona closed this as completed Apr 23, 2024
