Clarify documentation on variants tables (inconsistent between platforms) #336

Closed
tavareshugo opened this issue Oct 28, 2022 · 3 comments
Labels
enhancement Improvement for existing functionality

Comments

@tavareshugo
Contributor

Description of feature

I was trying to understand what variants are included in the variants_long_table.csv file, and it seems that this is different depending on the sub-workflow used:

  • --platform nanopore: the long variants table includes only the variants that pass quality filters and are therefore applied to the reference to build the consensus (i.e. those in the *.pass.unique.vcf.gz files).
  • --platform illumina: the variants table includes both the variants that pass quality filters and those that don't, i.e. the variants in variants/ivar/*.vcf.gz rather than those in variants/ivar/consensus/bcftools/*.filtered.vcf.gz. Incidentally, the FILTER column does not reflect the filtering applied later during consensus generation, which I believe retains only variants with AF > 0.75.
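One way to check which Illumina variants appear in the long table but not in the consensus is to compare the two VCFs named above. The following is a minimal sketch using only the Python standard library; it assumes variants can be matched on (CHROM, POS, REF, ALT) and that multi-allelic ALT fields are comma-separated, as in the VCF specification. The function names are hypothetical and not part of the pipeline.

```python
import gzip
from typing import Iterable, Set, Tuple

# Assumed matching key for a variant: (CHROM, POS, REF, single ALT allele)
VariantKey = Tuple[str, int, str, str]


def vcf_keys(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect (CHROM, POS, REF, ALT) for every record in VCF text lines."""
    keys: Set[VariantKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # A record may list several ALT alleles separated by commas
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def read_vcf_keys(path: str) -> Set[VariantKey]:
    """Read keys from a plain .vcf or a gzip-compressed .vcf.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return vcf_keys(handle)


def dropped_variants(unfiltered: Set[VariantKey], filtered: Set[VariantKey]) -> Set[VariantKey]:
    """Variants that were called but are absent from the filtered VCF."""
    return unfiltered - filtered
```

For example, dropped_variants(read_vcf_keys(unfiltered_path), read_vcf_keys(filtered_path)) with the two VCF paths above would list the calls that did not make it into the consensus, assuming the consensus step does not normalise or split records differently from the caller.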

This can be a little confusing, and it makes the outputs inconsistent when analysing data from the two platforms.

On the other hand, I inferred (I hope correctly) that the # SNPs and # INDELs in the MultiQC "Variant calling metrics" section count only the filtered SNPs/indels, so in this respect the two platforms are consistent.
As for # Missense variants, I assume this is again inconsistent between the two sub-workflows, because --platform illumina runs SnpEff on the unfiltered VCF, whereas --platform nanopore runs it on the filtered VCF.
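To illustrate why these counts depend on which VCF is summarised, the (REF, ALT) pairs from the filtered and the unfiltered VCF (columns 4 and 5) can be tallied separately. A simplified sketch that classifies each pair by allele length; this classification is an assumption for illustration, and tools such as bcftools stats use a more detailed scheme (e.g. counting MNPs separately), so the numbers may not match MultiQC exactly. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Allele = Tuple[str, str]  # (REF, ALT) for a single ALT allele


def classify(ref: str, alt: str) -> str:
    """Simplified variant type: SNP, INDEL, or OTHER (e.g. MNPs)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # equal-length multi-base substitutions


def count_types(alleles: Iterable[Allele]) -> Counter:
    """Tally variant types for an iterable of (REF, ALT) pairs."""
    return Counter(classify(ref, alt) for ref, alt in alleles)
```

Running count_types on both sets makes the gap explicit: any low-AF SNP present only in the unfiltered VCF raises its SNP count, and the same applies to whatever SnpEff annotates as missense.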

Maybe this could be made more explicit in the documentation, or the FILTER column in the long variants CSV could be modified to reflect whether each variant made it into the final consensus sequence.

@tavareshugo tavareshugo added the enhancement Improvement for existing functionality label Oct 28, 2022
@tavareshugo
Contributor Author

Maybe something like this could be added at the end of the make_variants_long_table.py script (not tested):

if args.variant_caller in ("ivar", "bcftools"):
    # Combine masks with |, not `or`, which raises ValueError on a pandas Series
    fail = (merged_tables["DP"] < 10) | (merged_tables["AF"] < 0.75)
    merged_tables.loc[fail, "FILTER"] = "FAIL"

And then mention in the documentation that variants with "FAIL" are not used in the final consensus FASTA.

@svarona
Contributor

svarona commented Apr 22, 2024

Hi @tavareshugo! I've tried to extend the documentation for these files. Would you mind letting us know whether the new dev changes solve this issue? Thanks a lot for your feedback!

@tavareshugo
Contributor Author

Yes, thank you @svarona, that seems much clearer now!

@svarona svarona closed this as completed Apr 23, 2024

3 participants