some queries about SViper #22

Open
prasundutta87 opened this issue Apr 19, 2022 · 3 comments

@prasundutta87

Hi,

I am working with ONT data from trios, for which I also have Illumina short-read data. I am using SViper 2.0.0 to polish my SV breakpoints. I have a few queries and would be grateful for help with them.

  1. When I was testing the software with one set of samples, I found that most of the FAILs were FAIL5 ("The variant was polished away."). What is the reason behind this fail?

  2. What is meant by FAIL3 (The long read regions do not fit)? Can this please be elaborated?

  3. I am aware that no tags should be present. SViper skips the variants. With bcftools, I am getting the error that SKIP is not defined. If you are still developing the tool, can that please be added? Although I have changed the SVTYPE to INS, SViper checks the variant type by tags rather than SVTYPE.

  4. I observed that the SViper score is written to the QUAL field. How is the score calculated and does it have any biological significance? Should I filter my SVs based on the SViper score again? Is there a threshold below which I should remove SVs? I actually filter my SVs using the QUAL value of the variant caller (cuteSV), so replacing this value with the score affects my pipeline.

Regards,
Prasun

@smehringer
Owner

Hi @prasundutta87,

thanks for your interest in SViper!
I try to maintain SViper, but I often have trouble finding the time.

  1. When I was testing the software with one set of samples, I found that most of the FAILs were FAIL5 ("The variant was polished away."). What is the reason behind this fail?

This means that after polishing the variant is not visible anymore in the data (in the corrected long reads).

  2. What is meant by FAIL3 (The long read regions do not fit)? Can this please be elaborated?

This means that for the given variant, no proper region from the long read could be extracted. E.g. although a long read has a desired deletion of 200, the flanking regions of this deletion are mapped very poorly, or the mapping indicates a complex variant. SViper can only polish simple deletions and insertions.

  3. I am aware that no tags should be present. SViper skips the variants. With bcftools, I am getting the error that SKIP is not defined. If you are still developing the tool, can that please be added? Although I have changed the SVTYPE to INS, SViper checks the variant type by tags rather than SVTYPE.

I'll try to change this! But I can't promise that it will happen in the next few days.

  4. I observed that the SViper score is written to the QUAL field. How is the score calculated and does it have any biological significance?

The score is computed here:

// Score computation
// -----------------
double error_rate = ((double)length(record.cigar) - 1.0)/ (config.flanking_region * 2.0);
double fuzzyness = (1.0 - error_rate/0.15) * 100.0;
variant.quality = std::max(fuzzyness, 0.0);
record.mapQ = variant.quality;

It does not have any biological significance! As far as I remember my own code, it was derived experimentally and proved to work well on manual inspection.
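To make the formula easier to try out, here is a minimal standalone sketch of the same arithmetic. The function name `sviper_like_score` is hypothetical, and the sketch assumes that `length(record.cigar)` is the number of CIGAR operations in the realigned polished sequence and that `config.flanking_region` is the number of bases on each side of the variant. It is a restatement of the quoted lines, not SViper's actual interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper restating the score computation quoted above.
// cigar_operations: assumed to correspond to length(record.cigar)
// flanking_region:  assumed to correspond to config.flanking_region
double sviper_like_score(std::size_t cigar_operations, double flanking_region)
{
    assert(cigar_operations >= 1 && flanking_region > 0.0);

    // Extra CIGAR operations per flanking base, used as a rough error rate.
    double error_rate = (static_cast<double>(cigar_operations) - 1.0) / (flanking_region * 2.0);

    // Scale so that an error rate of 0 gives 100 and an error rate of 0.15 gives 0.
    double fuzzyness = (1.0 - error_rate / 0.15) * 100.0;

    // Negative values are clamped to 0.
    return std::max(fuzzyness, 0.0);
}
```

With an illustrative flanking region of 400 (a value picked for this example only), a clean alignment with 3 CIGAR operations scores roughly 98.3, while 121 or more operations score 0. In other words, the score only reflects how fragmented the alignment of the polished sequence is.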

Should I filter my SVs based on SViper score again? Is there a threshold based on which I should remove SVs?

Unfortunately this is very hard to answer and depends heavily on your use case. In general, you should filter out variants with a FAIL tag; those are very unlikely to be true. A low score, however, might just mean that the polishing didn't work well. I would not filter by the score but only treat it as a confidence score.
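As a rough illustration of dropping FAIL-tagged variants, the sketch below decides per VCF line whether to keep it. It assumes the FAIL tags (FAIL3, FAIL5, ...) appear in the FILTER column, which is the 7th tab-separated column of a VCF record. The function names `split_tabs` and `keep_record` are hypothetical and not part of SViper.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one VCF line into its tab-separated columns.
std::vector<std::string> split_tabs(const std::string & line)
{
    std::vector<std::string> columns;
    std::stringstream stream(line);
    std::string column;
    while (std::getline(stream, column, '\t'))
        columns.push_back(column);
    return columns;
}

// Header lines are always kept. Data lines are kept only if the FILTER
// column (assumed to hold the FAIL tags) does not start with "FAIL".
bool keep_record(const std::string & line)
{
    if (!line.empty() && line[0] == '#')
        return true;

    std::vector<std::string> columns = split_tabs(line);
    if (columns.size() < 7)
        return false; // malformed or empty line

    return columns[6].rfind("FAIL", 0) != 0; // rfind(..., 0) == 0 means "starts with"
}
```

Records with any other FILTER value pass through untouched, so the score itself plays no role in this filter.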

I actually filter my SVs using the QUAL value of the variant caller (cuteSV), so replacing this value with the score affects my pipeline.

Can you filter the SVs before polishing them with SViper? Otherwise I might need to see if I can add an option that does not overwrite the quality scores but adds them in the INFO field.
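Filtering before polishing could look roughly like the sketch below, which copies header lines and every record whose caller QUAL (6th VCF column) reaches a threshold. The function name `prefilter_by_qual` and the choice to drop records with a missing QUAL (`.`) are assumptions made for illustration, and the threshold is whatever your pipeline already uses.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

// Copies header lines and all records with QUAL >= min_qual from `in` to `out`.
// Records with a missing or non-numeric QUAL are dropped (an assumption of this sketch).
// Returns the number of records kept.
std::size_t prefilter_by_qual(std::istream & in, std::ostream & out, double min_qual)
{
    std::size_t kept = 0;
    std::string line;
    while (std::getline(in, line))
    {
        if (line.empty())
            continue;
        if (line[0] == '#') // header lines pass through unchanged
        {
            out << line << '\n';
            continue;
        }

        // Advance to the 6th tab-separated column, which holds QUAL.
        std::stringstream fields(line);
        std::string field;
        int column = 0;
        while (column < 6 && std::getline(fields, field, '\t'))
            ++column;
        if (column < 6 || field == ".")
            continue;

        char * end = nullptr;
        double qual = std::strtod(field.c_str(), &end);
        if (end == field.c_str())
            continue; // QUAL was not a number

        if (qual >= min_qual)
        {
            out << line << '\n';
            ++kept;
        }
    }
    return kept;
}
```

The filtered VCF would then be passed to SViper, so the overwritten QUAL values no longer interfere with the caller-based filtering.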

Best,
Svenja

@prasundutta87
Author

Hi @smehringer ,

Thank you so much for answering my queries. Is there a document anywhere that describes the algorithm or the workings of SViper? I am just trying to get my head around the algorithm (not the numerical/quantification bit, but the general concept of polishing). For example, when you say that after polishing the variant is not visible anymore in the corrected long reads, can this please be elaborated?

Thanks for the suggestion to use SViper after my final filtering.

Also, do we need to sort the BAMs by name? It seems to be specifically mentioned for utilities_merge_split_alignments tool. Currently I have coordinate sorted my BAMs.

This may be trivial, but the master version of SViper is 2.0.0, but the most recent version is 2.1.0. At least it shows in the help that it is 2.0.0. Could you kindly clarify this?

Regards,
Prasun

@smehringer
Owner

Is there a document anywhere that describes the algorithm or the workings of SViper?

I've sent you an email with my thesis, which can hopefully answer most of your questions.

Also, do we need to sort the BAMs by name? It seems to be specifically mentioned for utilities_merge_split_alignments tool. Currently I have coordinate sorted my BAMs.

For sviper, sorting by coordinate is fine (even required).
Only when using the utility utilities_merge_split_alignments do you need to have the BAM sorted by name. But do you want to use this utility at all? It's a rather advanced utility I used in my validation pipelines.

This may be trivial, but the master version of SViper is 2.0.0, but the most recent version is 2.1.0. At least it shows in the help that it is 2.0.0. Could you kindly clarify this?

I'm very sorry for the confusion! This is a documentation bug. I forgot to change the version in the help page. I'll correct this.
