v3.1 - INS Reciprocal Overlap and New TruScore #92

ACEnglish · 2021-12-21T17:28:04Z

ACEnglish
Dec 21, 2021
Maintainer

Sequence-resolved insertions (INS) have no span over the reference because they're placed between two reference bases, which makes it impossible to measure reciprocal overlap. Truvari now inflates each INS SV's position by ± SVLEN // 2, which gives its coordinates a span and allows reciprocal overlap to be calculated.

Figure 1 below is a diagram illustrating what this position inflation looks like.
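The inflation described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not Truvari's actual code: the helper names `inflate_ins` and `reciprocal_overlap` are hypothetical, and the positions and lengths are made-up numbers.

```python
def inflate_ins(pos, svlen):
    """Give an insertion a span by padding its position by svlen // 2 on each side."""
    half = svlen // 2
    return pos - half, pos + half


def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Overlap length divided by the longer of the two spans (0.0 if disjoint)."""
    ovl = min(a_end, b_end) - max(a_start, b_start)
    if ovl <= 0:
        return 0.0
    return ovl / max(a_end - a_start, b_end - b_start)


# Hypothetical base and comparison insertions, 20bp apart
base = inflate_ins(pos=1000, svlen=100)  # spans 950..1050
comp = inflate_ins(pos=1020, svlen=80)   # spans 980..1060
ro = reciprocal_overlap(*base, *comp)
print(base, comp, ro)
```

Without the inflation both calls would have zero-length spans and the overlap would always be 0, regardless of how close the two insertions are.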

In addition to this new measurement, Truvari's TruScore has been recalibrated so that it no longer gives extra weight to PctSeqSimilarity. Together, these two changes allow better consideration of a variant pair's distance and give a more uniform distribution of TruScore between the DEL and INS SV types. Figure 2A shows the TruScore distribution of version 3.0.1 using data provided by ticket #91. Figure 2B shows the same data's distribution with v3.1.
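To see why the weighting matters, here is a rough sketch that treats the score as a weighted mean of three similarity fractions. The exact TruScore formula is not given in this post, so everything here is an assumption for illustration: the function `weighted_score`, the choice of size similarity and reciprocal overlap as the other two metrics, the weights, and the example values.

```python
def weighted_score(seq_sim, size_sim, rec_ovl, weights=(1, 1, 1)):
    """Weighted mean of three similarity fractions (0..1), scaled to 0..100."""
    w_seq, w_size, w_ovl = weights
    total = w_seq * seq_sim + w_size * size_sim + w_ovl * rec_ovl
    return 100 * total / (w_seq + w_size + w_ovl)


# Hypothetical pair: near-identical sequence, but poor positional agreement
pair = dict(seq_sim=0.98, size_sim=0.95, rec_ovl=0.40)

seq_heavy = weighted_score(**pair, weights=(2, 1, 1))  # sequence counted twice
uniform = weighted_score(**pair)                       # all metrics equal
print(round(seq_heavy, 2), round(uniform, 2))
```

With sequence similarity weighted more heavily, a pair that agrees on sequence but not on position scores higher than it does under uniform weights, which is the kind of effect the recalibration is meant to reduce.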

For this dataset, these changes don't affect the number of matching calls (i.e., the number of TPs is the same). However, because --multimatch wasn't used here, the "best" matching variant changed for at least 3 variants. These are at loci where the comparison calls had multiple representations near a base call. This is evident from the number of TP-call_TP_gt calls: 3 more pairs now have a matching genotype, which speaks to the quality of this caller. Note, however, that there is no guarantee that every comparison SV set will have improved or identical results between v3.0.1 and v3.1. I can only say it probably shouldn't be worse.

You may have noticed that in plotting this, we subset the v3.0.1 results to only the comparison's true positives with state == 'tp'. A new feature of v3.1 is that Truvari will annotate false positives/negatives with their closest matching call. This enables further analysis of benchmarking results, such as exploring how the thresholds cause some calls to flip between TP and FP. See this discussion and this post for details.

```python
import joblib
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt

# Load the benchmarking result tables for each Truvari version
v301_data = joblib.load("results_301_seq/data.jl")
v310_data = joblib.load("results_31new_seq/data.jl")

# Fig2A: v3.0.1 results, subset to the comparison's true positives
p = sb.histplot(data=v301_data[v301_data["state"].isin(['tp'])].reset_index(),
                x="TruScore",
                hue="svtype",
                multiple="dodge",
                hue_order=["DEL", "INS"],
                binwidth=5)
p.set(title="Fig2A - 3.0.1 TruScore", xlim=(0,100))
plt.show()

# Fig2B: v3.1 results, not subset by state, plotted on a log y-axis
p = sb.histplot(data=v310_data.reset_index(),
                x="TruScore",
                hue="svtype",
                multiple="dodge",
                hue_order=["DEL", "INS"],
                binwidth=5)
p.set(title="Fig2B - 3.1 TruScore", yscale="log")
plt.show()
```
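As a rough idea of the kind of follow-up analysis the closest-match annotations on false positives allow, the sketch below counts how many FP calls would pass if a similarity threshold were relaxed. It uses plain dicts and invented values so it runs standalone. The `'fp'` state label, the helper `rescued_fps`, and the thresholds are assumptions for illustration, not output from the dataset above.

```python
# Toy stand-in for a benchmarking table; all values are invented
calls = [
    {"state": "tp", "svtype": "DEL", "PctSeqSimilarity": 0.95},
    {"state": "tp", "svtype": "INS", "PctSeqSimilarity": 0.88},
    {"state": "fp", "svtype": "INS", "PctSeqSimilarity": 0.68},
    {"state": "fp", "svtype": "DEL", "PctSeqSimilarity": 0.30},
    {"state": "fp", "svtype": "INS", "PctSeqSimilarity": 0.55},
]


def rescued_fps(rows, threshold, metric="PctSeqSimilarity"):
    """Count FP calls whose closest match would meet a relaxed threshold."""
    return sum(1 for r in rows if r["state"] == "fp" and r[metric] >= threshold)


# Sweep a few hypothetical thresholds to see where calls start to flip
counts = {t: rescued_fps(calls, t) for t in (0.7, 0.6, 0.5)}
for threshold, n in counts.items():
    print(threshold, n)
```

On a pandas table like the ones loaded above, the same count could be written as a boolean mask on the state and metric columns.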
