Description
Hi,
I am trying to run truvari to compare the hapdiff unphased variants VCF (produced from the haplotype-resolved HG002 assemblies) against the HG002 benchmark VCF. The commands I used and links to the public datasets are given below:
Hapdiff command:
singularity exec --bind $DD_DIR hapdiff_0.9.sif hapdiff.py --reference $DD_DIR/chm13_v2.fa --pat $DD_DIR/hg002v1.0.1.pat.fasta.gz --mat $DD_DIR/hg002v1.0.1.mat.fasta.gz --out-dir $DD_DIR/hapdiff -t 20
Links to the pat and mat assemblies:
hg002v1.0.1.pat.fasta.gz - https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.pat.fasta.gz
hg002v1.0.1.mat.fasta.gz - https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.mat.fasta.gz
Truvari command:
truvari bench -b CHM13v2.0_HG2-T2TQ100-V1.0.vcf.gz -c /projects/rsaju_prj/LongReadAssembly-test/hapdiff/hapdiff/hapdiff_unphased.vcf.gz -o output/
Base and comparison datasets:
base dataset - https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_HG002_DraftBenchmark_defrabbV0.015-20240215/CHM13v2.0_HG2-T2TQ100-V1.0.vcf.gz
comparison dataset - produced by hapdiff using the command above
Unfortunately, the precision, recall and F1 scores are low (~0.5), when I would expect them to be around 0.9. I used the latest HG002 benchmark and the good-quality haplotype-resolved HG002 assemblies that are available. The summary.json produced by truvari is attached to this issue.
summary.json
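For reference, here is a minimal sketch of how precision, recall and F1 follow from true-positive, false-positive and false-negative counts. The function name `bench_scores` and the counts are hypothetical placeholders, not values taken from the attached summary.json, and the sketch assumes precision is computed from comparison-side matches and recall from base-side matches.

```python
def bench_scores(tp_base, tp_comp, fp, fn):
    """Return (precision, recall, f1) from benchmark counts.

    Assumption: precision uses comparison-side true positives and
    recall uses base-side true positives. Empty denominators give 0.0.
    """
    precision = tp_comp / (tp_comp + fp) if (tp_comp + fp) else 0.0
    recall = tp_base / (tp_base + fn) if (tp_base + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Placeholder counts for illustration only (not from the real run).
    p, r, f = bench_scores(tp_base=50, tp_comp=50, fp=50, fn=50)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With scores near 0.5, roughly as many calls are unmatched as matched on both sides, so the FP and FN counts in summary.json are the first thing to look at.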
Any idea what is going on and why the scores are so low? Any insight would be really helpful!
Thanks,
Riya