Combined HiFi and Illumina SNP calling not possible? #8528
Replies: 1 comment
-
here the GATK error log |
Beta Was this translation helpful? Give feedback.
-
here the GATK error log |
Beta Was this translation helpful? Give feedback.
-
Hi all, we had raised this as an issue in June but did not get any reply:
Hello,
I am trying to use gatk/4.1.4.1 and picard/2.22.0 to do joint variant calling of PacBio HiFi reads and Illumina short-reads.
My pipeline is basically the recommended one (without base recalibration) where after HaplotypeCaller, I use GenomicsDBImport and GenotypeGVCFs and end up with the vcf file.
The HiFI reads have normal hifi phred scores up to 93 and the illumina are encoded with phred33 quality scores.
The output of the joint variant calling is strange and it causes issues when I try to use it with tools like plink and it makes me wonder if the joint variant calling went badly as well. This is even after variant filtration where I filter for QUAL<30 and QD<2
Here is an example of what the first few lines of my vcf calls look like. Notice the QUAL column that looks like Num,Num:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Illumina_sample0 sample1 sample2 sample_3 sample4 sample5 sample6 hifi_sample
Chr1 10608 . GCA G 174,34 PASS AC=2;AF=0.143;AN=14;BaseQRankSum=0.00;DP=55;ExcessHet=3.3579;FS=7.404;MLEAC=2;MLEAF=0.143;MQ=57.97;MQRankSum=0.00;QD=15.85;ReadPosRankSum=1.00;SOR=3.611 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 0/1:3,3:6:99:117,0,117 0/0:5,0:5:15:0,15,141 0/1:3,2:5:71:71,0,120 0/0:3,0:3:9:0,9,113 0/0:7,0:7:21:0,21,190 0/0:5,0:5:15:0,15,175 0/0:20,0:20:60:0,60,900
Chr1 10616 . G A 903,42 PASS AC=8;AF=0.571;AN=14;BaseQRankSum=0.00;DP=66;ExcessHet=0.7136;FS=9.277;MLEAC=9;MLEAF=0.643;MQ=59.80;MQRankSum=0.00;QD=20.08;ReadPosRankSum=1.00;SOR=0.251 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 0/0:10,0:10:30:0,30,367 1/1:0,5:5:15:141,15,0 0/1:2,3:5:75:110,0,75 0/0:3,0:3:9:0,9,113 1/1:0,8:8:24:232,24,0 1/1:0,7:7:21:224,21,0 0/1:14,6:20:99:210,0,570
I believe this can be reproduced with any illumina+hifi samples. Are you aware of this problem? Could the snp calling be wrong because of the different phred score encoding? What do you suggest to do in these cases?
I have read your response in this issue:
https://gatk.broadinstitute.org/hc/en-us/community/posts/360072716972-Variant-calling-with-PacBio-HiFi-reads and have used minimap2 in a similar way. (minimap2 -acyYL --secondary=no --MD --eqx -x asm20 -k 20)
Any advice on this from the community?
Beta Was this translation helpful? Give feedback.
All reactions