B: Creating PanGenie input VCFs from haplotype‐resolved assemblies
Jana Ebler edited this page Aug 22, 2023 · 3 revisions
We have written a pipeline that calls variants from haplotype-resolved assemblies of human samples and generates a graph-VCF to be used as input to PanGenie. This pipeline is available here: https://bitbucket.org/jana_ebler/vcf-merging/src/master/pangenome-graph-from-assemblies/. The pipeline produces two output VCFs: a multi-allelic graph-VCF and a bi-allelic callset-VCF, formatted as described in detail in the section "Genotyping variation nested inside of bubbles".
The graph-VCF can be used as input to PanGenie to genotype graph bubbles:
# run PanGenie (v3.0.0) preprocessing
PanGenie-index -v <graph-vcf> -r <reference-genome> -t 24 -o index
# run PanGenie (v3.0.0) on a specific sample (using 24 cores), produces genotyped VCF "pangenie_genotyping.vcf".
# to genotype multiple samples, run this command on each sample separately. PanGenie-index needs to be run only once.
PanGenie -f index -i <input-reads> -o pangenie -j 24 -t 24
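Since PanGenie-index runs only once while PanGenie runs once per sample, genotyping a cohort amounts to a loop over samples that reuses the same index. The following is a minimal sketch of such a loop, not part of the pipeline itself. It assumes that reads are stored as `<sample>.fastq` in a hypothetical `reads/` directory and that each output prefix is named after its sample; the function names `pangenie_cmd` and `genotype_commands` are illustrative only. The functions print the commands rather than running them, so they can be inspected first.

```shell
#!/usr/bin/env bash
# Sketch: build one PanGenie command per sample, all sharing the same index.
# Assumptions (hypothetical, adjust to your setup): reads are stored as
# <reads_dir>/<sample>.fastq and the output prefix is the sample name.

# Print the PanGenie command for a single sample.
# Arguments: sample name, reads directory (default "reads"), threads (default 24).
pangenie_cmd() {
    local sample="$1" reads_dir="${2:-reads}" threads="${3:-24}"
    printf 'PanGenie -f index -i %s/%s.fastq -o %s -j %s -t %s\n' \
        "$reads_dir" "$sample" "$sample" "$threads" "$threads"
}

# Print one command per sample name given as an argument.
genotype_commands() {
    local sample
    for sample in "$@"; do
        pangenie_cmd "$sample"
    done
}

# Example with two hypothetical sample names: prints two commands.
genotype_commands sampleA sampleB
```

Once the printed commands look right, they can be executed by piping them to a shell, for example `genotype_commands sampleA sampleB | sh`. With the output prefix set to the sample name, each run should write its genotypes to `<sample>_genotyping.vcf`, following the same naming pattern as `pangenie_genotyping.vcf` above.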
The callset-VCF can then be used to convert the bubble genotypes into genotypes for all variant alleles nested inside of bubbles:
cat pangenie_genotyping.vcf | python3 convert-to-biallelic.py <callset-VCF> > pangenie_genotyping_biallelic.vcf
The script convert-to-biallelic.py is provided here.