rye3355/gsp-ccdg-f3_conversion

gsp-ccdg-f3_conversion

Script to annotate and convert gsp-ccdg-f3 VDS files to standard VCF format.

The variant filter status is annotated using Wenhan's suggested variant QC metrics. The additional fields was_split, AS_VQSLOD, AS_lowqual, and is_snv are also included.

A variant is labelled PASS if:

Annotations for AS_lowqual and AS_VQSLOD are pulled from Wenhan's annotation Hail Tables:

  • gs://fc-secure-9e3357c0-389c-41d7-94ee-56673db6b75f/ccdg_genomes_variant_qc_vqsr_alleleSpecificTrans_split.ht
  • gs://fc-secure-9e3357c0-389c-41d7-94ee-56673db6b75f/ccdg_genomes_variant_info_split.ht
  • gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exomes_variant_qc_vqsr_alleleSpecificTrans_split.ht
  • gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exomes_variant_info_split.ht
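The four tables above pair up by dataset: two for genomes and two for exomes. As a minimal sketch (not the script's actual code), the lookup could be keyed on whether `--exomes` was passed. The function name `annotation_tables` and the dictionary keys `vqsr` and `info` are hypothetical labels introduced here for illustration only.

```python
# Hypothetical helper: choose the annotation Hail Table paths for a dataset.
# Bucket names and table file names are copied from the list above; the
# function name and the "vqsr"/"info" keys are illustrative assumptions.
GENOME_BUCKET = "gs://fc-secure-9e3357c0-389c-41d7-94ee-56673db6b75f"
EXOME_BUCKET = "gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf"


def annotation_tables(exomes: bool) -> dict:
    """Return the two annotation table paths for exomes or genomes."""
    bucket = EXOME_BUCKET if exomes else GENOME_BUCKET
    prefix = "ccdg_exomes" if exomes else "ccdg_genomes"
    return {
        "vqsr": f"{bucket}/{prefix}_variant_qc_vqsr_alleleSpecificTrans_split.ht",
        "info": f"{bucket}/{prefix}_variant_info_split.ht",
    }


tables = annotation_tables(exomes=True)
print(tables["vqsr"])
print(tables["info"])
```

In a Hail session, each path could then be opened with `hl.read_table(path)`.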

Example of how to run:

  • First, spin up a Dataproc cluster with hailctl and an autoscaling policy:
    • hailctl dataproc start [cluster-name] --num-workers 8 --autoscaling-policy=autoscale-8-200
    • The policy automatically scales the number of preemptible workers (--num-secondary-workers) up to 200 and of non-preemptible workers (--num-workers) up to 8, as needed
    • From Tim: "It's not bad practice to make sure your ratio of preemptible to non-preemptible isn't larger than around 20:1"
  • Next, submit the script with the relevant flags:
    • Running on ONLY CHROMOSOME 21 of exome dataset:
      • hailctl dataproc submit [cluster-name] gsp-ccdg-f3_vds-to-vcf.py --exomes --vds gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exome_203k.vds/ --chr 21 --out gs://bgen-temp/ccdg_exome_203k/ccdg_exome_203k
    • Running on CHROMOSOME M, X, Y of genome dataset:
      • hailctl dataproc submit [cluster-name] gsp-ccdg-f3_vds-to-vcf.py --vds gs://fc-secure-9e3357c0-389c-41d7-94ee-56673db6b75f/ccdg_genome_136k.vds/ --chr M,X,Y --out gs://bgen-temp/ccdg_genome_136k/ccdg_genome_136k
    • Running on ALL CHROMOSOMES of genome dataset:
      • hailctl dataproc submit [cluster-name] gsp-ccdg-f3_vds-to-vcf.py --vds gs://fc-secure-9e3357c0-389c-41d7-94ee-56673db6b75f/ccdg_genome_136k.vds/ --out gs://bgen-temp/ccdg_genome_136k/ccdg_genome_136k
    • Running on ALL CHROMOSOMES of exome dataset:
      • hailctl dataproc submit [cluster-name] gsp-ccdg-f3_vds-to-vcf.py --exomes --vds gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exome_203k.vds/ --out gs://bgen-temp/ccdg_exome_203k/ccdg_exome_203k
    • Use --help for more info
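The submit commands above differ only in the `--exomes` flag, the `--vds` path, the optional `--chr` list, and the `--out` prefix. As a rough sketch of that pattern, the helper below assembles the same command line from those pieces. `build_submit_command` and the cluster name `my-cluster` are hypothetical and not part of the repository; only flags that appear in the examples above are used.

```python
import shlex

SCRIPT = "gsp-ccdg-f3_vds-to-vcf.py"


def build_submit_command(cluster, vds, out, exomes=False, chroms=None):
    """Assemble a `hailctl dataproc submit` command as an argument list.

    Hypothetical helper: it mirrors the flag order of the examples above.
    """
    cmd = ["hailctl", "dataproc", "submit", cluster, SCRIPT]
    if exomes:
        cmd.append("--exomes")
    cmd += ["--vds", vds]
    if chroms:
        # Chromosomes are passed as one comma-separated value, e.g. M,X,Y.
        cmd += ["--chr", ",".join(chroms)]
    cmd += ["--out", out]
    return cmd


# Chromosome 21 of the exome dataset; "my-cluster" is a placeholder name.
cmd = build_submit_command(
    cluster="my-cluster",
    vds="gs://fc-secure-7e69c896-d6c0-4a4e-8490-42cb2d4fdebf/ccdg_exome_203k.vds/",
    out="gs://bgen-temp/ccdg_exome_203k/ccdg_exome_203k",
    exomes=True,
    chroms=["21"],
)
print(shlex.join(cmd))
```

Omitting `chroms` leaves out `--chr`, which matches the all-chromosome examples. The list can be passed to `subprocess.run(cmd, check=True)` to launch the job.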
