##### #####
# # ###### # # #### # # ###### # # # # ##### #### # # # #### # ###### #####
# # ## # # # ## ## # # # # # # # # ## # # # # # # # #
# #### ##### # # # # # # ## # ##### # ###### # # # # # # # # # # ##### # #
# # # # # # # # # # # # # # ##### # # # # # # # # # #####
# # # # ## # # # # # # # # # # # # # # ## # # # # # # #
##### ###### # # #### # # ###### ##### # # # # #### # # # #### ###### ###### # #
This is the repository for GenomeChronicler, the Personal Genome Project United Kingdom (PGP-UK) genomic report generation scripts.
- BAM file OR gVCF file
- VEP generated summary HTML file (optional)
Getting some test data (NA12878 from ENA, pre-mapped to GRCh38, and the respective reference) and store this in your current folder:
# CRAM file for NA12878
curl -L -o na12878wxs.cram ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CEU/NA12878/alignment/NA12878.alt_bwamem_GRCh38DH.20150718.CEU.low_coverage.cram
# gVCF for NA12878
# TODO...
# HG38 Human Reference
curl -L -o hg38.fa ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
- Downloading pre-packaged GenomeChronicler from Github Packages
singularity pull docker://ghcr.io/pgp-uk/genomechronicler:latest
- Converting data to BAM format
singularity exec GenomeChronicler_latest.sif samtools view -@ 8 -T hg38.fa -b -o NA12878wxs.bam na12878wxs.cram
Running GenomeChronicler on the data
singularity run GenomeChronicler_latest.sif --bamFile=NA12878wxs.bam --GATKthreads 8
Note: you can use
singularity shell
to open an interactive linux to run commands directly
- Downloading pre-packaged GenomeChronicler from Github Packages
docker pull ghcr.io/pgp-uk/genomechronicler:latest
- Converting data to BAM format
docker run -v $PWD:/d/ genomechronicler samtools view -@ 8 -T /d/hg38.fa -b -o /d/NA12878wxs.bam /d/na12878wxs.cram
Running GenomeChronicler on the data
# With a GVCF
docker run -v $PWD:/d/ genomechronicler genomechronicler --vcfFile=/GenomeChronicler/NA12878wxs.g.vcf --resultsDir /GenomeChronicler/out2 --GATKthreads 8
# With a BAM
docker run -v $PWD:/d/ genomechronicler genomechronicler --bamFile=/d/NA12878wxs.bam --resultsDir /d/out --GATKthreads 8
Note: you can use
docker run --rm -it -v $PWD:/d/ genomechronicler bash
to open an interactive linux to run commands directly.
Option | Requirement | Description |
---|---|---|
--bamFile | REQUIRED (if no gVCF) | The path to a BAM file that has been preprocessed through markDuplicates and VariantQualityScoreRecalibration. This can be obtained by running the first step of the Sarek nextflow pipeline, or through other means that do respect the general principles of the GATK Variation Calling Best Practices workflow. Note that no variation calling is needed to run GenomeChronicler. |
--vcfFile | REQUIRED (if no BAM) | The path to a gVCF file produced by GATK or the GRCh38 reference genome. This can be obtained by running all steps from the Sarek nextflow pipeline, or through other means that do respect the general principles of the GATK Variation Calling Best Practices workflow. This avoids the need to run the GATK with GC. |
--vepFile | OPTIONAL | For the summary tables to appear in the report, a VEP summary HTML file must be provided. This will likely be generated if the data is from whole genome sequencing and variants were called (e.g. by running all the germline calling steps of the Sarek nextflow pipeline or other GATK Best Practices based workflow). If this isn't provided, summary tables and plots will automatically be excluded from the final report. |
--resultsDir | OPTIONAL | For setting the absolute path of the results folder to be produced when running GenomeChronicler. |
--customTemplate | OPTIONAL | For customising the output report, set this variable to the path of a custom LaTeX file to act as a template for the report. The default templates bundled with this software can also be found in the project github page. |
--GATKthreads | OPTIONAL | Number of threads to use for the GATK genotyping steps of this processing pipeline. |
There is a separate docker image that is designed for development:
-
Clone the repository:
-
Build the development docker image
docker build -f Dockerfile-dev -t gc_dev .
- Start docker environment and map local directory inside.
docker run --rm -it -v $PWD:/GenomeChronicler gc_dev bash
- Now any changes you make to the codebase in your host machine, will be immediately reflected in the docker image.
- Login using a token with access to Github Packages. See here for more info.
export CR_PAT=YOUR_TOKEN
echo $CR_PAT | docker login ghcr.io -u USERNAME --password-stdin
- Build docker image and upload to Github Packages
docker buildx build --platform linux/amd64,linux/arm64 -t ghcr.io/pgp-uk/genomechronicler:latest .
docker buildx build --platform linux/amd64,linux/arm64 -t ghcr.io/pgp-uk/genomechronicler:v0.0.3 .
docker push ghcr.io/pgp-uk/genomechronicler:latest
docker push ghcr.io/pgp-uk/genomechronicler:v0.0.3