Blue Lab pipeline

Tip: run R with R -q --vanilla --args <arguments> < script.R instead of using Rscript, so R will print out a full log of all commands and messages in your script.

Required R packages: argparser, SeqArray, SeqVarTools, SNPRelate, GENESIS

Running with the PBS job scheduler

From beluga1, use runRscript.sh. Use the -v flag to pass the name of the R script (after R=) and any arguments to the script (after args=).

qsub -N jobname -v R="myscript.R",args="--arg1 arg1 --arg2 arg2" runRscript.sh

To run a script by chromosome, use the -t flag to submit each chromosome as a separate task in an array job.

qsub -N ld_pruning -v R="/acct/sdmorris/code/assoc_pipeline/ld_pruning.R",args="myfile.gds --maf 0.01 --missing 0.01" -t 1-22 runRscript.sh

To run a script in parallel, use the -l flag to request multiple processors on a node, in addition to the --num_cores argument to the R script.

qsub -N sample_qc -v R="/acct/sdmorris/code/assoc_pipeline/sample_qc.R",args="myfile.gds --num_cores 12" -l nodes=1:ppn=12 runRscript.sh

Convert to GDS

Use the SeqArray package to convert files.

If you have plink files (BED/BIM/FAM), use seqBED2GDS. If you have VCF, use seqVCF2GDS.

Example for VCF: http://bioconductor.org/packages/release/bioc/vignettes/GENESIS/inst/doc/assoc_test_seq.html#convert-vcf-to-gds

Script for BED: plink_to_gds.R

QC

Outline of recommended QC steps for GWAS data: http://bioconductor.org/packages/release/bioc/vignettes/GWASTools/inst/doc/DataCleaning.pdf

The above document uses GWASTools, but SeqVarTools can be used instead.

Sample QC

Annotated vs genetic sex check

Ideally one would use X and Y intensity values (array data) or X and Y read depth (sequencing data) to infer genetic sex. If these are not available, can use X chromosome heterozygosity and Y chromosome missingness to check sex.

Missing call rate

sample_qc.R

Variant QC

Missing call rate and allele frequency

variant_qc.R

Hardy-Weinberg equilibrium

When a dataset contains samples from multiple populations with different allele frequencies, it is recommended to run HWE in each population separately.

hwe.R

Relatedness

http://bioconductor.org/packages/release/bioc/vignettes/GENESIS/inst/doc/assoc_test_seq.html#population-structure-and-relatedness

As an additional QC step, compare pairwise relatedness estimates from PC-Relate with expected values from the pedigree. Example code: https://github.com/UW-GAC/analysis_pipeline/blob/master/R/pedigree_check.R

Association testing

http://bioconductor.org/packages/release/bioc/vignettes/GENESIS/inst/doc/assoc_test_seq.html#association-tests

Dependencies

This pipeline primarily uses R internally. It was built and tested on CentOS 7 and CentOS8 machines with R version 4.0.5, with the following package versions:

Bioconductor

Biobase_2.50.0 BiocGenerics_0.36.1 GENESIS_2.20.1 gdsfmt_1.26.1 SeqArray_1.30.0 SeqVarTools_1.28.1 SNPRelate_1.24.0

(See here for Bioconductor installation instructions.)

CRAN

argparser_0.7.1 coxmeg_1.0.13 dplyr_1.0.8 fastman_0.1.0 GGally_2.1.1 ggplot2_3.3.5 magrittr_2.0.2 RColorBrewer_1.1-2 Rcpp_1.0.8.2 tidyr_1.2.0

Additionally, this pipeline also has suggested dependencies PLINK v1.90b6.14 64-bit and RUTH (November 2020 release).

Installation

After installing the dependencies, on *NIX systems, git clone this directory to your system. The scripts should be executable, so run them like:

/path/to/pipeline/script.R arg1 arg2

Or, add the directory to your $PATH environment variable and you can do instead:

script.R arg1 arg2

Name		Name	Last commit message	Last commit date
Latest commit History 184 Commits
vignette		vignette
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
add_fid.R		add_fid.R
assoc_combine.R		assoc_combine.R
assoc_plots.R		assoc_plots.R
association_test.R		association_test.R
coxmeg.R		coxmeg.R
developer_guidelines.md		developer_guidelines.md
fit_null_mod.R		fit_null_mod.R
get_controls.R		get_controls.R
hwe.R		hwe.R
king_grm.R		king_grm.R
kinship_plot.R		kinship_plot.R
ld_pruning.R		ld_pruning.R
parse_ruth.R		parse_ruth.R
pca_corr.R		pca_corr.R
pca_corr_plots.R		pca_corr_plots.R
pca_plots.R		pca_plots.R
pcair.R		pcair.R
pcrelate.R		pcrelate.R
pcrelate_beta.R		pcrelate_beta.R
pcrelate_block.R		pcrelate_block.R
pcrelate_correct.R		pcrelate_correct.R
plink_to_gds.R		plink_to_gds.R
runRscript.sh		runRscript.sh
run_ruth.sh		run_ruth.sh
ruth_format_pcs.R		ruth_format_pcs.R
sample_qc.R		sample_qc.R
variant_qc.R		variant_qc.R
vcf_to_gds.R		vcf_to_gds.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Blue Lab pipeline

Running with the PBS job scheduler

Convert to GDS

QC

Sample QC

Annotated vs genetic sex check

Missing call rate

Variant QC

Missing call rate and allele frequency

Hardy-Weinberg equilibrium

Relatedness

Association testing

Dependencies

Bioconductor

CRAN

Installation

About

Releases

Packages

Contributors 5

Languages

License

Blue-Lab/assoc_pipeline

Folders and files

Latest commit

History

Repository files navigation

Blue Lab pipeline

Running with the PBS job scheduler

Convert to GDS

QC

Sample QC

Annotated vs genetic sex check

Missing call rate

Variant QC

Missing call rate and allele frequency

Hardy-Weinberg equilibrium

Relatedness

Association testing

Dependencies

Bioconductor

CRAN

Installation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages