This package provides a number of different QTL mapping utilities for multiparent populations, both for haplotype association and SNP association.
The functions in this package generally expect that the haplotype data are stored in a genome cache, the directory format output from the HAPPY HMM (Mott et al. 2000). The genome cache is a memory efficient approach to storing the genetic data because only a single locus needs to be in memory at a time, rather than a large 3D array, as is output by the HMM of DOQTL and qtl2, and used for subsequent analyses. Currently miqtl cannot use their raw inputs, but does include functions to convert them to a genome cache, and can thus be used for miqtl analyses:
- convert.DOQTL.to.HAPPY()
- convert.qtl2.to.HAPPY()
It is possible that the genome cache is unnecessarily large, with adjacent markers/loci possessing essentially identical information. miqtl includes collapse.genomecache() to create a reduced genome cache, which collapses loci through averaging adjacent markers that are identical according to a set criterion and tolerance level (based on most likely allele or Euclidean distance).
#install.packages("remotes")
remotes::install_github("gkeele/miqtl")
We use the package sparc to simulate phenotypes from a multiparent population, Collaborative Cross (CC).This also requires the sparc cache repository to ensure the simulated data is representative of the Collaborative Cross strains.
#devtools::install_github("gkeele/sparcc")
library(sparcc)
## Download the sparc genome cache from GitHub
download.file(url = "https://github.com/gkeele/sparcc_cache/archive/master.zip", destfile = "~/sparcc_cache.zip")
unzip(zipfile = "~/sparcc_cache.zip")
genomecache <- "~/sparcc_cache"
phenotypes <- sim.CC.data(genomecache = genomecache,
num.lines = 40,
num.sim = 10,
num.replicates = 2,
qtl.effect.size = 0.8)
The following code demonstrates a simple example workflow to implement miQTL. This gives a demonstration of a QTL scan in the Collaborative Cross population and compares scans which handle haplotype uncertainty using regression on probabilities (ROP) and multiple imputation.
# QTL scan using ROP
miqtl.rop.scan <- scan.h2lmm(genomecache = genomecache,
data = phenotypes$data,
pheno.id = "SUBJECT.NAME.1",
geno.id = "SUBJECT.NAME.1",
formula = sim.y.1 ~ 1,
use.multi.impute = FALSE,
return.allele.effects = TRUE)
genome.plotter.whole(scan.list=list(ROP = miqtl.rop.scan))
# QTL scan using multiple imputation
miqtl.mi11.scan <- scan.h2lmm(genomecache = genomecache,
data = phenotypes$data,
pheno.id = "SUBJECT.NAME.1",
geno.id = "SUBJECT.NAME.1",
formula=sim.y.1 ~ 1,
use.multi.impute = TRUE,
num.imp = 11)
genome.plotter.whole(scan.list = list(miqtl.mi11.scan))
miQTL generates significance thresholds for each scan using a permutation test
permuted_phenotype <- generate.sample.outcomes.matrix(scan.object = miqtl.rop.scan,
method = "permutation", num.samples = 10)
permuted_scans <- run.threshold.scans(sim.threshold.object = permuted_phenotype,
keep.full.scans=TRUE,
genomecache = genomecache,
data = phenotypes$data,
use.multi.impute = FALSE,
scan.seed = 1)
permute_threshold <- get.gev.thresholds(threshold.scans = permuted_scans,
percentile = 0.95)
genome.plotter.whole(scan.list=list(ROP = miqtl.rop.scan),
hard.thresholds = permute_threshold)
The top peak can be isolated using grab.locus.from.scan() be returning the locus with the minimum peak. The positional confidence of that QTL can be identified using parametric bootstrap analysis.
qtl_locus <- grab.locus.from.scan(miqtl.rop.scan)
## Heatmap of QTL
prob.heatmap(marker = qtl_locus,
p.value = min(miqtl.rop.scan$p.value),
genomecache = genomecache,
model = "additive",
phenotype = "sim.y.1",
phenotype.data = phenotypes$data %>% dplyr::rename(SUBJECT.NAME = SUBJECT.NAME.1))
## QTL positional confidence interval using parametric bootstraps
qtl.fit <- scan.h2lmm(genomecache = genomecache,
data = phenotypes$data,
pheno.id = "SUBJECT.NAME.1",
geno.id = "SUBJECT.NAME.1",
formula = sim.y.1 ~ 1,
use.multi.impute=FALSE,
just.these.loci = qtl_locus)
qtl.locus.bootstraps <- generate.sample.outcomes.matrix(scan.object = qtl.fit,
model.type = "alt",
method = "bootstrap",
num.sample = 100,
seed = 1,
use.REML = TRUE,
use.BLUP = TRUE)
qtl.bootstrap.scans <- run.positional.scans(sim.object = qtl.locus.bootstraps,
keep.full.scans=TRUE,
genomecache = genomecache,
data = phenotypes$data,
use.multi.impute = FALSE,
scan.seed = 1)
single.chr.plotter.w.ci(scan.object = miqtl.rop.scan,
qtl.ci.object = qtl.bootstrap.scans,
ci.type = "bootstrap",
scan.type = "ROP",
scale = "Mb",
these.col = rep("#7BAFD4", 100))
Finally, haplotype effects of the region of interest can be visualized to betteer understand founder strain contributions to alleles driving variation.
## QTL haplotype effects in region of QTL
genome.plotter.region(haplotype.association = list(miqtl.rop.scan),
chr = 14)
allele.plotter.region(scan.object = miqtl.rop.scan, chr = 14)