Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 39 additions & 36 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,46 +1,49 @@
Package: ramr
Title: Detection of Rare Aberrantly Methylated Regions in Array and NGS Data
Version: 1.15.0
Version: 1.15.1
Authors@R:
person(given = "Oleksii",
family = "Nikolaienko",
role = c("aut", "cre"),
email = "oleksii.nikolaienko@gmail.com",
comment = c(ORCID = "0000-0002-5910-4934"))
person(given = "Oleksii",
family = "Nikolaienko",
role = c("aut", "cre"),
email = "oleksii.nikolaienko@gmail.com",
comment = c(ORCID = "0000-0002-5910-4934"))
Comment: Maintainer: Oleksii Nikolaienko <oleksii.nikolaienko@gmail.com>
Description: ramr is an R package for detection of low-frequency aberrant
methylation events in large data sets obtained by methylation profiling
using array or high-throughput bisulfite sequencing. In addition, package
provides functions to visualize found aberrantly methylated regions (AMRs),
to generate sets of all possible regions to be used as reference sets for
enrichment analysis, and to generate biologically relevant test data sets
for performance evaluation of AMR/DMR search algorithms.
Description: ramr is an R package for detection of epimutations (i.e.,
infrequent aberrant DNA methylation events)
in large data sets obtained by methylation profiling using array or
high-throughput methylation sequencing. In addition, package
provides functions to visualize found aberrantly methylated regions (AMRs),
to generate sets of all possible regions to be used as reference sets for
enrichment analysis, and to generate biologically relevant test data sets
for performance evaluation of AMR/DMR search algorithms.
Depends:
R (>= 4.1),
GenomicRanges,
parallel,
doParallel,
foreach,
doRNG,
methods
R (>= 4.1)
Imports:
IRanges,
BiocGenerics,
ggplot2,
reshape2,
EnvStats,
ExtDist,
matrixStats,
S4Vectors
parallel,
doParallel,
foreach,
doRNG,
methods,
data.table,
matrixStats,
GenomicRanges,
IRanges,
BiocGenerics,
EnvStats,
ExtDist,
S4Vectors,
gamlss,
gamlss.dist
Suggests:
RUnit,
knitr,
rmarkdown,
gridExtra,
annotatr,
LOLA,
org.Hs.eg.db,
TxDb.Hsapiens.UCSC.hg19.knownGene
RUnit,
knitr,
rmarkdown,
ggplot2,
gridExtra,
annotatr,
LOLA,
org.Hs.eg.db,
TxDb.Hsapiens.UCSC.hg19.knownGene
License: Artistic-2.0
URL: https://github.com/BBCG/ramr
BugReports: https://github.com/BBCG/ramr/issues
Expand Down
9 changes: 6 additions & 3 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@ export(getUniverse)
export(plotAMR)
export(simulateAMR)
export(simulateData)
import(ggplot2)
importFrom(BiocGenerics,relist)
importFrom(EnvStats,ebeta)
importFrom(ExtDist,eBeta)
Expand All @@ -15,19 +14,23 @@ importFrom(GenomicRanges,findOverlaps)
importFrom(GenomicRanges,granges)
importFrom(GenomicRanges,mcols)
importFrom(GenomicRanges,reduce)
importFrom(IRanges,`%outside%`)
importFrom(IRanges,subsetByOverlaps)
importFrom(S4Vectors,queryHits)
importFrom(data.table,as.data.table)
importFrom(data.table,melt.data.table)
importFrom(doParallel,registerDoParallel)
importFrom(doRNG,"%dorng%")
importFrom(foreach,foreach)
importFrom(gamlss,gamlss)
importFrom(gamlss,gamlss.control)
importFrom(gamlss.dist,pBEINF)
importFrom(matrixStats,rowIQRs)
importFrom(matrixStats,rowMedians)
importFrom(methods,as)
importFrom(methods,is)
importFrom(parallel,detectCores)
importFrom(parallel,makeCluster)
importFrom(parallel,stopCluster)
importFrom(reshape2,melt)
importFrom(stats,median)
importFrom(stats,na.omit)
importFrom(stats,pbeta)
Expand Down
78 changes: 49 additions & 29 deletions R/getAMR.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,16 @@
#' @details
#' In the provided data set, `getAMR` compares methylation beta values of each
#' sample with other samples to identify rare long-range methylation
#' aberrations. For `ramr.method=="IQR"`: for every genomic location (CpG) in
#' aberrations (epimutations).
#' For `ramr.method=="IQR"`: for every genomic location (CpG) in
#' `data.ranges` the IQR-normalized deviation from the median value is
#' calculated, and all CpGs with such normalized deviation not smaller than the
#' `iqr.cutoff` are retained. For `ramr.method=="*beta"`: parameters of beta
#' distribution are estimated by means of `EnvStats::ebeta` or `ExtDist::eBeta`
#' functions, and then used to calculate the probability values, followed by the
#' `iqr.cutoff` are retained. For
#' `ramr.method %in% c("beta", "wbeta", "beinf")`: parameters of beta
#' distribution are estimated by means of `EnvStats::ebeta` (beta distribution),
#' `ExtDist::eBeta` (weighted beta destribution), or `gamlss.dist::BEINF` (zero
#' and one inflated beta distribution) functions, respectively. These
#' parameters are then used to calculate the probability values, followed by the
#' filtering when all CpGs with p-values not greater than `qval.cutoff` are
#' retained. Another filtering is then performed to exclude all CpGs within
#' `exclude.range`. Next, the retained (significant) CpGs are merged within
Expand All @@ -26,8 +30,9 @@
#' columns) are included in the analysis.
#' @param ramr.method A character scalar: when ramr.method is "IQR" (the
#' default), the filtering based on interquantile range is used (`iqr.cutoff`
#' value is then used as a threshold). When "beta" or "wbeta" - filtering based
#' on fitting non-weighted (`EnvStats::ebeta`) or weighted (`ExtDist::eBeta`)
#' value is then used as a threshold). When "beta", "wbeta" or "beinf" -
#' filtering based on fitting non-weighted (`EnvStats::ebeta`), weighted
#' (`ExtDist::eBeta`) or zero-and-one inflated (`gamlss.dist::BEINF`)
#' beta distributions, respectively, is used, and `pval.cutoff` or `qval.cutoff`
#' (if not `NULL`) is used as a threshold. For "wbeta", weights directly
#' correlate with bin contents (number of values per bin) and inversly - with
Expand Down Expand Up @@ -82,20 +87,10 @@
#' data(ramr)
#' getAMR(ramr.data, ramr.samples, ramr.method="beta",
#' min.cpgs=5, merge.window=1000, qval.cutoff=1e-3, cores=2)
#' @importFrom parallel detectCores makeCluster stopCluster
#' @importFrom doParallel registerDoParallel
#' @importFrom GenomicRanges mcols reduce
#' @importFrom EnvStats ebeta
#' @importFrom ExtDist eBeta pBeta
#' @importFrom matrixStats rowMedians rowIQRs
#' @importFrom foreach foreach
#' @importFrom doRNG %dorng%
#' @importFrom methods as is
#' @importFrom stats median pbeta
#' @export
getAMR <- function (data.ranges,
data.samples=NULL,
ramr.method="IQR",
ramr.method=c("IQR", "beta", "wbeta", "beinf"),
iqr.cutoff=5,
pval.cutoff=5e-2,
qval.cutoff=NULL,
Expand All @@ -115,6 +110,7 @@ getAMR <- function (data.ranges,
stop("'data.ranges' metadata must include 'data.samples'")
if (length(data.samples)<3)
stop("at least three 'data.samples' must be provided")
ramr.method <- match.arg(ramr.method)

#####################################################################################

Expand All @@ -134,7 +130,7 @@ getAMR <- function (data.ranges,
x.median <- stats::median(x, na.rm=TRUE)
x[is.na(x)] <- x.median
# weight directly correlates with bin contents (number of values per bin)
# and inversly - with the distance from the median value, thus narrowing
# and inversely - with the distance from the median value, thus narrowing
# the estimated distribution and emphasizing outliers
c <- cut(x, c(0:100)/100)
b <- table(c)
Expand All @@ -146,6 +142,30 @@ getAMR <- function (data.ranges,
})
return(t(chunk.filt))
}
inv.logit <- function (x) exp(x)/(1+exp(x))
getPValues.beinf <- function (data.chunk, ...) {
chunk.filt <- apply(data.chunk, 1, function (x) {
x.median <- stats::median(x, na.rm=TRUE)
x[is.na(x)] <- x.median
beinf.fit <- gamlss::gamlss(
as.numeric(x)~1, sigma.formula=~1, nu.formula=~1, tau.formula=~1,
family=gamlss.dist::BEINF(mu.link="logit", sigma.link="logit",
nu.link="log", tau.link="log"),
control=gamlss::gamlss.control(trace=FALSE), ...
)
pvals <- gamlss.dist::pBEINF(
q=x, mu=inv.logit(beinf.fit$mu.coefficients),
sigma=inv.logit(beinf.fit$sigma.coefficients),
nu=exp(beinf.fit$nu.coefficients),
tau=exp(beinf.fit$tau.coefficients),
lower.tail=TRUE, log.p=FALSE
)
pvals[x>x.median] <- 1 - pvals[x>x.median]
return(pvals)
})
return(t(chunk.filt))
}


#####################################################################################

Expand All @@ -158,28 +178,30 @@ getAMR <- function (data.ranges,
universe <- getUniverse(data.ranges, merge.window=merge.window, min.cpgs=min.cpgs, min.width=min.width)
universe.cpgs <- unlist(universe$revmap)

betas <- as.matrix(mcols(data.ranges)[universe.cpgs,data.samples,drop=FALSE])
betas <- as.matrix(mcols(data.ranges)[universe.cpgs, data.samples, drop=FALSE])
if (is.null(qval.cutoff))
qval.cutoff <- pval.cutoff/ncol(betas)

chunks <- split(seq_len(nrow(betas)), if (cores>1) cut(seq_len(nrow(betas)),cores) else 1)
medians <- foreach (chunk=chunks, .combine=c) %dorng% matrixStats::rowMedians(betas[chunk,], na.rm=TRUE)
chunks <- split(seq_len(nrow(betas)), if (cores>1) cut(seq_len(nrow(betas)), cores) else 1)
medians <- foreach (chunk=chunks, .combine=c) %dorng% matrixStats::rowMedians(betas[chunk, ], na.rm=TRUE)

if (ramr.method=="IQR") {
iqrs <- foreach (chunk=chunks, .combine=c) %dorng% matrixStats::rowIQRs(betas[chunk,], na.rm=TRUE)
iqrs <- foreach (chunk=chunks, .combine=c) %dorng% matrixStats::rowIQRs(betas[chunk, ], na.rm=TRUE)
betas.filtered <- (betas-medians)/iqrs
betas.filtered[abs(betas.filtered)<iqr.cutoff] <- NA
} else if (ramr.method=="beta") {
# multi-threaded EnvStats::ebeta (speed: mme=mmue>mle>>>fitdistrplus::fitdist)
betas.filtered <- foreach (chunk=chunks) %dorng% getPValues.beta(betas[chunk,], ...)
betas.filtered <- foreach (chunk=chunks) %dorng% getPValues.beta(betas[chunk, ], ...)
betas.filtered <- do.call(rbind, betas.filtered)
betas.filtered[betas.filtered>=qval.cutoff] <- NA
} else if (ramr.method=="wbeta") {
betas.filtered <- foreach (chunk=chunks) %dorng% getPValues.wbeta(betas[chunk,], ...)
betas.filtered <- foreach (chunk=chunks) %dorng% getPValues.wbeta(betas[chunk, ], ...)
betas.filtered <- do.call(rbind, betas.filtered)
betas.filtered[betas.filtered>=qval.cutoff] <- NA
} else if (ramr.method=="beinf") {
betas.filtered <- foreach (chunk=chunks) %dorng% getPValues.beinf(betas[chunk, ], ...)
betas.filtered <- do.call(rbind, betas.filtered)
betas.filtered[betas.filtered>=qval.cutoff] <- NA
} else {
stop("unknown 'ramr.method'")
}

if (!is.null(exclude.range))
Expand All @@ -201,9 +223,7 @@ getAMR <- function (data.ranges,
ranges$xiqr <- vapply(ranges$revmap, function (revmap) {
mean(betas.filtered[not.na[revmap],column,drop=FALSE], na.rm=TRUE)
}, numeric(1))
}

if (ramr.method=="beta" | ramr.method=="wbeta") {
} else {
ranges$pval <- vapply(ranges$revmap, function (revmap) {
return( 10**mean(log10(betas.filtered[not.na[revmap],column] + .Machine$double.xmin), na.rm=TRUE) )
}, numeric(1))
Expand Down
2 changes: 0 additions & 2 deletions R/getUniverse.R
Original file line number Diff line number Diff line change
Expand Up @@ -40,8 +40,6 @@
#' runLOLA(amrs, universe, hg19.extdb, cores=1, redefineUserSets=TRUE)
#' }
#' }
#' @importFrom GenomicRanges reduce
#' @importFrom methods is
#' @export
getUniverse <- function (data.ranges,
merge.window=300,
Expand Down
1 change: 0 additions & 1 deletion R/globals.R

This file was deleted.

49 changes: 49 additions & 0 deletions R/internal.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
#' @importFrom BiocGenerics relist
#' @importFrom data.table as.data.table melt.data.table
#' @importFrom doParallel registerDoParallel
#' @importFrom doRNG %dorng%
#' @importFrom EnvStats ebeta
#' @importFrom ExtDist eBeta pBeta
#' @importFrom foreach foreach
#' @importFrom gamlss gamlss gamlss.control
#' @importFrom gamlss.dist pBEINF
#' @importFrom GenomicRanges mcols `mcols<-` granges reduce findOverlaps
#' @importFrom IRanges subsetByOverlaps
#' @importFrom matrixStats rowMedians rowIQRs
#' @importFrom methods as is
#' @importFrom parallel detectCores makeCluster stopCluster
#' @importFrom S4Vectors queryHits
#' @importFrom stats median na.omit rbeta pbeta
#' @importFrom utils head tail
## #' @importFrom Rcpp sourceCpp
## #' @useDynLib ramr, .registration=TRUE


# internal globals, constants and helper functions
#

################################################################################
# Globals, unload
################################################################################

utils::globalVariables(c(
"chunk", "column", "ncpg", "width", "..data.samples", ":=", "alpha", "color",
"size", "start"
))

# .onUnload <- function (libpath) {library.dynam.unload("ramr", libpath)}

################################################################################
# Constants
################################################################################

# descr: ...

#

################################################################################
# Functions: ...
################################################################################

# descr: ...
# value: ...
Loading
Loading