A R package used to identify single base resolution m6A and differential m6A methylation site from MeRIP-seq data version 1.0.
Version: 1.0.6
Date: 2021-12-13
Author: Songyao Zhang zsynwpu@gmail.com, Jia Meng Jia.Meng@xjtlu.edu.cn
Maintainer: Songyao Zhang zsynwpu@gmail.com
The package is developed for the single base resolution m6A sites identification and differential analysis for MeRIP-seq data of two experimental conditions to unveil the dynamics in post-transcriptional regulation of the RNA methylome. This is to our knowledge the first tool to identify single base m6A and differential m6A methylation (DmM) sites using a deep learning and statistic test. The statistic test employed the same rhtest as exomePeak or alternatively used QNB (QNB is currently unavailable, we will try to solve this soon) which is more strict so that sacrifice some sensitivity. Please feel free to contact Songyao Zhang zsynwpu@gmail.com if you have any questions. Many thanks to Jia Meng for the exomePeak R package (Meng, Jia, et al. "Exome-based analysis for RNA epigenome sequencing data." Bioinformatics 29.12 (2013): 1565-1567.), which provides the base for DMDeepm6A.
License: GPL-2
Depends: exomePeak, keras, DESeq, TxDb.Hsapiens.UCSC.hg19.knownGene, BSgenome.Hsapiens.UCSC.hg19, org.Hs.eg.db
Citation: Song-Yao Z , Shao-Wu Z , Xiao-Nan F , et al. FunDMDeep-m6A: identification and prioritization of functional differential m6A methylation genes [J]. Bioinformatics, 2019(14):14.
DMDeepm6A depends on exomePeak, keras, DESeq, TxDb.Hsapiens.UCSC.hg19.knownGene, BSgenome.Hsapiens.UCSC.hg19, org.Hs.eg.dbr R packages and please make sure install them before installing DMDeepm6A.
- Keras installation
Make sure Anaconda is installed for windows system for Python 3.x (https://www.anaconda.com/download/#windows) before installing Keras ("Anaconda3-5.3.0-Windows-x86_64" is suggested). Then follow the instruction on web (https://keras.rstudio.com/) to install keras (please do not install the current version, tensorflow version 1.10 and keras version 2.2.0 is suggested) or install in R as following:
if (!requireNamespace("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_version("reticulate", version = "1.10",repos = "https://cloud.r-project.org/")
devtools::install_version("tensorflow", version = "1.10",repos = "https://cloud.r-project.org/")
devtools::install_version("keras", version = "2.2.0",repos = "https://cloud.r-project.org/")
You may set your convenient CRAN mirrors by setting repos
.
Then create the required virtual environment and install corresponding version of keras and tensorflow (e.g., tensorflow1.10, keras2.2.0) using conda, for Windows users, you can set this in Anaconda Prompt:
conda create -n r-tensorflow tensorflow==1.10 keras==2.2.0 numpy==1.14.5
r-tensorflow
is the name of your virtual environment will be used by keras in R. For windows users, please make sure your Anaconda path is added to environment variable. Then open R to check whether the keras is available:
library(keras)
is_keras_available()
[1] TRUE
For Linux/ubuntu users, reticulate
will firstly search ~/.virtualenvs
for python, then to make sure your R use the python installed in your conda env, you may need to firstly setup the RETICULATE_PYTHON
before library and use keras
and DMDeepm6A
:
Sys.setenv(RETICULATE_PYTHON = "Path_To_conda/envs/r-tensorflow/bin/python")
library(keras)
is_keras_available()
[1] TRUE
- Other required Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("exomePeak", "DESeq", "TxDb.Hsapiens.UCSC.hg19.knownGene",
"BSgenome.Hsapiens.UCSC.hg19", "org.Hs.eg.db"))
The "exomePeak" and "DESeq" packages are not available in Bioconductor3.10 for R>=4.0. If you are using R4.0 or above, Please install "exomePeak" from https://github.com/ZW-xjtlu/exomePeak. And install the "DESeq" package by:
##for Linux installation
install.packages("https://www.bioconductor.org/packages/3.11/bioc/src/contrib/DESeq_1.39.0.tar.gz", repos = NULL, type="source")
##for Windows installation
install.packages("https://www.bioconductor.org/packages/3.11/bioc/bin/windows/contrib/4.0/DESeq_1.39.0.zip", repos = NULL, type="source")
- DMDeepm6A installation
if (!requireNamespace("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("NWPU-903PR/DMDeepm6A1.0")
# example for default hg19 genome
# get input bam
Sys.setenv(RETICULATE_PYTHON = "Path_To_conda/envs/r-tensorflow/bin/python") ## For Linux/ubuntu
library(DMDeepm6A)
library(keras)
ip_bam1 <- system.file("extdata", "treated_ip1.bam", package="DMDeepm6A")
ip_bam2 <- system.file("extdata", "treated_ip2.bam", package="DMDeepm6A")
ip_bam3 <- system.file("extdata", "untreated_ip1.bam", package="DMDeepm6A")
ip_bam4 <- system.file("extdata", "untreated_ip2.bam", package="DMDeepm6A")
input_bam1 <- system.file("extdata", "treated_input1.bam", package="DMDeepm6A")
input_bam2 <- system.file("extdata", "treated_input2.bam", package="DMDeepm6A")
input_bam3 <- system.file("extdata", "untreated_input1.bam", package="DMDeepm6A")
input_bam4 <- system.file("extdata", "untreated_input2.bam", package="DMDeepm6A")
ip_bams <- c(ip_bam1, ip_bam2, ip_bam3, ip_bam4)
input_bams <- c(input_bam1, input_bam2, input_bam3, input_bam4)
# get sample condition
sample_condition <- c("treated", "treated", "untreated", "untreated")
# get genome annotation, generally, you can leave this default if you use the default hg19 genome
# we use a toy gtf here to make this example run faster.
gft_genome <- system.file("extdata", "genes.gtf", package="DMDeepm6A")
# diff peak calling
re <- dmdeepm6A(ip_bams = ip_bams,
input_bams = input_bams,
sample_conditions = sample_condition,
gft_genome = gft_genome,
BSgenome = BSgenome.Hsapiens.UCSC.hg19,
egSYMBOL = org.Hs.egSYMBOL)
Please note that, if you are not using default hg19 genome, you need to input at least both the genome annotation (txdb or gft_genome) and sequence (BSgenome). Please see section 4 for more details.
# do only peak calling for several replicates
# get input bam
ip_bam1 <- system.file("extdata", "treated_ip1.bam", package="DMDeepm6A")
ip_bam2 <- system.file("extdata", "treated_ip2.bam", package="DMDeepm6A")
input_bam1 <- system.file("extdata", "treated_input1.bam", package="DMDeepm6A")
input_bam2 <- system.file("extdata", "treated_input2.bam", package="DMDeepm6A")
ip_bams <- c(ip_bam1, ip_bam2)
input_bams <- c(input_bam1, input_bam2)
# peak calling. sample_conditions should leave default (NA) or set all rep as "untreated".
re <- dmdeepm6A(ip_bams = ip_bams,
input_bams = input_bams,
gft_genome = gft_genome,
BSgenome = BSgenome.Hsapiens.UCSC.hg19,
egSYMBOL = org.Hs.egSYMBOL)
Please note that, if you are not using default hg19 genome, you need to input at least both the genome annotation (txdb or gft_genome) and sequence (BSgenome). Please see section 4 for more details.
# An genome input formate example for rat rn5 genome
# not run
# get genome annotation
library(TxDb.Rnorvegicus.UCSC.rn5.refGene)
txdb <- TxDb.Rnorvegicus.UCSC.rn5.refGene
library(BSgenome.Rnorvegicus.UCSC.rn5)
BSgenome <- BSgenome.Rnorvegicus.UCSC.rn5
BSgenome
is necessary if the genome is not hg19. If there is no public BSgenome data package for your own genome sequence, you can forge a BSgenome data package following the instruction of "BSgenome" package, or input your ".fa/.fasta" genome sequence file using readDNAStringSet function as following:
BSgenome <- readDNAStringSet("My_Genome_Sequence.fa")
names(BSgenome) <- sub(" .*", "", names(BSgenome)) ## Please make sure the names of chrome sequences are consistent with your genome annotation and input bams.
Then input the egSYMBOL
, it can be set as NA, if there is no public egSYMBOL data package for your own species.
library(org.Rn.eg.db)
egSYMBOL <- org.Rn.egSYMBOL
Finally calling differential sites with dmdeepm6A
function.
sigpeak <- dmdeepm6A(ip_bam,
input_bam,
sample_conditions = sample_conditions,
default_genome = FALSE,
txdb = txdb,
BSgenome = BSgenome,
egSYMBOL = egSYMBOL)