trackplot.R
is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots and heatmaps from bigWig files.
trackplot.R
is a standalone R script and requires no installation. Just source it and you're good to go!
source("https://github.com/PoisonAlien/trackplot/blob/master/R/trackplot.R?raw=true")
# OR
download.file(url = "https://raw.githubusercontent.com/PoisonAlien/trackplot/master/R/trackplot.R", destfile = "trackplot.R")
source('trackplot.R')
# OR If you prefer to have it as package
remotes::install_github(repo = "poisonalien/trackplot")
Why trackplot
?
- It's extremely fast since most of the heavy lifting is done by bwtool. >15X faster than deeptools for equivalent
profileplots
andheatmaps
- Lightweight and minimal dependency
- data.table and bwtool are the only requirements. Similar R packages GViz and karyoploteR has over 150 dependencies.
- Plots are generated in pure base R graphics (no ggplot2 or tidyverse packages)
- Automatically queries UCSC genome browser for gene models, cytobands, and chromHMM tracks - making analysis reproducible.
- Supports GTF and standard UCSC gene formats as well.
- Customization: Each plot can be customized for color, scale, height, width, etc.
- Tracks can be summarized per condition (by mean, median, max, min)
- PCA and, optional differential peak analysis with
limma
when using uniformly processed, normalized bigWig files.
- data.table R package - which itself has no dependency.
- bwtool - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g;
/usr/local/bin
) or a directory under the PATH.
-
For macOS: Please download the pre-build binary from here. Make it executable with
chmod +x bwtool
. macOS gatekeeper might complain that it can't run the binary downloaded from the internet. If so, allow it in the security settings. -
For centOS or debian: Follow these compilation instructions.
If you find the script useful consider citing trackplot and bwtool
Mayakonda A, and Frank Westermann. Trackplot: a fast and lightweight R script for epigenomic enrichment plots. Bioinformatics advances vol. 4,1 vbae031. 28 Feb. 2024. PMID: 38476298
Pohl A, Beato M. bwtool: a tool for bigWig files. Bioinformatics. 2014 Jun 1;30(11):1618-9. doi: 10.1093/bioinformatics/btu056. Epub 2014 Jan 30. PMID: 24489365
Simple usage - Make a table of all the bigWig files to be analysed with read_coldata()
and pass it to the downstream functions.
flowchart TD
a[bigWig file list] -->A{read_coldata}
A --> B{track_extract}
B --> B1[track_plot]
A --> C{profile_extract}
C --> C1[profile_summarize]
C --> C3[profile_heatmap]
C1 --> C2[profile_plot]
A --> D{extract_summary}
D --> D1[pca_plot]
D --> D2[diffpeak]
D2 --> D3[volcano_plot]
#Path to bigWig files
bigWigs = c("H1_Oct4.bw", "H1_Nanog.bw", "H1_k4me3.bw",
"H1_k4me1.bw", "H1_k27ac.bw", "H1_H2az.bw", "H1_Ctcf.bw")
#Make a table of bigWigs along with ref genome build
bigWigs = read_coldata(bws = bigWigs, build = "hg19")
track_extract()
and track_plot()
are two functions to generate IGV style track plots (aka locus plots) from bigWig files. Additionally, track_summarize
can summarize tracks by condition.
#Region to plot
oct4_loci = "chr6:31125776-31144789"
#Extract bigWig signal for a loci of interest
t = track_extract(colData = bigWigs, loci = oct4_loci)
#Or you can also specifiy a gene name instead of a loci
# - loci and gene models will be automatically extracted from UCSC genome browser
t = track_extract(colData = bigWigs, gene = "POU5F1")
track_plot(summary_list = t)
track_cols = c("#d35400","#d35400","#2980b9","#2980b9","#2980b9", "#27ae60","#27ae60")
track_plot(summary_list = t,
col = track_cols,
show_ideogram = TRUE)
Use track_overlay = TRUE
to overlay all tracks into a single line track
track_plot(summary_list = t, col = track_cols, show_ideogram = FALSE, track_overlay = TRUE)
oct4_nanog_peaks = c("H1_Nanog.bed","H1_Oct4.bed") #Peak files
track_plot(summary_list = t,
col = track_cols,
show_ideogram = TRUE,
peaks = oct4_nanog_peaks)
chromHMM data should be a bed file with the 4th column containing chromatin state. See here for an example file.
Note that the color code for each of the 15 states are as described here.
In case if it is different for your data, you will have to define your own color codes for each state and pass it to the argument chromHMM_cols
chromHMM_peaks = "H1_chromHMM.bed"
track_plot(summary_list = t,
col = track_cols,
show_ideogram = TRUE,
peaks = oct4_nanog_peaks, chromHMM = chromHMM_peaks)
UCSC has 9 cell lines for which chromHMM data is available. These can be added automatically in case if you dont have your own data.
In this case, use the argument ucscChromHMM
with any values from TableName column of the below table.
TableName cell Description Tissue Karyotype
1: wgEncodeBroadHmmGm12878HMM GM12878 B-lymphocyte, lymphoblastoid blood normal
2: wgEncodeBroadHmmH1hescHMM H1-hESC embryonic stem cells embryonic stem cell normal
3: wgEncodeBroadHmmHepg2HMM HepG2 hepatocellular carcinoma liver cancer
4: wgEncodeBroadHmmHepg2HMM HMEC mammary epithelial cells breast normal
5: wgEncodeBroadHmmHsmmHMM HSMM skeletal muscle myoblasts muscle normal
6: wgEncodeBroadHmmHuvecHMM HUVEC umbilical vein endothelial cells blood vessel normal
track_plot(summary_list = t,
col = track_cols,
show_ideogram = TRUE,
peaks = oct4_nanog_peaks,
ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))
By default tracks are organized from top to bottom as c("p", "b", "h", "g", "c")
corresponding to peaks track, bigWig track, chromHmm track, gene track, and cytoband track. This can be changes with the argument layout_ord
. Furthermore, bigWig tracks themselves can be ordered with the argument bw_ord
which accepts the names of the bigWig tracks as input and plots them in the given order.
#Draw only NANOG, OCT4 bigWigs in that order. Re-organize the layout in the order chromHMM track, gene track, cytoband track. Rest go to the end.
track_plot(
summary_list = t,
col = track_cols,
show_ideogram = TRUE,
genename = c("POU5F1", "TCF19"),
peaks = oct4_nanog_peaks,
peaks_track_names = c("NANOG", "OCT4"),
groupAutoScale = FALSE, ucscChromHMM = "wgEncodeBroadHmmH1hescHMM", y_min = 0,
bw_ord = c("NANOG", "OCT4"),
layout_ord = c("h", "g", "c")
)
All of the above plots can also be generated with narrowPeak or broadPeak files as input. Here, 5th column containing scores are plotted as intensity. Color coding and binning of scores are as per UCSC convention
narrowPeak
is one of the output from macs2 peak caller and are easier to visualize in the absence of bigWig files.
narrowPeaks = c("H1_Ctcf.bed", "H1_H2az.bed", "H1_k27ac.bed",
"H1_k4me1.bed", "H1_k4me3.bed", "H1_Nanog.bed",
"H1_Oct4.bed", "H1_Pol2.bed")
#Use peak as input_type
narrowPeaks = read_coldata(narrowPeaks, build = "hg19", input_type = "peak")
oct4_loci = "chr6:30,818,383-31,452,182" #633Kb region for example
narrowPeaks_track = track_extract(colData = narrowPeaks, loci = oct4_loci)
#Rest plotting is same
track_plot(summary_list = narrowPeaks_track,
show_ideogram = TRUE,
peaks = oct4_nanog_peaks,
ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))
profile_extract()
-> profile_summarize()
-> profile_plot()
are functions to generate density based profile-plots from bigWig files.
- Below example for summarizing approx. 3,671 peaks for 3 bigWig files takes ca. 3 seconds on my 5 year old macbook Pro. This includes generating signal matrix, summarizing, and plotting. Equivalent deeptools commands takes 20 seconds.
- Optionally, it can also query UCSC genome browser for refseq transcripts of desired assembly and summarize around TSS regions
- Replicates can be collapsed into single value per condition
Example data from GSE99183 where U87 glioma cell lines are treated with a DMSO and a BRD4 degradaer.
bws = c("GSM2634756_U87_BRD4.bw", "GSM2634757_U87_BRD4_dBET_24h.bw", "GSM2634758_U87_BRD4_dBET_2h.bw")
bws = read_coldata(bws = bws,
sample_names = c("BRD4", "BRD4_dBET_24h", "BRD4_dBET_2h"),
build = "hg19")
#Extract signals from bigWig files around refseq transcripts
pe_refseq = profile_extract(colData = bws, ucsc_assembly = TRUE,
startFrom = 'start', up = 1500, down = 1500)
#Estimate mean signal
ps_refseq = profile_summarize(sig_list = pe_refseq)
#Plot
profile_plot(ps_refseq)
#BRD4 binding sites
bed = "GSM2634756_U87_BRD4_peaks.narrowPeak.gz"
#Center and extend 1500 both ways from the peak center
pe_bed = profile_extract(colData = bws, bed = bed, startFrom = "center",
up = 1500, down = 1500, nthreads = 4)
#Estimate mean signal
ps_bed = profile_summarize(sig_list = pe_bed)
#Plot
profile_plot(ps_bed)
Output from profile_extract
can be used to draw a heatmap with profile_heatmap
profile_heatmap(mat_list = pe_bed, top_profile = TRUE, zmaxs = 0.8)
PSA If you find the tool useful, consider starring this repository or up voting this Biostars thread so that more poeple can find it :)
- Windows OS is not supported
Joschka Hey for all the cool suggestions :)