-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
142 lines (90 loc) · 4.07 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
fig.height=8,
fig.width=8 ,
message = FALSE
)
```
```{r , message=FALSE, echo=FALSE}
library(magrittr)
```
# corplot
<!-- badges: start -->
```{r , echo=FALSE , results='asis' , message=FALSE}
cat(
badger::badge_devel("cparsania/corplot" , color = "blue"),
badger::badge_lifecycle()
)
```
<!-- badges: end -->
## Motivation
Genomics data often stored in a matrix like format, where each row is a feature (gene, transcript, protein etc.) and columns are variables (e.g. signal intensity of experiments such as RNA-seq, ChIP-seq, Pol-II ChIP-seq etc.). Variables are often grouped by replicates, time-points or specific experimental conditions such as wild type, deletion, control, treatment etc. In such a multidimensional data, plotting a x-y scatter plot between different groups require lots of data wrangling before it goes for final ggplot.
`corplot` has functions to generate heatbox and pairwise scatter plots directly from feature matrix given in a tbl format. Let's have a look into required input data and resultant plots out of `corpot`.
## Install
```{r , eval=FALSE}
if(require("devtools")){
devtools::install_github("cparsania/corplot")
} else{
install.packages("devtools")
devtools::install_github("cparsania/corplot")
}
```
## Correlation heatbox
## All samples vs all samples
```{r, fig.width=6,fig.height=4}
expr_mat_file <- system.file("extdata" ,"example_data_expr_mat_01.txt" , package = "corplot")
expr_mat <- readr::read_delim(expr_mat_file , delim = "\t")
expr_mat
## calculate pairwise correlation
cor_tbl <- corplot::get_pairwise_cor_tbl(expr_mat , var = "gene_name" , method = "pearson")
cor_tbl
cp <- corplot::get_corr_heat_box(cor_tbl,var1 = var1, var2 = var2 ,value = corr)
cp + viridis::scale_fill_viridis() + ggplot2::theme(axis.text.x = ggplot2::element_text(angle=90))
```
## Group by replicates
All samples vs all samples correlation heatbox has redundant samples on each axis. This makes plot less readable. Alternate way to overcome this is to plot samples of replicate 1 vs samples of replicate 2.
```{r, fig.width=5, fig.height=3}
cor_tbl2 <- cor_tbl %>% dplyr::filter(grepl("Rep.A", var1) ) %>% dplyr::filter(grepl("Rep.B", var2) )
cor_tbl2
corplot::get_corr_heat_box(cor_tbl2,var1 = var1, var2 = var2, value = corr) +
viridis::scale_fill_viridis()
```
## Scatter plot
## Group by replicates : All combinations
``` {r}
groups_file <- expr_mat_file <- system.file("extdata" ,"example_data_01_sample_groups.txt" , package = "corplot")
groups <- readr::read_delim(file = groups_file,delim = "\t")
groups
csp <- corplot::get_pair_wise_scatter(dat_tbl = expr_mat, group_tbl = groups,var_plot = condition, var_plot_group = repl,dat_id = gene_name)
csp
```
### Display corr value
```{r}
cor_tbl2 <- cor_tbl %>% dplyr::rename(`Rep.A`=var1, `Rep.B` = var2) %>%
dplyr::filter(grepl("Rep.A" ,`Rep.A`)) %>%
dplyr::filter(grepl("Rep.B" ,`Rep.B`)) %>%
TidyWrappers::tbl_replace_string("_.*" , "")
cor_tbl2
csp + ggplot2::geom_text(data = cor_tbl2, x = 4, y = 18, ggplot2::aes(label = paste("r","=",corr , sep = "")) ,
fontface="italic" , col = "red",size = 5)
```
## Group by replicates : Only replicate pairs
```{r}
csp2 <- corplot::get_pair_wise_scatter(dat_tbl = expr_mat, group_tbl = groups,var_plot = condition, var_plot_group = repl,dat_id = gene_name,view_matrix = FALSE)
csp2
```
### Display corr value
``` {r, fig.width = 5, fig.height = 5}
cor_tbl3 <- cor_tbl2 %>% dplyr::filter(`Rep.A` == `Rep.B`)
csp3 <- corplot::get_pair_wise_scatter(dat_tbl = expr_mat, group_tbl = groups,var_plot = condition, var_plot_group = repl,dat_id = gene_name,view_matrix = FALSE)
csp2 + ggplot2::geom_text(data = cor_tbl3, x = 3, y = 18, ggplot2::aes(label = paste("r","=",corr , sep = "")) ,
fontface="italic" , col = "red")
```