---
output: rmarkdown::github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r echo=FALSE, message=FALSE}
knitr::opts_chunk$set(message=FALSE, comment="#>")
library(magrittr)
library(ggplot2)
library(ggpubr)
```
# PPI-Context
Contextualization of protein-protein interaction databases by cell line
#### Clone repository
```
$ git clone https://github.com/montilab/ppi-context
```
#### Install requirements
```
$ cd ppi-context
$ pip install -r requirements.txt
```
#### The data
If you only need the data, it can be loaded directly into R:
```
$ R
```
```{r}
ppi <- read.delim("data/v_1_00/PPI-Context.txt", header=TRUE, sep="\t", stringsAsFactors=FALSE)
```
```{r, fig.width=9, fig.align='center'}
data.frame(sort(table(ppi$cell_name), decreasing=TRUE)) %>%
    set_colnames(c("var", "freq")) %>%
    head(30) %>%
    ggbarplot(x="var", y="freq", fill="freq") +
    labs(title="", x="Cell Line Name", y="PPI") +
    scale_fill_viridis_c(option="inferno", begin=0, end=0.8) +
    theme(legend.position="none",
          axis.text.x=element_text(angle=45, hjust=1, size=12, face="bold"))
```
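For readers working in Python rather than R, the same tally can be sketched with the standard library alone. This is a minimal sketch, not part of the repository: it assumes the file is tab-delimited with a header row containing a `cell_name` column (as the R example suggests), and the helper name `count_cell_lines` is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def count_cell_lines(handle, column="cell_name", top=30):
    """Return the `top` most frequent values of `column` as (name, count) pairs.

    `handle` is any open text file or file-like object holding
    tab-delimited data with a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row[column] for row in reader)
    return counts.most_common(top)


# Path taken from the R example; assumes the script runs from the repository root.
data_path = Path("data/v_1_00/PPI-Context.txt")
if data_path.exists():
    with data_path.open(newline="") as handle:
        for name, freq in count_cell_lines(handle):
            print(f"{name}\t{freq}")
```

The function takes a file handle rather than a path, so it can be tried on a small in-memory sample before pointing it at the full data file.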
#### Pre-processing the data
```
| PPI - Context (v1.0)
usage: ppictx.py [-h] [-r] [-d]
                 [-fh PATH_HIPPIE]
                 [-fp PATH_PUBTATOR]
                 [-fc PATH_CELLOSAURUS]

optional arguments:
  -h, --help            show this help message and exit
  -r, --run             run pipeline
  -d, --download        download raw data first
  -fh PATH_HIPPIE       path to downloaded Hippie data (optional)
  -fp PATH_PUBTATOR     path to downloaded Pubtator data (optional)
  -fc PATH_CELLOSAURUS  path to downloaded Cellosaurus data (optional)
```
In most cases, you will first download the latest bulk data and then process it:
```bash
$ python ppictx.py --download --run
```
```
| PPI - Context (v1.0)
| Downloading raw data...
| Processing raw data
~ [PPI]
~ [PID -> CLA]
~ [CLA -> CID]
~ [PPI -> PID -> CLA -> CID]
```
If you already have previously downloaded versions of the raw data, you can process them by passing their paths:
```bash
$ python ppictx.py --run \
-fh path/to/HIPPIE.mitab \
-fp path/to/PUBTATOR.gz \
-fc path/to/CELLOSAURUS.txt
```
#### Special considerations
- Cell lines that are used in research primarily because they are efficient expression systems (e.g. *HeLa, HEK, CHO, Sf9*) may not be useful representations of cell-specific protein dynamics. It may therefore be useful to filter out PPIs annotated with these cell lines.
- Cellosaurus contains synonymous cell lines, so some annotations, such as *HEK (CVCL_M624)* and *HEK293 (CVCL_0045)*, refer to the same cell line. Users should be aware of synonymous cell lines relevant to their interests and filter accordingly.
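
Both considerations can be handled in one pass over the loaded rows. The following is a minimal sketch, not part of the repository. It assumes each row is a dict with a `cell_name` key and that cell line names appear in the data exactly as spelled here. The names `EXPRESSION_HOSTS`, `SYNONYMS`, `canonical_name` and `filter_rows` are hypothetical, and the synonym map should be extended for the cell lines you care about.

```python
# Assumption: names match the values stored in `cell_name` exactly.
EXPRESSION_HOSTS = {"HeLa", "HEK", "CHO", "Sf9"}

# Assumption: user-maintained map from a synonym to the name you treat as canonical.
SYNONYMS = {"HEK": "HEK293"}


def canonical_name(name, synonyms=SYNONYMS):
    """Map a cell line name to its canonical form, or return it unchanged."""
    return synonyms.get(name, name)


def filter_rows(rows, excluded=EXPRESSION_HOSTS, synonyms=SYNONYMS, column="cell_name"):
    """Normalize synonymous cell line names, then drop rows for excluded lines.

    The excluded set is normalized too, so excluding "HEK" also
    excludes rows annotated as "HEK293".
    """
    excluded_canonical = {canonical_name(name, synonyms) for name in excluded}
    kept = []
    for row in rows:
        name = canonical_name(row[column], synonyms)
        if name in excluded_canonical:
            continue
        # Copy the row so the caller's data is left untouched.
        kept.append({**row, column: name})
    return kept
```

Passing `excluded=set()` keeps every row and only normalizes the names, which is useful when the goal is to merge synonymous annotations rather than to remove anything.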
## Cite
Federico A, Monti S (2021) Contextualized Protein-Protein Interactions. _Patterns_. https://doi.org/10.1016/j.patter.2020.100153.