build_script.Rout
R version 3.4.2 (2017-09-28) -- "Short Summer"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> tictoc::tic()
> library(sqlome)
>
> library(RSQLite)
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
> library(tidyr)
> library(stringr)
> library(data.table)
Attaching package: ‘data.table’
The following objects are masked from ‘package:dplyr’:
between, first, last
> library(purrr)
Attaching package: ‘purrr’
The following object is masked from ‘package:data.table’:
transpose
> library(readr)
>
> library(RTCGA)
Welcome to the RTCGA (version: 1.8.0).
> library(RTCGA.rnaseq)
> library(RTCGA.miRNASeq)
> library(RTCGA.RPPA)
>
> # get the TCGA cohorts with all three assays
> cohorts <- as.character(sqlome_info()$Cohort)
>
> # open a connection to a db
> db <- dbConnect(SQLite(), 'miRCancer.db')
>
> # microRNA-gene correlations
> ## make cor_mir table
> ### calculate correlations
> cat('1: Calculating microRNA-gene correlations.\n')
1: Calculating microRNA-gene correlations.
>
> df <- map(cohorts[-20], function(x) {
+ # make names of RTCGA data.frames
+ mi <- paste(x, 'miRNASeq', sep = '.')
+ m <- paste(x, 'rnaseq', sep = '.')
+
+ # read data.frames
+ mi <- get(mi)
+ m <- get(m)
+
+ # tidy data.frames
+ mi <- mirna_tidy(mi)
+ m <- mrna_tidy(m)
+
+ # calculate correlations in a tidy data.table
+ corr <- cor_make(mi, m, x, tidy = TRUE)
+ corr <- as.data.table(corr)
+
+ # print progress
+ cat(paste('microRNA-gene correlation for', x, 'is done.\n'))
+
+ # return tidy data.table
+ return(corr)
+ })
microRNA-gene correlation for ACC is done.
microRNA-gene correlation for BLCA is done.
microRNA-gene correlation for BRCA is done.
microRNA-gene correlation for CESC is done.
microRNA-gene correlation for CHOL is done.
microRNA-gene correlation for COAD is done.
microRNA-gene correlation for COADREAD is done.
microRNA-gene correlation for DLBC is done.
microRNA-gene correlation for ESCA is done.
microRNA-gene correlation for GBMLGG is done.
microRNA-gene correlation for HNSC is done.
microRNA-gene correlation for KICH is done.
microRNA-gene correlation for KIPAN is done.
microRNA-gene correlation for KIRC is done.
microRNA-gene correlation for KIRP is done.
microRNA-gene correlation for LGG is done.
microRNA-gene correlation for LIHC is done.
microRNA-gene correlation for LUAD is done.
microRNA-gene correlation for LUSC is done.
microRNA-gene correlation for OV is done.
microRNA-gene correlation for PAAD is done.
microRNA-gene correlation for PCPG is done.
microRNA-gene correlation for PRAD is done.
microRNA-gene correlation for READ is done.
microRNA-gene correlation for SARC is done.
microRNA-gene correlation for SKCM is done.
microRNA-gene correlation for STAD is done.
microRNA-gene correlation for STES is done.
microRNA-gene correlation for TGCT is done.
microRNA-gene correlation for THCA is done.
microRNA-gene correlation for THYM is done.
microRNA-gene correlation for UCEC is done.
microRNA-gene correlation for UCS is done.
microRNA-gene correlation for UVM is done.
There were 22 warnings (use warnings() to see them)
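`cor_make()` comes from sqlome, and its internals are not shown in this log. As a rough illustration of what a per-cohort microRNA-gene correlation step computes, here is a base R sketch under stated assumptions: both inputs are numeric matrices with features in rows and sample barcodes in columns, samples are matched by barcode, and Pearson correlation is used. The helper `cor_long()` and its output columns (`mirna_base`, `feature`, `cor`) are hypothetical and are not sqlome's API.

```r
# Hypothetical helper (not sqlome::cor_make): correlate every microRNA with
# every gene across the samples both matrices share, then return the result
# in long (tidy) form. Assumes features in rows, sample barcodes in columns.
cor_long <- function(mi, m, method = "pearson") {
  shared <- intersect(colnames(mi), colnames(m))  # match samples by barcode
  r <- cor(t(mi[, shared, drop = FALSE]),
           t(m[, shared, drop = FALSE]),
           method = method)                       # microRNA x gene matrix
  data.frame(
    mirna_base = rep(rownames(r), times = ncol(r)),
    feature    = rep(colnames(r), each = nrow(r)),
    cor        = as.vector(r),                    # column-major, matches rep()
    stringsAsFactors = FALSE
  )
}

# Toy data: two microRNAs, one gene, four shared samples (S1..S4)
mi <- matrix(c(1, 2, 3, 4,
               4, 3, 2, 1),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("m1", "m2"), paste0("S", 1:4)))
m <- matrix(c(2, 4, 6, 8), nrow = 1,
            dimnames = list("g1", paste0("S", 1:4)))

res <- cor_long(mi, m)
print(res)
```

Here `m1` rises with `g1` across the four samples and `m2` falls, so their correlations come out at 1 and -1.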
>
> ### use Reduce() to merge data.tables
> cat('2: Merging microRNA-gene correlations.\n')
2: Merging microRNA-gene correlations.
> df <- Reduce(function(x, y) merge(x, y, all=TRUE), df)
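`Reduce()` folds the list of per-cohort tables into one by repeatedly calling `merge(x, y, all = TRUE)`, i.e. a full outer join on the columns the tables share, so a microRNA-feature pair missing from a cohort gets `NA` in that cohort's column. The toy example below shows this with plain data.frames; the column layout (shared `mirna_base`/`feature` keys plus one cohort-named value column per table) is an assumption for illustration, not a description of what `cor_make()` actually returns. The transcript merges data.tables, whose own `merge()` method may pick its default join columns from the tables' keys, so results can differ if the tables are keyed.

```r
# Two hypothetical per-cohort correlation tables sharing the key columns
# mirna_base and feature; each carries its own cohort-named value column.
acc  <- data.frame(mirna_base = c("m1", "m2"), feature = c("g1", "g1"),
                   ACC = c(0.2, -0.1), stringsAsFactors = FALSE)
blca <- data.frame(mirna_base = c("m1", "m3"), feature = c("g1", "g2"),
                   BLCA = c(0.5, 0.3), stringsAsFactors = FALSE)

# Fold the list pairwise; all = TRUE keeps rows missing from either side
# (full outer join), filling the absent cohort's column with NA.
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), list(acc, blca))
print(merged)
```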
>
> ### write cor_mir table to connection db
> cat('3: Writing microRNA-gene correlations.\n')
3: Writing microRNA-gene correlations.
> dbWriteTable(db,
+ name = 'cor_mir',
+ df,
+ overwrite = TRUE)
> ### making index on cor_mir
> dbSendQuery(db,
+ statement = 'create index idx1 on cor_mir (mirna_base);',
+ overwrite = TRUE)
<SQLiteResult>
SQL create index idx1 on cor_mir (mirna_base);
ROWS Fetched: 0 [complete]
Changed: 1
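The `overwrite = TRUE` argument passed to `dbSendQuery()` is not a documented option of that function and appears to be ignored. More importantly, `dbSendQuery()` returns a result object that this script never clears, which is consistent with the "Closing open result set, pending rows" warnings after the later `dbWriteTable()` calls and the "result in use" warning from `dbDisconnect()` at the end of this log. A minimal sketch of the DBI alternative, `dbExecute()`, which runs a statement and frees its result immediately, is shown below against a throwaway in-memory database. It assumes the DBI and RSQLite packages are installed; the table contents are made up.

```r
library(DBI)

# Throwaway in-memory database standing in for 'miRCancer.db'
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy stand-in for the cor_mir table (values are made up)
dbWriteTable(con, "cor_mir",
             data.frame(mirna_base = c("hsa-let-7a", "hsa-mir-21"),
                        cor = c(0.1, -0.3)),
             overwrite = TRUE)

# dbExecute() runs the statement and clears its result set right away,
# so no open result lingers to trigger warnings on later calls.
dbExecute(con, "CREATE INDEX idx1 ON cor_mir (mirna_base);")

# Confirm the index exists
idx <- dbGetQuery(con, "SELECT name FROM sqlite_master WHERE type = 'index';")
print(idx)

dbDisconnect(con)  # no open result sets, so no "result in use" warning
```

If the result object from `dbSendQuery()` is actually needed, calling `dbClearResult()` on it before the next query should have the same effect.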
>
> ## make targets for microRNA-gene mapping
> cat('4: Extracting microRNA-gene targets.\n')
4: Extracting microRNA-gene targets.
>
> targets <- list()
> targets$genes <- get_targets(unique(df$mirna_base), 'gene')
Joining, by = "name"
'select()' returned many:1 mapping between keys and columns
>
> # microRNA-protein correlations
> ## make cor_rppa table
> ### calculate correlations
> cat('5: Calculating microRNA-protein correlations.\n')
5: Calculating microRNA-protein correlations.
>
> df <- map(cohorts[-20], function(x) {
+ # make names of RTCGA data.frames
+ mi <- paste(x, 'miRNASeq', sep = '.')
+ m <- paste(x, 'RPPA', sep = '.')
+
+ # read data.frames
+ mi <- get(mi)
+ m <- get(m)
+
+ # tidy data.frames
+ mi <- mirna_tidy(mi)
+ m <- rppa_tidy(m)
+
+ # calculate correlations in a tidy data.table
+ corr <- cor_make(mi, m, x, tidy = TRUE)
+ corr <- as.data.table(corr)
+
+ # print progress
+ cat(paste('microRNA-protein correlation for', x, 'is done.\n'))
+
+ # return tidy data.table
+ return(corr)
+ })
microRNA-protein correlation for ACC is done.
microRNA-protein correlation for BLCA is done.
microRNA-protein correlation for BRCA is done.
microRNA-protein correlation for CESC is done.
microRNA-protein correlation for CHOL is done.
microRNA-protein correlation for COAD is done.
microRNA-protein correlation for COADREAD is done.
microRNA-protein correlation for DLBC is done.
microRNA-protein correlation for ESCA is done.
microRNA-protein correlation for GBMLGG is done.
microRNA-protein correlation for HNSC is done.
microRNA-protein correlation for KICH is done.
microRNA-protein correlation for KIPAN is done.
microRNA-protein correlation for KIRC is done.
microRNA-protein correlation for KIRP is done.
microRNA-protein correlation for LGG is done.
microRNA-protein correlation for LIHC is done.
microRNA-protein correlation for LUAD is done.
microRNA-protein correlation for LUSC is done.
microRNA-protein correlation for OV is done.
microRNA-protein correlation for PAAD is done.
microRNA-protein correlation for PCPG is done.
microRNA-protein correlation for PRAD is done.
microRNA-protein correlation for READ is done.
microRNA-protein correlation for SARC is done.
microRNA-protein correlation for SKCM is done.
microRNA-protein correlation for STAD is done.
microRNA-protein correlation for STES is done.
microRNA-protein correlation for TGCT is done.
microRNA-protein correlation for THCA is done.
microRNA-protein correlation for THYM is done.
microRNA-protein correlation for UCEC is done.
microRNA-protein correlation for UCS is done.
microRNA-protein correlation for UVM is done.
There were 33 warnings (use warnings() to see them)
>
> ### use Reduce() to merge data.tables
> cat('6: Merging microRNA-protein correlations.\n')
6: Merging microRNA-protein correlations.
>
> df <- Reduce(function(x, y) merge(x, y, all=TRUE), df)
>
> ### write cor_rppa table to connection db
> cat('7: Writing microRNA-protein correlations.\n')
7: Writing microRNA-protein correlations.
>
> dbWriteTable(db,
+ name = 'cor_rppa',
+ df,
+ overwrite = TRUE)
Warning message:
Closing open result set, pending rows
>
> ### making index on cor_rppa
> dbSendQuery(db,
+ statement = 'create index idx2 on cor_rppa (mirna_base);',
+ overwrite = TRUE)
<SQLiteResult>
SQL create index idx2 on cor_rppa (mirna_base);
ROWS Fetched: 0 [complete]
Changed: 1
>
> ## make targets for microRNA-protein mapping
> cat('8: Extracting microRNA-protein targets.\n')
8: Extracting microRNA-protein targets.
> targets$protein <- get_targets(unique(df$mirna_base), 'protein')
Joining, by = "name"
'select()' returned many:1 mapping between keys and columns
Parsed with column specification:
cols(
ab_id = col_character(),
source = col_character(),
cat_num = col_character(),
gene_id = col_character()
)
>
> # merge targets tables
> cat('9: Merging microRNA gene and protein targets.\n')
9: Merging microRNA gene and protein targets.
>
> targets <- bind_rows(targets, .id = 'feature_type')
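`bind_rows(targets, .id = 'feature_type')` stacks the two elements of the named list and records each row's source in a new `feature_type` column, so its values here are the list names, `"genes"` and `"protein"`. A base R equivalent is sketched below with made-up rows; the `mirna_base`/`target` columns are illustrative only, and unlike `bind_rows()` this version requires every element to have identical columns.

```r
# Hypothetical stand-ins for targets$genes and targets$protein
targets <- list(
  genes   = data.frame(mirna_base = c("m1", "m2"), target = c("TP53", "MYC"),
                       stringsAsFactors = FALSE),
  protein = data.frame(mirna_base = "m1", target = "p53",
                       stringsAsFactors = FALSE)
)

# Prepend a feature_type column holding each element's list name, then
# stack the pieces row-wise (columns must match across elements here).
stacked <- do.call(rbind, Map(function(d, id) {
  data.frame(feature_type = id, d, stringsAsFactors = FALSE)
}, targets, names(targets)))
rownames(stacked) <- NULL
print(stacked)
```

The same pattern applies to step 12 below, where `bind_rows(df, .id = 'study')` tags each profile row with its cohort name.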
>
> # write targets table
> cat('10: Writing microRNA gene and protein targets.\n')
10: Writing microRNA gene and protein targets.
>
> dbWriteTable(db,
+ name = 'targets',
+ targets,
+ overwrite = TRUE)
Warning message:
Closing open result set, pending rows
>
> ### making index on targets
> dbSendQuery(db,
+ statement = 'create index idx3 on targets (mirna_base);',
+ overwrite = TRUE)
<SQLiteResult>
SQL create index idx3 on targets (mirna_base);
ROWS Fetched: 0 [complete]
Changed: 1
>
> # write miRNASeq profiles
> cat('11: Extracting microRNA profiles.\n')
11: Extracting microRNA profiles.
>
> df <- map(cohorts[-20], function(x) {
+ # make names of RTCGA data.frames
+ mi <- paste(x, 'miRNASeq', sep = '.')
+
+ # read data.frames
+ mi <- get(mi)
+
+ # tidy data.frames
+ mi <- mirna_tidy(mi)
+ mi <- cbind(mirna_base = rownames(mi), as.data.frame(mi))
+ mi <- gather(mi, bcr, count, -mirna_base)[, -2]
+ mi <- as.data.table(mi)
+
+ # print progress
+ cat(paste('microRNA profiles for', x, 'are done.\n'))
+
+ # return tidy data.table
+ return(mi)
+ })
microRNA profiles for ACC are done.
microRNA profiles for BLCA are done.
microRNA profiles for BRCA are done.
microRNA profiles for CESC are done.
microRNA profiles for CHOL are done.
microRNA profiles for COAD are done.
microRNA profiles for COADREAD are done.
microRNA profiles for DLBC are done.
microRNA profiles for ESCA are done.
microRNA profiles for GBMLGG are done.
microRNA profiles for HNSC are done.
microRNA profiles for KICH are done.
microRNA profiles for KIPAN are done.
microRNA profiles for KIRC are done.
microRNA profiles for KIRP are done.
microRNA profiles for LGG are done.
microRNA profiles for LIHC are done.
microRNA profiles for LUAD are done.
microRNA profiles for LUSC are done.
microRNA profiles for OV are done.
microRNA profiles for PAAD are done.
microRNA profiles for PCPG are done.
microRNA profiles for PRAD are done.
microRNA profiles for READ are done.
microRNA profiles for SARC are done.
microRNA profiles for SKCM are done.
microRNA profiles for STAD are done.
microRNA profiles for STES are done.
microRNA profiles for TGCT are done.
microRNA profiles for THCA are done.
microRNA profiles for THYM are done.
microRNA profiles for UCEC are done.
microRNA profiles for UCS are done.
microRNA profiles for UVM are done.
>
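The `gather(mi, bcr, count, -mirna_base)` call in step 11 reshapes each cohort's wide microRNA-by-sample table into long form (one row per microRNA and sample barcode, with columns `mirna_base`, `bcr`, `count`). The trailing `[, -2]` then drops the second column, `bcr`, so the stored `profiles` rows keep only `mirna_base` and `count` (plus the `study` column added in step 12) and no sample barcodes. The base R sketch below reproduces that reshape on a made-up two-sample table to make the column order explicit; it mirrors `gather()`'s output order (all rows for the first sample column, then the next) but is not the tidyr implementation.

```r
# Made-up wide table: one row per microRNA, one column per sample barcode
wide <- data.frame(mirna_base = c("m1", "m2"),
                   S1 = c(10, 0), S2 = c(5, 3),
                   stringsAsFactors = FALSE)

samples <- setdiff(names(wide), "mirna_base")

# Long form: stack the sample columns one after another, repeating the
# microRNA IDs for each sample and recording the sample name in bcr.
long <- data.frame(
  mirna_base = rep(wide$mirna_base, times = length(samples)),
  bcr        = rep(samples, each = nrow(wide)),
  count      = unlist(wide[samples], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Equivalent of the transcript's [, -2]: drop the bcr column
profile <- long[, -2]
print(profile)
```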
> ### use bind_rows() to stack data.tables, tagging each with its cohort
> cat('12: Merging microRNA profiles.\n')
12: Merging microRNA profiles.
>
> names(df) <- cohorts[-20]
> df <- bind_rows(df, .id = 'study')
There were 31 warnings (use warnings() to see them)
>
> # write profiles table
> cat('13: Writing microRNA profiles.\n')
13: Writing microRNA profiles.
>
> dbWriteTable(db,
+ name = 'profiles',
+ df,
+ overwrite = TRUE)
Warning message:
Closing open result set, pending rows
>
> ### making index on profiles
> dbSendQuery(db,
+ statement = 'create index idx4 on profiles (mirna_base);',
+ overwrite = TRUE)
<SQLiteResult>
SQL create index idx4 on profiles (mirna_base);
ROWS Fetched: 0 [complete]
Changed: 1
>
> # disconnect from the db file
> dbDisconnect(db)
Warning message:
In rsqlite_disconnect(conn@ptr) :
There are 1 result in use. The connection will be released when they are closed
> tictoc::toc()
1589.422 sec elapsed
>
> proc.time()
user system elapsed
705.959 566.385 1589.679