---
title: "AGS paper - Supplementary Information (SI)"
author: "[John Zobolas](https://github.com/bblodfon)"
date: "Last updated: `r format(Sys.time(), '%d %B, %Y')`"
description: "AGS paper - SI"
url: 'https\://druglogics.github.io/ags-paper/'
github-repo: "druglogics/ags-paper"
bibliography: ["references.bib", "packages.bib"]
link-citations: true
site: bookdown::bookdown_site
---
```{r include = FALSE}
# automatically create a bib database for R packages
knitr::write_bib(c(.packages(), 'rtemps', 'glmnet', 'MAMSE'), 'packages.bib')
```
# Intro {-}
This report is the **supplementary material** for the AGS paper and has all the simulation results and investigations related to that paper, as well as [instructions](#repro123) for reproducing the results.
## Methodology/Input Overview {-}
The inputs that vary across the simulations and the presented figures are:
- The number of `Gitsbe` simulations: more simulations, more models generated.
- The type of mutation that `Gitsbe` models have:
unless otherwise specified, the `Gitsbe` models have only [link operator mutations](https://druglogics.github.io/druglogics-doc/gitsbe-config.html#genetic-algorithm).
[Topology mutations](#cascade-2.0-analysis-topology-mutations) were also tested as well as a combination of [topology and link operator mutations](#cascade-2.0-analysis-topology-and-link-operator-mutations).
- The [training data](https://druglogics.github.io/druglogics-doc/training-data.html) for the `Gitsbe` models: *steady state* (calibrated models) vs *proliferative profile* (random models).
- The type of mathematical model (HSA or Bliss) used in `Drabme` to evaluate the synergies, against either the [@Flobak2015] dataset for the CASCADE 1.0 analysis or the [@Flobak2019] dataset for the CASCADE 2.0 analysis.
For more info on the calculations that `Drabme` performs, [see here](https://druglogics.github.io/druglogics-doc/drabme-description.html#drabme-description).
- The type of output used from `Drabme`: ensemble-wise or model-wise [synergy results](https://druglogics.github.io/druglogics-doc/drabme-install.html#drabme-output).
## Summary {-}
Observing the results across the whole report, we reach the following conclusions:
:::{.green-box}
- To minimize the expected performance variance, executing $150$ `Gitsbe` simulations (~$500$ best-fitted models) is a good choice (no need for more, no matter the other input parameters).
- *Ensemble-wise* results do not correlate with *model-wise* results (see correlation results for [CASCADE 1.0](#correlation) and [CASCADE 2.0](#correlation-1)).
This happens because some drug-perturbed models do not have stable states and thus cannot be evaluated for synergy. ^[Using minimal trapspaces, where an attractor is almost always found and the global output of the model can be [calculated](https://druglogics.github.io/druglogics-doc/modeloutputs.html), we observed higher correlation between *ensemble-wise* and *model-wise* results (as expected).]
- *Model-wise* ROC results are always better than the *ensemble-wise* ROC results for the single-predictor models (e.g. the *calibrated* non-normalized model results).
- When using a combined model predictor (see [here](#auc-sensitivity)) to augment/correct the calibrated models' results, `Drabme`'s *Bliss* synergy assessment always brings a significant performance benefit for the ensemble-wise results.
With *HSA*, that is not always the case (see [one example](#auc-sensitivity-2) and [another](#auc-sensitivity-5)).
- The *model-wise* results do not bring any performance benefit when used in a combined predictor.
- The value $\beta = -1$ is a good estimate of the value that maximizes the combined predictor's performance ($calibrated + \beta \times random$) across all of the report's relevant investigations.
- Comparing the different parameterization schemes for the CASCADE 2.0 analysis (using the combined predictors with $\beta = -1$), we observe that **topology mutations outperform link operator mutations**.
- There is **correlation** between fitness to the AGS steady state and normalized ensemble prediction performance.
This is observed for the link operator-mutated CASCADE 2.0 models [here](#fit-vs-ens-perf-lo) and somewhat more strongly for the [topology-mutated ones](#fit-vs-ens-perf-topo).
The same trend was shown in the CASCADE 1.0 link operator-mutated models [analysis](#fit-vs-ens-perf-cascade1).
- Any type of scrambling in the curated CASCADE topology reduces ensemble model prediction performance.
See results for CASCADE 1.0 [here](#scrambled-topo-inv-cascade1) and CASCADE 2.0 [here](#scrambled-topo-inv-cascade2).
- Expression of `ERK` is a biomarker that distinguishes the higher-performing AGS models (see the results of the investigation [here](#erk-perf-inv)).
:::
# R Libraries {-}
For the ROC curves we used the function `get_roc_stats()` from the `usefun` R package [@R-usefun] and for the PR curves the `pr.curve()` from the `PRROC` package [@Grau2015].
Several functions from the `emba` R package [@Zobolas2020] are also used to load the simulation results.
The AUC sensitivity analysis (for a description see [here](#auc-sensitivity)) was inspired by work from [@Pepe2000].
The heatmaps are generated with the `ComplexHeatmap` R package [@Gu2016].
The report template is from the `rtemps` R package [@R-rtemps].
Loading libraries that are used in this report:
```{r Load libraries, message = FALSE, class.source = 'fold-show'}
library(DT)
library(ggpubr)
library(RColorBrewer)
library(xfun)
library(dplyr)
library(tidyr)
library(tibble)
library(emba)
library(usefun)
library(readr)
library(stringr)
library(latex2exp)
library(corrplot)
library(PRROC)
library(equatiomatic)
library(glmnet)
library(knitr)
library(MAMSE)
library(circlize)
library(ComplexHeatmap)
library(rstatix)
library(survival)
library(survminer)
```
# CASCADE 1.0 Analysis {-}
:::{.blue-box}
Performance of automatically parameterized models against published data in [@Flobak2015]
:::
## HSA results {-}
:::{.note}
- *HSA* refers to the synergy method used in `Drabme` to assess the synergies from the `gitsbe` models
- We test performance using ROC and PR AUC for both the *ensemble-wise* and *model-wise* synergies from `Drabme`
- **Calibrated** models: fitted to steady state ($50$ simulations)
- **Random** models: fitted to [proliferation profile](https://druglogics.github.io/druglogics-doc/training-data.html#unperturbed-condition---globaloutput-response) ($50$ simulations)
- `Gitsbe` models have mutations on **link operator** only
:::
Load results:
```{r Load CASCADE 1.0 HSA results}
# 'ss' => calibrated models, 'prolif' => proliferative random models
# 'ew' => ensemble-wise, 'mw' => model-wise
## HSA results
ss_hsa_ew_file = paste0("results/link-only/cascade_1.0_ss_50sim_fixpoints_hsa_ensemblewise_synergies.tab")
ss_hsa_mw_file = paste0("results/link-only/cascade_1.0_ss_50sim_fixpoints_hsa_modelwise_synergies.tab")
prolif_hsa_ew_file = paste0("results/link-only/cascade_1.0_rand_50sim_fixpoints_hsa_ensemblewise_synergies.tab")
prolif_hsa_mw_file = paste0("results/link-only/cascade_1.0_rand_50sim_fixpoints_hsa_modelwise_synergies.tab")
ss_hsa_ensemblewise_synergies = emba::get_synergy_scores(ss_hsa_ew_file)
ss_hsa_modelwise_synergies = emba::get_synergy_scores(ss_hsa_mw_file, file_type = "modelwise")
prolif_hsa_ensemblewise_synergies = emba::get_synergy_scores(prolif_hsa_ew_file)
prolif_hsa_modelwise_synergies = emba::get_synergy_scores(prolif_hsa_mw_file, file_type = "modelwise")
# calculate probability of synergy in the modelwise results
ss_hsa_modelwise_synergies = ss_hsa_modelwise_synergies %>%
mutate(synergy_prob_ss = synergies/(synergies + `non-synergies`))
prolif_hsa_modelwise_synergies = prolif_hsa_modelwise_synergies %>%
mutate(synergy_prob_prolif = synergies/(synergies + `non-synergies`))
observed_synergies_file = 'data/observed_synergies_cascade_1.0'
observed_synergies = emba::get_observed_synergies(observed_synergies_file)
# 1 (positive/observed synergy) or 0 (negative/not observed) for all tested drug combinations
observed = sapply(prolif_hsa_modelwise_synergies$perturbation %in% observed_synergies, as.integer)
# 'ew' => ensemble-wise, 'mw' => model-wise
pred_ew_hsa = bind_cols(ss_hsa_ensemblewise_synergies %>% rename(ss_score = score),
prolif_hsa_ensemblewise_synergies %>% select(score) %>% rename(prolif_score = score),
as_tibble_col(observed, column_name = "observed"))
pred_mw_hsa = bind_cols(
ss_hsa_modelwise_synergies %>% select(perturbation, synergy_prob_ss),
prolif_hsa_modelwise_synergies %>% select(synergy_prob_prolif),
as_tibble_col(observed, column_name = "observed"))
```
### ROC curves {-}
```{r roc-hsa-cascade1, fig.width=5, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='ROC curves (CASCADE 1.0, HSA synergy method)', fig.align='center'}
# ensemble-wise: lower scores denote synergy, so the default direction "<" applies
res_ss_ew = get_roc_stats(df = pred_ew_hsa, pred_col = "ss_score", label_col = "observed")
res_prolif_ew = get_roc_stats(df = pred_ew_hsa, pred_col = "prolif_score", label_col = "observed")
# model-wise: higher synergy probabilities denote synergy, hence direction = ">"
res_ss_mw = get_roc_stats(df = pred_mw_hsa, pred_col = "synergy_prob_ss", label_col = "observed", direction = ">")
res_prolif_mw = get_roc_stats(df = pred_mw_hsa, pred_col = "synergy_prob_prolif", label_col = "observed", direction = ">")
# Plot ROCs
my_palette = RColorBrewer::brewer.pal(n = 9, name = "Set1")
plot(x = res_ss_ew$roc_stats$FPR, y = res_ss_ew$roc_stats$TPR,
type = 'l', lwd = 3, col = my_palette[1], main = 'ROC curve, Ensemble-wise synergies (HSA)',
xlab = 'False Positive Rate (FPR)', ylab = 'True Positive Rate (TPR)')
lines(x = res_prolif_ew$roc_stats$FPR, y = res_prolif_ew$roc_stats$TPR,
lwd = 3, col = my_palette[2])
legend('bottomright', title = 'AUC', col = my_palette[1:2], pch = 19,
legend = c(paste(round(res_ss_ew$AUC, digits = 2), "Calibrated"),
paste(round(res_prolif_ew$AUC, digits = 2), "Random")), cex = 1.3)
grid(lwd = 0.5)
abline(a = 0, b = 1, col = 'lightgrey', lty = 'dotdash', lwd = 1.2)
plot(x = res_ss_mw$roc_stats$FPR, y = res_ss_mw$roc_stats$TPR,
type = 'l', lwd = 3, col = my_palette[1], main = 'ROC curve, Model-wise synergies (HSA)',
xlab = 'False Positive Rate (FPR)', ylab = 'True Positive Rate (TPR)')
lines(x = res_prolif_mw$roc_stats$FPR, y = res_prolif_mw$roc_stats$TPR,
lwd = 3, col = my_palette[2])
legend('bottomright', title = 'AUC', col = my_palette[1:2], pch = 19,
legend = c(paste(round(res_ss_mw$AUC, digits = 2), "Calibrated"),
paste(round(res_prolif_mw$AUC, digits = 2), "Random")), cex = 1.3)
grid(lwd = 0.5)
abline(a = 0, b = 1, col = 'lightgrey', lty = 'dotdash', lwd = 1.2)
```
### PR curves {-}
```{r pr-hsa-cascade1, fig.width=5, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='PR curves (CASCADE 1.0, HSA synergy method)', fig.align='center'}
# ensemble-wise scores are negated so that more negative (more synergistic) scores rank higher
pr_ss_ew_hsa = pr.curve(scores.class0 = pred_ew_hsa %>% pull(ss_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_hsa %>% pull(observed), curve = TRUE, rand.compute = TRUE)
pr_prolif_ew_hsa = pr.curve(scores.class0 = pred_ew_hsa %>% pull(prolif_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_hsa %>% pull(observed), curve = TRUE)
pr_ss_mw_hsa = pr.curve(scores.class0 = pred_mw_hsa %>% pull(synergy_prob_ss),
weights.class0 = pred_mw_hsa %>% pull(observed), curve = TRUE, rand.compute = TRUE)
pr_prolif_mw_hsa = pr.curve(scores.class0 = pred_mw_hsa %>% pull(synergy_prob_prolif),
weights.class0 = pred_mw_hsa %>% pull(observed), curve = TRUE)
plot(pr_ss_ew_hsa, main = 'PR curve, Ensemble-wise synergies (HSA)',
auc.main = FALSE, color = my_palette[1], rand.plot = TRUE)
plot(pr_prolif_ew_hsa, add = TRUE, color = my_palette[2])
legend('topright', title = 'AUC', col = my_palette[1:2], pch = 19,
legend = c(paste(round(pr_ss_ew_hsa$auc.davis.goadrich, digits = 2), "Calibrated"),
paste(round(pr_prolif_ew_hsa$auc.davis.goadrich, digits = 2), "Random")))
grid(lwd = 0.5)
plot(pr_ss_mw_hsa, main = 'PR curve, Model-wise synergies (HSA)',
auc.main = FALSE, color = my_palette[1], rand.plot = TRUE)
plot(pr_prolif_mw_hsa, add = TRUE, color = my_palette[2])
legend('left', title = 'AUC', col = my_palette[1:2], pch = 19,
legend = c(paste(round(pr_ss_mw_hsa$auc.davis.goadrich, digits = 2), "Calibrated"),
paste(round(pr_prolif_mw_hsa$auc.davis.goadrich, digits = 2), "Random")))
grid(lwd = 0.5)
```
:::{.green-box}
Calibrated models perform a lot better than the random ones.
:::
### AUC sensitivity {-}
:::{#auc-sensitivity .blue-box}
- Investigate **combining the synergy results of calibrated and proliferative (random) models**
- How much information from the *proliferative* (*random*) models can be used to augment the calibrated results? (see the toy sketch after this box)
- **Ensemble-wise** scenario: $score = calibrated + \beta \times random$
- $\beta \rightarrow +\infty$: mostly *proliferative* (random) model predictions
- $\beta \rightarrow -\infty$: mostly *reverse proliferative* (random) model predictions
- $\beta \simeq -1$: calibrated models are *normalized* against proliferative (random) model predictions.
- **Model-wise** scenario: $(1-w) \times prob_{cal} + w \times prob_{rand}, w \in[0,1]$
- $w=0$: only calibrated model predictions
- $w=1$: only proliferative (random) model predictions
:::
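To make the two combination schemes concrete, here is a toy sketch with made-up score values (not from the simulations); the actual computations on the real data follow in the next code chunks:
```{r combined-score-sketch, eval=FALSE}
# Toy illustration (hypothetical values) of the two combination schemes.
# 'cal' and 'rand' stand in for the ensemble-wise synergy scores of the
# calibrated and the random (proliferative) models respectively
cal  = c(-0.20, -0.05, 0.00, 0.10)
rand = c(-0.15, -0.10, 0.02, 0.08)
# ensemble-wise: score = calibrated + beta * random
# (beta = -1 'normalizes' the calibrated scores against the random ones)
beta = -1
combined_score = cal + beta * rand
# model-wise: weighted average of the synergy probabilities, w in [0,1]
prob_cal  = c(0.60, 0.20, 0.10, 0.40)
prob_rand = c(0.50, 0.30, 0.20, 0.30)
w = 0.5
weighted_prob = (1 - w) * prob_cal + w * prob_rand
```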
```{r auc-sen-ew-hsa-cascade1, dpi=300, fig.align='center', fig.height = 4, cache=TRUE, fig.cap='AUC sensitivity (CASCADE 1.0, HSA synergy method, Ensemble-wise results)'}
# Ensemble-wise
betas = seq(from = -12, to = 12, by = 0.1)
prolif_roc = sapply(betas, function(beta) {
pred_ew_hsa = pred_ew_hsa %>% mutate(combined_score = ss_score + beta * prolif_score)
res = roc.curve(scores.class0 = pred_ew_hsa %>% pull(combined_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_hsa %>% pull(observed))
auc_value = res$auc
})
prolif_pr = sapply(betas, function(beta) {
pred_ew_hsa = pred_ew_hsa %>% mutate(combined_score = ss_score + beta * prolif_score)
res = pr.curve(scores.class0 = pred_ew_hsa %>% pull(combined_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_hsa %>% pull(observed))
auc_value = res$auc.davis.goadrich
})
df_ew = as_tibble(cbind(betas, prolif_roc, prolif_pr))
df_ew = df_ew %>% tidyr::pivot_longer(-betas, names_to = "type", values_to = "AUC")
ggline(data = df_ew, x = "betas", y = "AUC", numeric.x.axis = TRUE, color = "type",
plot_type = "l", xlab = TeX("$\\beta$"), ylab = "AUC (Area Under Curve)",
legend = "none", facet.by = "type", palette = my_palette,
panel.labs = list(type = c("Precision-Recall", "ROC")),
title = TeX("AUC sensitivity to $\\beta$ parameter (HSA, CASCADE 1.0)")) +
theme(plot.title = element_text(hjust = 0.5)) +
geom_vline(xintercept = 0) +
geom_vline(xintercept = -1, color = "black", size = 0.3, linetype = "dashed") +
geom_text(aes(x=-2, label="β = -1", y=0.25), colour="black", angle=90) +
grids()
```
```{r auc-sen-mw-hsa-cascade1, dpi=300, fig.align='center', fig.height = 4, cache=TRUE, fig.cap='AUC sensitivity (CASCADE 1.0, HSA synergy method, Model-wise results)'}
# Model-wise
weights = seq(from = 0, to = 1, by = 0.05)
prolif_roc_mw = sapply(weights, function(w) {
pred_mw_hsa = pred_mw_hsa %>%
mutate(weighted_prob = (1 - w) * pred_mw_hsa$synergy_prob_ss + w * pred_mw_hsa$synergy_prob_prolif)
res = roc.curve(scores.class0 = pred_mw_hsa %>% pull(weighted_prob),
weights.class0 = pred_mw_hsa %>% pull(observed))
auc_value = res$auc
})
prolif_pr_mw = sapply(weights, function(w) {
pred_mw_hsa = pred_mw_hsa %>%
mutate(weighted_prob = (1 - w) * pred_mw_hsa$synergy_prob_ss + w * pred_mw_hsa$synergy_prob_prolif)
res = pr.curve(scores.class0 = pred_mw_hsa %>% pull(weighted_prob),
weights.class0 = pred_mw_hsa %>% pull(observed))
auc_value = res$auc.davis.goadrich
})
df_mw = as_tibble(cbind(weights, prolif_roc_mw, prolif_pr_mw))
df_mw = df_mw %>% tidyr::pivot_longer(-weights, names_to = "type", values_to = "AUC")
ggline(data = df_mw, x = "weights", y = "AUC", numeric.x.axis = TRUE, color = "type",
plot_type = "l", xlab = TeX("weight $w$"), ylab = "AUC (Area Under Curve)",
legend = "none", facet.by = "type", palette = my_palette,
panel.labs = list(type = c("Precision-Recall", "ROC")), title.position = "center",
title = TeX("AUC sensitivity to weighted average score (HSA, CASCADE 1.0)")) +
theme(plot.title = element_text(hjust = 0.5)) +
grids()
```
:::{.green-box}
- There are $\beta$ values that can boost the predictive performance of the calibrated models (ensemble-wise) but no $w$ weight in the model-wise case.
- $\beta=-1$ seems to be a common value that maximizes both the ROC-AUC and the PR-AUC.
- The PR-AUC is **more sensitive** than the ROC-AUC, and thus a better indicator of performance.
:::
## Bliss results {-}
:::{.note}
- *Bliss* refers to the synergy method used in `Drabme` to assess the synergies from the `gitsbe` models
- We test performance using ROC and PR AUC for both the *ensemble-wise* and *model-wise* synergies from `Drabme`
- **Calibrated** models: fitted to steady state ($50$ simulations)
- **Random** models: fitted to [proliferation profile](https://druglogics.github.io/druglogics-doc/training-data.html#unperturbed-condition---globaloutput-response) ($50$ simulations)
- `Gitsbe` models have mutations on **link operator** only
:::
Load results:
```{r Load CASCADE 1.0 Bliss results}
# 'ss' => calibrated models, 'prolif' => random models
## Bliss results
ss_bliss_ensemblewise_file = paste0("results/link-only/cascade_1.0_ss_50sim_fixpoints_bliss_ensemblewise_synergies.tab")
ss_bliss_modelwise_file = paste0("results/link-only/cascade_1.0_ss_50sim_fixpoints_bliss_modelwise_synergies.tab")
prolif_bliss_ensemblewise_file = paste0("results/link-only/cascade_1.0_rand_50sim_fixpoints_bliss_ensemblewise_synergies.tab")
prolif_bliss_modelwise_file = paste0("results/link-only/cascade_1.0_rand_50sim_fixpoints_bliss_modelwise_synergies.tab")
ss_bliss_ensemblewise_synergies = emba::get_synergy_scores(ss_bliss_ensemblewise_file)
ss_bliss_modelwise_synergies = emba::get_synergy_scores(ss_bliss_modelwise_file, file_type = "modelwise")
prolif_bliss_ensemblewise_synergies = emba::get_synergy_scores(prolif_bliss_ensemblewise_file)
prolif_bliss_modelwise_synergies = emba::get_synergy_scores(prolif_bliss_modelwise_file, file_type = "modelwise")
# calculate probability of synergy in the modelwise results
ss_bliss_modelwise_synergies = ss_bliss_modelwise_synergies %>%
mutate(synergy_prob_ss = synergies/(synergies + `non-synergies`))
prolif_bliss_modelwise_synergies = prolif_bliss_modelwise_synergies %>%
mutate(synergy_prob_prolif = synergies/(synergies + `non-synergies`))
# 'ew' => ensemble-wise, 'mw' => model-wise
pred_ew_bliss = bind_cols(ss_bliss_ensemblewise_synergies %>% rename(ss_score = score),
prolif_bliss_ensemblewise_synergies %>% select(score) %>% rename(prolif_score = score),
as_tibble_col(observed, column_name = "observed"))
pred_mw_bliss = bind_cols(
ss_bliss_modelwise_synergies %>% select(perturbation, synergy_prob_ss),
prolif_bliss_modelwise_synergies %>% select(synergy_prob_prolif),
as_tibble_col(observed, column_name = "observed"))
```
### ROC curves {-}
```{r roc-bliss-cascade1, fig.width=5, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='ROC curves (CASCADE 1.0, Bliss synergy method)', fig.align='center'}
res_ss_ew = get_roc_stats(df = pred_ew_bliss, pred_col = "ss_score", label_col = "observed")
res_prolif_ew = get_roc_stats(df = pred_ew_bliss, pred_col = "prolif_score", label_col = "observed")
res_ss_mw = get_roc_stats(df = pred_mw_bliss, pred_col = "synergy_prob_ss", label_col = "observed", direction = ">")
res_prolif_mw = get_roc_stats(df = pred_mw_bliss, pred_col = "synergy_prob_prolif", label_col = "observed", direction = ">")
# Plot ROCs
plot(x = res_ss_ew$roc_stats$FPR, y = res_ss_ew$roc_stats$TPR,
type = 'l', lwd = 3, col = my_palette[1], main = 'ROC curve, Ensemble-wise synergies (Bliss)',
xlab = 'False Positive Rate (FPR)', ylab = 'True Positive Rate (TPR)')
lines(x = res_prolif_ew$roc_stats$FPR, y = res_prolif_ew$roc_stats$TPR,
lwd = 3, col = my_palette[2])
legend('bottomright', title = 'AUC', col = my_palette[1:2], pch = 19,
legend = c(paste(round(res_ss_ew$AUC, digits = 2), "Calibrated"),
paste(round(res_prolif_ew$AUC, digits = 2), "Random")), cex = 1.3)
grid(lwd = 0.5)
abline(a = 0, b = 1, col = 'lightgrey', lty = 'dotdash', lwd = 1.2)
plot(x = res_ss_mw$roc_stats$FPR, y = res_ss_mw$roc_stats$TPR,
type = 'l', lwd = 3, col = my_palette[1], main = 'ROC curve, Model-wise synergies (Bliss)',
xlab = 'False Positive Rate (FPR)', ylab = 'True Positive Rate (TPR)')
lines(x = res_prolif_mw$roc_stats$FPR, y = res_prolif_mw$roc_stats$TPR,
lwd = 3, col = my_palette[2])
legend('bottomright', title = 'AUC', col = my_palette[1:2], pch = 19,
legend = c(paste(round(res_ss_mw$AUC, digits = 2), "Calibrated"),
paste(round(res_prolif_mw$AUC, digits = 2), "Random")), cex = 1.3)
grid(lwd = 0.5)
abline(a = 0, b = 1, col = 'lightgrey', lty = 'dotdash', lwd = 1.2)
```
:::{#calibrated-bliss-dt}
The **ROC statistics data** for the calibrated models are as follows:
:::
```{r calibrated-bliss-dt, fig.align='center', fig.cap="ROC data for Calibrated Models (CASCADE 1.0, Bliss synergy method)"}
DT::datatable(data = res_ss_ew$roc_stats, options =
list(pageLength = 5, lengthMenu = c(5, 10, 16), searching = FALSE)) %>%
formatRound(c(1,6,7,8,9), digits = 5)
# investigate the average threshold as a synergy classification index
# thres = res_ss_ew$roc_stats %>% pull(threshold)
# thres = thres[is.finite(thres)] # remove Inf's
# res_ss_ew$roc_stats %>%
# filter(threshold < mean(thres)) %>%
# slice(n()) %>% kable()
```
### PR curves {-}
```{r pr-bliss-cascade1, fig.width=5, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='PR curves (CASCADE 1.0, Bliss synergy method)', fig.align='center'}
pr_ss_ew_bliss = pr.curve(scores.class0 = pred_ew_bliss %>% pull(ss_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_bliss %>% pull(observed), curve = TRUE, rand.compute = TRUE)
pr_prolif_ew_bliss = pr.curve(scores.class0 = pred_ew_bliss %>% pull(prolif_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_bliss %>% pull(observed), curve = TRUE)
pr_ss_mw_bliss = pr.curve(scores.class0 = pred_mw_bliss %>% pull(synergy_prob_ss),
weights.class0 = pred_mw_bliss %>% pull(observed), curve = TRUE, rand.compute = TRUE)
pr_prolif_mw_bliss = pr.curve(scores.class0 = pred_mw_bliss %>% pull(synergy_prob_prolif),
weights.class0 = pred_mw_bliss %>% pull(observed), curve = TRUE)
plot(pr_ss_ew_bliss, main = 'PR curve, Ensemble-wise synergies (Bliss)',
auc.main = FALSE, color = my_palette[1], rand.plot = TRUE)
plot(pr_prolif_ew_bliss, add = TRUE, color = my_palette[2])
legend(x = 0, y = 0.9, title = 'AUC', col = my_palette[1:2], pch = 19, cex = 1.3,
legend = c(paste(round(pr_ss_ew_bliss$auc.davis.goadrich, digits = 2), "Calibrated"),
paste(round(pr_prolif_ew_bliss$auc.davis.goadrich, digits = 2), "Random")))
grid(lwd = 0.5)
plot(pr_ss_mw_bliss, main = 'PR curve, Model-wise synergies (Bliss)',
auc.main = FALSE, color = my_palette[1], rand.plot = TRUE)
plot(pr_prolif_mw_bliss, add = TRUE, color = my_palette[2])
legend(x = 0, y = 0.9, title = 'AUC', col = my_palette[1:2], pch = 19, cex = 1.3,
legend = c(paste(round(pr_ss_mw_bliss$auc.davis.goadrich, digits = 2), "Calibrated"),
paste(round(pr_prolif_mw_bliss$auc.davis.goadrich, digits = 2), "Random")))
grid(lwd = 0.5)
```
:::{.green-box}
Calibrated models perform a lot better than the random ones.
:::
### AUC sensitivity {-}
We investigate the same thing as described [here](#auc-sensitivity).
```{r auc-sen-ew-bliss-cascade1, dpi=300, fig.align='center', fig.height=4, cache=TRUE, fig.cap='AUC sensitivity (CASCADE 1.0, Bliss synergy method, Ensemble-wise results)'}
# Ensemble-wise
betas = seq(from = -12, to = 12, by = 0.1)
prolif_roc = sapply(betas, function(beta) {
pred_ew_bliss = pred_ew_bliss %>% mutate(combined_score = ss_score + beta * prolif_score)
res = roc.curve(scores.class0 = pred_ew_bliss %>% pull(combined_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_bliss %>% pull(observed))
auc_value = res$auc
})
prolif_pr = sapply(betas, function(beta) {
pred_ew_bliss = pred_ew_bliss %>% mutate(combined_score = ss_score + beta * prolif_score)
res = pr.curve(scores.class0 = pred_ew_bliss %>% pull(combined_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_bliss %>% pull(observed))
auc_value = res$auc.davis.goadrich
})
df_ew = as_tibble(cbind(betas, prolif_roc, prolif_pr))
df_ew = df_ew %>% tidyr::pivot_longer(-betas, names_to = "type", values_to = "AUC")
ggline(data = df_ew, x = "betas", y = "AUC", numeric.x.axis = TRUE, color = "type",
plot_type = "l", xlab = TeX("$\\beta$"), ylab = "AUC (Area Under Curve)",
legend = "none", facet.by = "type", palette = my_palette,
panel.labs = list(type = c("Precision-Recall", "ROC")),
title = TeX("AUC sensitivity to $\\beta$ parameter (Bliss, CASCADE 1.0)")) +
theme(plot.title = element_text(hjust = 0.5)) +
geom_vline(xintercept = 0) +
geom_vline(xintercept = -1, color = "black", size = 0.3, linetype = "dashed") +
geom_text(aes(x=-2, label="β = -1", y=0.25), colour="black", angle=90) +
grids()
```
```{r auc-sen-mw-bliss-cascade1, dpi=300, fig.align='center', fig.height=4, cache=TRUE, fig.cap='AUC sensitivity (CASCADE 1.0, Bliss synergy method, Model-wise results)'}
# Model-wise
weights = seq(from = 0, to = 1, by = 0.05)
prolif_roc_mw = sapply(weights, function(w) {
pred_mw_bliss = pred_mw_bliss %>%
mutate(weighted_prob = (1 - w) * pred_mw_bliss$synergy_prob_ss + w * pred_mw_bliss$synergy_prob_prolif)
res = roc.curve(scores.class0 = pred_mw_bliss %>% pull(weighted_prob),
weights.class0 = pred_mw_bliss %>% pull(observed))
auc_value = res$auc
})
prolif_pr_mw = sapply(weights, function(w) {
pred_mw_bliss = pred_mw_bliss %>%
mutate(weighted_prob = (1 - w) * pred_mw_bliss$synergy_prob_ss + w * pred_mw_bliss$synergy_prob_prolif)
res = pr.curve(scores.class0 = pred_mw_bliss %>% pull(weighted_prob),
weights.class0 = pred_mw_bliss %>% pull(observed))
auc_value = res$auc.davis.goadrich
})
df_mw = as_tibble(cbind(weights, prolif_roc_mw, prolif_pr_mw))
df_mw = df_mw %>% tidyr::pivot_longer(-weights, names_to = "type", values_to = "AUC")
ggline(data = df_mw, x = "weights", y = "AUC", numeric.x.axis = TRUE, color = "type",
plot_type = "l", xlab = TeX("weight $w$"), ylab = "AUC (Area Under Curve)",
legend = "none", facet.by = "type", palette = my_palette,
panel.labs = list(type = c("Precision-Recall", "ROC")), title.position = "center",
title = TeX("AUC sensitivity to weighted average score (Bliss, CASCADE 1.0)")) +
theme(plot.title = element_text(hjust = 0.5)) +
grids()
```
:::{.green-box}
- There are $\beta$ values that can boost the predictive performance of the calibrated models (ensemble-wise) but no $w$ weight in the model-wise case.
- The PR-AUC is **more sensitive** than the ROC-AUC, and thus a better indicator of performance.
- A value very close to $\beta=-1$ seems to be the one that maximizes both the ROC-AUC and the PR-AUC.
:::
## Best ROC and PRC {-}
:::{#combined-pred-bliss-dt}
The **ROC ensemble-wise statistics data** for the combined predictor $calibrated + \beta \times random, \beta=-1$ (using the results from the $50$ simulation runs) are:
:::
```{r combined-pred-bliss-dt, fig.align='center', fig.cap="ROC data for Combined Predictor (CASCADE 1.0, Bliss synergy method)"}
beta = -1
pred_ew_bliss = pred_ew_bliss %>% mutate(combined_score = ss_score + beta * prolif_score)
res_comb_pred = usefun::get_roc_stats(df = pred_ew_bliss, pred_col = "combined_score", label_col = "observed")
DT::datatable(data = res_comb_pred$roc_stats, options =
list(pageLength = 5, lengthMenu = c(5, 10, 16), searching = FALSE)) %>%
DT::formatRound(c(1,6,7,8,9), digits = 5)
# All observed synergies are in top 6
# pred_ew_bliss %>% arrange(combined_score)
```
:::{.note}
Only for the next 2 Figures, **Calibrated** stands for the combined predictor results, i.e. $calibrated + \beta \times random, \beta=-1$.
:::
```{r best-roc-pr-cascade1, fig.width=5, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='ROC and PR curves for Random and Best Combined Predictor (CASCADE 1.0, Bliss synergy method)', fig.align='center'}
plot(x = res_comb_pred$roc_stats$FPR, y = res_comb_pred$roc_stats$TPR,
type = 'l', lwd = 3, col = my_palette[1], main = 'ROC curve, Ensemble-wise synergies (Bliss)',
xlab = 'False Positive Rate (FPR)', ylab = 'True Positive Rate (TPR)')
lines(x = res_prolif_ew$roc_stats$FPR, y = res_prolif_ew$roc_stats$TPR,
lwd = 3, col = my_palette[2])
legend('bottomright', title = 'AUC', col = my_palette[1:2], pch = 19,
legend = c(paste(round(res_comb_pred$AUC, digits = 2), "Calibrated"),
paste(round(res_prolif_ew$AUC, digits = 2), "Random")), cex = 1.3)
grid(lwd = 0.5)
abline(a = 0, b = 1, col = 'lightgrey', lty = 'dotdash', lwd = 1.2)
res_comb_pred_pr = pr.curve(scores.class0 = pred_ew_bliss %>% pull(combined_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_bliss %>% pull(observed), curve = TRUE, rand.compute = TRUE)
plot(res_comb_pred_pr, main = 'PR curve, Ensemble-wise synergies (Bliss)',
auc.main = FALSE, color = my_palette[1], rand.plot = TRUE)
plot(pr_prolif_ew_bliss, add = TRUE, color = my_palette[2])
legend(x = 0, y = 0.9, title = 'AUC', col = my_palette[1:2], pch = 19, cex = 1.3,
legend = c(paste(round(res_comb_pred_pr$auc.davis.goadrich, digits = 2), "Calibrated"),
paste(round(pr_prolif_ew_bliss$auc.davis.goadrich, digits = 2), "Random")))
grid(lwd = 0.5)
```
If we add the predictions of the non-normalized calibrated data to the above Figures, we have:
```{r best-roc-pr-cascade1-2, fig.width=5, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='ROC and PR curves for Random, Calibrated and Best Combined Predictor (CASCADE 1.0, Bliss synergy method)', fig.align='center'}
plot(x = res_comb_pred$roc_stats$FPR, y = res_comb_pred$roc_stats$TPR,
type = 'l', lwd = 3, col = my_palette[1], main = 'ROC curve, Ensemble-wise synergies (Bliss)',
xlab = 'False Positive Rate (FPR)', ylab = 'True Positive Rate (TPR)')
lines(x = res_prolif_ew$roc_stats$FPR, y = res_prolif_ew$roc_stats$TPR,
lwd = 3, col = my_palette[2])
lines(x = res_ss_ew$roc_stats$FPR, y = res_ss_ew$roc_stats$TPR,
lwd = 3, col = my_palette[3])
legend('bottomright', title = 'AUC', col = my_palette[c(3,1,2)], pch = 19,
legend = c(paste(round(res_ss_ew$AUC, digits = 2), "Calibrated (non-normalized)"),
paste(round(res_comb_pred$AUC, digits = 2), "Calibrated"),
paste(round(res_prolif_ew$AUC, digits = 2), "Random")), cex = 1)
grid(lwd = 0.5)
abline(a = 0, b = 1, col = 'lightgrey', lty = 'dotdash', lwd = 1.2)
res_comb_pred_pr = pr.curve(scores.class0 = pred_ew_bliss %>% pull(combined_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_bliss %>% pull(observed), curve = TRUE, rand.compute = TRUE)
plot(res_comb_pred_pr, main = 'PR curve, Ensemble-wise synergies (Bliss)',
auc.main = FALSE, color = my_palette[1], rand.plot = TRUE)
plot(pr_prolif_ew_bliss, add = TRUE, color = my_palette[2])
plot(pr_ss_ew_bliss, add = TRUE, color = my_palette[3])
legend(x = 0, y = 0.9, title = 'AUC', col = my_palette[c(3,1,2)], pch = 19, cex = 0.9,
legend = c(paste(round(pr_ss_ew_bliss$auc.davis.goadrich, digits = 2), "Calibrated (non-normalized)"),
paste(round(res_comb_pred_pr$auc.davis.goadrich, digits = 2), "Calibrated"),
paste(round(pr_prolif_ew_bliss$auc.davis.goadrich, digits = 2), "Random")))
grid(lwd = 0.5)
```
## Correlation {-}
We test for correlation between all the synergy predictor results shown in the previous curves.
This means *ensemble-wise* vs *model-wise*, *random proliferative* models vs *calibrated* models and *HSA* vs *Bliss* synergy assessment.
*P-values* are represented at 3 significance levels: $0.05, 0.01, 0.001$ (\*, \*\*, \*\*\*) and the correlation coefficient is calculated using Kendall's *tau* statistic.
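For reference, in the absence of ties, Kendall's $\tau$ for $n$ score pairs is defined as:
$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$
where $n_c$ and $n_d$ are the numbers of *concordant* and *discordant* pairs respectively.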
```{r cor-plot-cascade1, dpi=300, warning=FALSE, cache=TRUE, fig.align='center', fig.cap='Correlation Plot for CASCADE 1.0 Results'}
synergy_scores = bind_cols(
pred_ew_hsa %>% select(ss_score, prolif_score) %>% rename(cal_ew_hsa = ss_score, random_ew_hsa = prolif_score),
pred_ew_bliss %>% select(ss_score, prolif_score) %>% rename(cal_ew_bliss = ss_score, random_ew_bliss = prolif_score),
pred_mw_hsa %>% select(synergy_prob_ss, synergy_prob_prolif) %>%
rename(cal_mw_hsa = synergy_prob_ss, random_mw_hsa = synergy_prob_prolif),
pred_mw_bliss %>% select(synergy_prob_ss, synergy_prob_prolif) %>%
rename(cal_mw_bliss = synergy_prob_ss, random_mw_bliss = synergy_prob_prolif)
)
M = cor(synergy_scores, method = "kendall")
res = cor.mtest(synergy_scores, method = "kendall")
corrplot(corr = M, type = "upper", p.mat = res$p, sig.level = c(.001, .01, .05),
pch.cex = 1, pch.col = "white", insig = "label_sig", tl.col = "black", tl.srt = 45)
```
:::{.green-box}
- **Model-wise** results don't correlate much with the **ensemble-wise** results (*top-right* part of the correlation plot).
- **HSA and Bliss results correlate**, more so for the model-wise (*bottom-right*) than the ensemble-wise results (*top-left*).
- **Calibrated** results also show some correlation with the **random** results.
:::
## Fitness Evolution {-}
We did a test run of `Gitsbe` with $1000$ simulations, fitting to steady state (generating thus **calibrated models**).
The only differences between the following results and the ones above are the total number of simulations specified in the configuration and that the option `bootstrap_mutations_factor` was set to $1$ (to avoid reaching well-fitted models in the earlier generations).
Firstly, we show the fitness evolution of the first $20$ simulations.
Each data point is the average fitness in that generation out of $20$ models.
Note that some simulations end early because the target fitness ($0.99$) is reached by some of the models.
```{r fit-evolution, dpi=300, fig.align='center', cache=TRUE, fig.cap='Fitness Evolution (20 simulations, CASCADE 1.0)'}
read_summary_file = function(file_name) {
lines = readr::read_lines(file = file_name, skip = 5, skip_empty_rows = TRUE)
data_list = list()
index = 1
gen_fit_list = list()
gen_index = 1
for (line_index in 1:length(lines)) {
line = lines[line_index]
if (stringr::str_detect(string = line, pattern = "Simulation")) {
data_list[[index]] = dplyr::bind_cols(gen_fit_list)
index = index + 1
gen_fit_list = list()
gen_index = 1
} else { # read fitness values
gen_fit_list[[gen_index]] = tibble::as_tibble_col(as.numeric(unlist(strsplit(line, split = '\t'))), column_name = paste0(gen_index))
gen_index = gen_index + 1
}
}
# add the last simulation's values
data_list[[index]] = dplyr::bind_cols(gen_fit_list)
return(data_list)
}
fitness_summary_file = "results/link-only/cascade_1.0_ss_1000sim_fixpoints_hsa_summary.txt"
# `fit_res` is a list of tibbles
# Each tibble has the fitness results of a simulation
# Rows represent the models and columns are the generations
fit_res = read_summary_file(file_name = fitness_summary_file)
first_sim_data = colMeans(fit_res[[1]])
plot(1:length(first_sim_data), y = first_sim_data, ylim = c(0,1),
xlim = c(0,20), type = 'l', lwd = 1.5,
main = 'Fitness Evolution across Generations', xlab = 'Generations',
ylab = 'Average Fitness', col = usefun:::colors.100[1])
index = 2
for (fit_data in fit_res) {
if (index > 20) break
#if (ncol(fit_data) != 20) next
mean_fit_per_gen = colMeans(fit_data)
lines(x = 1:length(mean_fit_per_gen), y = mean_fit_per_gen, lwd = 1.5,
col = usefun:::colors.100[index])
index = index + 1
}
grid(lwd = 0.5)
```
Next, we plot the **average fitness + standard deviation** per generation across all $1000$ simulations:
```{r fit-evolution-2, dpi=300, fig.align='center', warning=FALSE, cache=TRUE, fig.cap='Fitness Evolution (1000 simulations, CASCADE 1.0)'}
# `avg_fit` is a tibble with rows the number of simulations and
# columns the generations. Each value in a (sim,gen) cell is the average
# fitness of models in that particular (sim,gen) combination
avg_fit = do.call(dplyr::bind_rows, sapply(fit_res, colMeans))
avg_fit_long = avg_fit %>% pivot_longer(cols = everything()) %>% mutate(name = as.integer(name))
ggline(data = avg_fit_long, x = "name", y = "value", color = my_palette[2],
add = "mean_sd", add.params = list(color = "black"), ylim = c(0, 1),
main = "Fitness Evolution across Generations",
xlab = "Generations", ylab = "Fitness") +
theme(plot.title = element_text(hjust = 0.5)) + grids()
# DIY way:
# df = avg_fit_long %>% group_by(name) %>%
# summarise(median = median(value, na.rm = TRUE),
# mean = mean(value, na.rm = TRUE),
# sd = sd(value, na.rm = TRUE))
#
# ggplot(data = df, aes(x=name, y=mean)) +
# ggtitle("Fitness Evolution across Generations") +
# xlab("Generations") + ylab("Fitness") +
# geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width=.2) +
# geom_line(color='steelblue') +
# geom_point(color='steelblue') +
# theme_pubr() + theme(plot.title = element_text(hjust = 0.5)) +
# grids()
```
:::{.green-box}
- The average fitness stabilizes after $\approx 10-15$ generations, and so does the standard deviation: new models are still being created through the crossover phase of the genetic algorithm to explore various model parameterizations while keeping the fitness score relatively high.
- The *S*-shaped (sigmoid) curve is in agreement with Holland's schema theorem [@holland1992adaptation].
:::
## Fitness vs Ensemble Performance {-#fit-vs-ens-perf-cascade1}
:::{.blue-box}
We check for correlation between the **calibrated models fitness to the AGS steady state** and their **ensemble performance** subject to normalization to the random model predictions.
The **main idea** here is that we generate different training data samples, in which some of the boolean steady state nodes have their values flipped (so the training data are only partially correct), and we fit models to these ($50$ simulations => $150$ models per training data sample, $205$ training data samples in total; a minimal sketch of the flipping idea follows this box).
These calibrated model ensembles can then be tested for their prediction performance.
Then we use the ensemble-wise *random proliferative* model predictions ($50$ simulations) to normalize ($\beta=-1$) against the calibrated model predictions and compute the **AUC ROC and AUC PR for each model ensemble**.
:::
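A minimal sketch of the flipping idea (the node names, values and the uniform sampling of the flipped nodes are illustrative assumptions; the actual scripts are referenced in the note below):
```{r flip-ss-sketch, eval=FALSE}
# flip a given percentage of the binary steady state values (toy example)
set.seed(42)
steady_state = c(ERK_f = 1, AKT_f = 0, TP53_f = 1, MYC_f = 0)
flip_values = function(ss, per_flipped) {
  n_flip = round(length(ss) * per_flipped)
  index = sample(seq_along(ss), size = n_flip)
  ss[index] = 1 - ss[index] # flip 0 <-> 1
  ss
}
flip_values(steady_state, per_flipped = 0.25) # 25% of the node values flipped
```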
:::{.note}
Check how to generate the appropriate data, run the simulations and tidy up the results in the section [Fitness vs Performance Methods].
:::
Load the already-stored result:
```{r load-data-fit-vs-perf-cascade1}
res = readRDS(file = "data/res_fit_aucs_cascade1.rds")
```
We check if our data is normally distributed using the *Shapiro-Wilk* normality test:
```{r normality-check-cascade1, comment=""}
shapiro.test(x = res$roc_auc)
shapiro.test(x = res$pr_auc)
shapiro.test(x = res$avg_fit)
```
We observe from the low *p-values* that the **data is not normally distributed**.
Thus, we are going to use a non-parametric correlation metric, namely the **Kendall rank-based** test (and its respective coefficient, $\tau$), to check for correlation between the ensemble model performance (ROC-AUC, PR-AUC) and the fitness to the AGS steady state:
```{r fit-vs-perf-roc-cascade1, message=FALSE, cache=TRUE, dpi=300, fig.align='center', fig.cap='Fitness to AGS Steady State vs ROC-AUC Performance (CASCADE 1.0, Bliss synergy method, Ensemble-wise normalized results)'}
p = ggpubr::ggscatter(data = res, x = "avg_fit", y = "roc_auc", color = "per_flipped_data",
xlab = "Average Fitness per Model Ensemble",
title = "Fitness to AGS Steady State vs Performance (ROC)",
ylab = "ROC AUC", add = "reg.line", conf.int = TRUE,
add.params = list(color = "blue", fill = "lightgray"),
cor.coef = TRUE, cor.coeff.args = list(method = "kendall", label.y.npc = "top", size = 6, cor.coef.name = "tau")) +
theme(plot.title = element_text(hjust = 0.5)) +
scale_color_distiller(n.breaks = 5, labels = scales::label_percent(accuracy = 1L), palette = 'RdYlBu', guide = guide_colourbar(title = '%Data Flipped'))
ggpubr::ggpar(p, legend = "right", font.legend = 14)
```
```{r fit-vs-perf-pr-cascade1, message=FALSE, cache=TRUE, dpi=300, fig.align='center', fig.cap='Fitness to AGS Steady State vs PR-AUC Performance (CASCADE 1.0, Bliss synergy method, Ensemble-wise normalized results)'}
p = ggpubr::ggscatter(data = res, x = "avg_fit", y = "pr_auc", color = "per_flipped_data",
xlab = "Average Fitness per Model Ensemble",
title = "Fitness to AGS Steady State vs Performance (Precision-Recall)",
add.params = list(color = "blue", fill = "lightgray"),
ylab = "PR AUC", add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.coeff.args = list(method = "kendall", size = 6, cor.coef.name = "tau")) +
theme(plot.title = element_text(hjust = 0.5)) +
scale_color_distiller(n.breaks = 5, labels = scales::label_percent(accuracy = 1L), palette = 'RdYlBu', guide = guide_colourbar(title = '%Data Flipped'))
ggpubr::ggpar(p, legend = "right", font.legend = 14)
```
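For reference, the same Kendall rank correlation tests can be run directly on the stored results (giving the $\tau$ coefficients and *p-values* annotated in the figures above):
```{r kendall-direct, eval=FALSE}
cor.test(res$avg_fit, res$roc_auc, method = "kendall")
cor.test(res$avg_fit, res$pr_auc, method = "kendall")
```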
:::{.green-box}
- We observe that there exists **some correlation between the normalized ensemble model performance and the models' fitness to the training steady state data**.
- The performance as measured by the ROC AUC is less sensitive to changes in the training data, but there is better correlation with regard to the PR AUC, which is a more informative measure for our imbalanced dataset [@Saito2015].
:::
## Scrambled Topologies Investigation {-#scrambled-topo-inv-cascade1}
:::{.note}
We create several *scrambled* topologies from the CASCADE 1.0 one, in order to assess the tolerance of the curated network topology to random edge changes with regards to model ensemble performance.
We introduce $4$ **types of topology scrambling**, each performed on a **varying number of edges**.
The **more edges are changed**, the **more scrambled/randomized** is the resulting topology.
The $4$ types of scrambling are:
- Randomly permutating the source nodes of the edges (**source**)
- Randomly permutating the target nodes of the edges (**target**)
- Randomly changing the interaction effect from inhibition to activation and vice-versa (**Sign Inversion**)
- Combining all the above (**all**)
Note that each type of scrambling produces a topology with the same input and output degree distribution as the original one and as such, the scale-free property of the CASCADE 1.0 topology remains unchanged in the scrambled topologies.
For each different type of scrambling, we make $10$ random topologies for each expected similarity score between the randomized and the curated topology, ranging from $0$ similarity to $0.98$ with a total of $22$ *steps*, thus $10\times22=220$ random topologies per different type of scrambling.
See more details on how to generate these topologies in the script [gen_scrambled_topologies_cascade1.R](https://github.com/druglogics/ags-paper/blob/main/scripts/gen_scrambled_topologies_cascade1.R) (a minimal sketch of the source-node permutation idea is shown right after this note).
To get the drug combination predictions for each scrambled topology, we executed the `druglogics-synergy` module with the default configuration ($50$ simulations per topology, for both *calibrated* to steady state and *random* proliferative models, using the *Bliss* synergy assessment method in `Drabme`) - see more info on the [run_druglogics_synergy_scrambled_topo_cascade1.sh](https://github.com/druglogics/ags-paper/blob/main/scripts/run_druglogics_synergy_scrambled_topo_cascade1.sh) script.
We calculate the normalized predictor performance ($calibrated - random$) for each topology-specific simulation and tidy up the result data in [get_syn_res_scrambled_topo_cascade1.R](https://github.com/druglogics/ags-paper/blob/main/scripts/get_syn_res_scrambled_topo_cascade1.R).
:::
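As an illustration of the first scrambling type, here is a minimal toy sketch of permuting the source nodes of an edge list (the edge-list representation and the similarity computation are simplifying assumptions; see the linked script for the actual implementation):
```{r scramble-src-sketch, eval=FALSE}
set.seed(42)
edges = tibble::tibble(
  source = c("A", "A", "B", "C"),
  effect = c("->", "-|", "->", "-|"), # activation or inhibition
  target = c("B", "C", "C", "A"))
# permuting the source column keeps the multiset of source nodes (and thus
# the out-degree distribution) intact; targets and signs stay as they are
scrambled = dplyr::mutate(edges, source = sample(source))
# similarity score: fraction of edges shared with the original topology
mean(paste(edges$source, edges$effect, edges$target) %in%
  paste(scrambled$source, scrambled$effect, scrambled$target))
```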
Next, we load the data results and add the ROC and PR AUC results of the combined predictor (termed **Calibrated**) for the curated CASCADE 1.0 topology (see [above](#best-roc-and-prc)).
Note that the topology scrambling type is set to **none** for the results that used the original/curated CASCADE 1.0 topology.
```{r add-unscrambled-norm-perf-results}
scrambled_topo_res = readRDS(file = 'data/scrambled_topo_res_cascade1.rds')
# the un-scrambled topology results have a similarity score equal to 1 and
# 'none' (no scrambling whatsoever) as `scramble_type`; the ROC and PR AUC values
# have been calculated and shown in the figures above but we re-compute them here anyway :)
res_comb_roc = PRROC::roc.curve(scores.class0 = pred_ew_bliss %>% pull(combined_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_bliss %>% pull(observed))
res_comb_pr = PRROC::pr.curve(scores.class0 = pred_ew_bliss %>% pull(combined_score) %>% (function(x) {-x}),
weights.class0 = pred_ew_bliss %>% pull(observed))
scrambled_topo_res = dplyr::bind_rows(scrambled_topo_res,
tibble::tibble(sim = 1, scramble_type = 'none', roc_auc = res_comb_roc$auc,
pr_auc = res_comb_pr$auc.davis.goadrich))
```
Interestingly, there were some scrambled topologies which didn't produce even one boolean model with a stable state when using the genetic algorithm of `Gitsbe` (so no predictions could be made for these topologies):
```{r zero-ss-topologies-fig, fig.width=7, fig.height=5, dpi=300, cache=TRUE, fig.cap='Percentage of topologies that did not have any boolean model with a stable state after simulations with Gitsbe ended. Every possible topology scrambling type is represented.', fig.align='center'}
ordered_types = c('none', 'source', 'target', 'sign', 'all')
scrambled_topo_res %>%
mutate(scramble_type =
replace(x = scramble_type, list = scramble_type == 'effect', values = 'sign')) %>%
group_by(scramble_type) %>%
summarise(percent = sum(is.na(roc_auc))/n(), .groups = 'drop') %>%
mutate(scramble_type = factor(scramble_type, levels = ordered_types)) %>%
ggplot(aes(x = scramble_type, y = percent, fill = scramble_type)) +
geom_col() +
geom_text(aes(label = scales::percent(percent, accuracy = 1)), vjust = -0.5, size = 8) +
scale_y_continuous(labels = scales::percent, limits = c(0,0.3)) +
scale_fill_brewer(palette = "Set1") +
guides(fill = guide_legend(title = latex2exp::TeX("Scramble Type"))) +
labs(x = "", title = "Topologies with zero-stable-state boolean models", y = "") +
theme_classic(base_size = 14) +
theme(axis.text.x = element_text(size = 18))
```
:::{.green-box}
So, tweaking the **source nodes** of the edges in the curated topology resulted in $11\%$ of the produced topologies having a network configuration that wouldn't allow any stable state in the explored link-operator parameterization space of the `Gitsbe` algorithm.
Tweaking the **target nodes** results in fewer topologies having this property ($5\%$).
Lastly, when tweaking the **effect** (activation vs inhibition), we always get topologies that can be translated to boolean models with a stable state attractor.
:::
### Source Scrambling {-}
In the next figures, the red dot/point is the result from using the original/unscrambled/curated CASCADE 1.0 topology:
```{r src-scrambling-figs, message=FALSE, fig.width=7, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='Source node scrambling vs Performance (ROC and PR AUC)', fig.align='center'}
ggpubr::ggscatter(data = scrambled_topo_res %>%
filter(scramble_type == 'source' | scramble_type == 'none', !is.na(roc_auc)),
x = "sim", y = "roc_auc", color = "scramble_type", palette = c('red', 'black'),
xlab = "Similarity Score",
title = "Source node Scrambling vs Performance (ROC)",
ylab = "ROC AUC") +
#, add = "reg.line", conf.int = TRUE,
#add.params = list(color = "blue", fill = "lightgray"),
#cor.coef = TRUE, cor.coeff.args = list(method = "kendall", label.y.npc = "top", size = 6, cor.coef.name = "tau") +
ylim(c(0,1)) +
geom_text(x = 0.95, y = 1, label = "CASCADE 1.0") +
theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')
ggpubr::ggscatter(data = scrambled_topo_res %>%
filter(scramble_type == 'source' | scramble_type == 'none', !is.na(pr_auc)),
x = "sim", y = "pr_auc", color = "scramble_type", palette = c('red', 'black'),
xlab = "Similarity Score",
title = "Source node Scrambling vs Performance (Precision-Recall)",
ylab = "PR AUC") +
#add = "reg.line", conf.int = TRUE,
#add.params = list(color = "blue", fill = "lightgray"),
#cor.coef = TRUE, cor.coeff.args = list(method = "kendall", label.y.npc = "top", size = 6, cor.coef.name = "tau")) +
ylim(c(0,1)) +
geom_text(x = 0.9, y = 0.91, label = "CASCADE 1.0") +
theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')
```
### Target Scrambling {-}
```{r trg-scrambling-figs, message=FALSE, fig.width=7, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='Target node scrambling vs Performance (ROC and PR AUC)', fig.align='center'}
ggpubr::ggscatter(data = scrambled_topo_res %>%
filter(scramble_type == 'target' | scramble_type == 'none', !is.na(roc_auc)),
x = "sim", y = "roc_auc", color = "scramble_type", palette = c('red', 'black'),
xlab = "Similarity Score",
title = "Target node Scrambling vs Performance (ROC)",
ylab = "ROC AUC") +
ylim(c(0,1)) +
geom_text(x = 0.95, y = 1, label = "CASCADE 1.0") +
theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')
ggpubr::ggscatter(data = scrambled_topo_res %>%
filter(scramble_type == 'target' | scramble_type == 'none', !is.na(pr_auc)),
x = "sim", y = "pr_auc", color = "scramble_type", palette = c('red', 'black'),
xlab = "Similarity Score",
title = "Target node Scrambling vs Performance (Precision-Recall)",
ylab = "PR AUC") +
ylim(c(0,1)) +
geom_text(x = 0.9, y = 0.91, label = "CASCADE 1.0") +
theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')
```
### Sign Inversion {-}
```{r sign-inv-figs, message=FALSE, fig.width=7, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='Sign Inversion vs Performance (ROC and PR AUC)', fig.align='center'}
ggpubr::ggscatter(data = scrambled_topo_res %>%
filter(scramble_type == 'effect' | scramble_type == 'none', !is.na(roc_auc)),
x = "sim", y = "roc_auc", color = "scramble_type", palette = c('black', 'red'),
xlab = "Similarity Score",
title = "Sign Inversion vs Performance (ROC)",
ylab = "ROC AUC") +
ylim(c(0,1)) +
geom_text(x = 0.9, y = 0.98, label = "CASCADE 1.0") +
theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')
ggpubr::ggscatter(data = scrambled_topo_res %>%
filter(scramble_type == 'effect' | scramble_type == 'none', !is.na(pr_auc)),
x = "sim", y = "pr_auc", color = "scramble_type", palette = c('black', 'red'),
xlab = "Similarity Score",
title = "Sign Inversion vs Performance (Precision-Recall)",
ylab = "PR AUC") +
ylim(c(0,1)) +
geom_text(x = 0.9, y = 0.91, label = "CASCADE 1.0") +
theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')
```
### Source, Target Scrambling and Sign Inversion {-}
```{r all-scrambling-figs, message=FALSE, fig.width=7, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='Source, Target node scrambling and Sign Inversion vs Performance (ROC and PR AUC)', fig.align='center'}
ggpubr::ggscatter(data = scrambled_topo_res %>%
filter(scramble_type == 'all' | scramble_type == 'none', !is.na(roc_auc)),
x = "sim", y = "roc_auc", color = "scramble_type", palette = c('black', 'red'),
xlab = "Similarity Score",
title = "All types of Scrambling vs Performance (ROC)",
ylab = "ROC AUC") +
ylim(c(0,1)) +
geom_text(x = 0.95, y = 1, label = "CASCADE 1.0") +
geom_hline(yintercept = 0.5, linetype = 'dashed', color = "red") +
geom_text(aes(x = 0.92, y = 0.45, label = "Random (AUC = 0.5)"), color = '#377EB8') +
theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')
ggpubr::ggscatter(data = scrambled_topo_res %>%
filter(scramble_type == 'all' | scramble_type == 'none', !is.na(pr_auc)),
x = "sim", y = "pr_auc", color = "scramble_type", palette = c('black', 'red'),
xlab = "Similarity Score",
title = "All types of Scrambling vs Performance (Precision-Recall)",
ylab = "PR AUC") +
ylim(c(0,1)) +
geom_text(x = 0.9, y = 0.91, label = "CASCADE 1.0") +
geom_hline(yintercept = sum(observed)/length(observed), linetype = 'dashed', color = "red") +
geom_text(aes(x = 0.83, y = 0.08, label = "Random (AUC = 0.2)"), color = '#377EB8') +
theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')
```
### Bootstrap Calibrated Models + Boxplots {-#boot-ss-cascade1-curated}
:::{.blue-box}
Since almost **all scrambled results** (no matter the type of scrambling) **are worse than the results we got when using the curated/unscrambled CASCADE 1.0 topology**, we proceed to generate bootstrap model predictions derived from the curated topology, to assess whether the results we had found were artifacts and/or outliers.
We generate a large pool of `gitsbe` models ($1000$ simulations => $3000$ models), randomly draw a total of $50$ batches of $50$ models each, and assess the ROC and PR AUC performance for each batch, normalized to the random model predictions (see [above](#best-roc-and-prc); a minimal sketch of the batch drawing follows this box).
All these bootstrapped models will be part of one category called **Curated**.
The rest of the scrambled topology data (that we presented in the scatter plots) will be split into multiple groups based on their similarity score (percentage of common edges with the curated topology) and we will visualize the different groups with boxplots.
See more details on how to reproduce these simulation results [here](#boot-ss-cascade1-curated-reproduce).
:::
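A minimal sketch of the batch drawing logic (the `model_pool` identifiers and the sampling without replacement within each batch are illustrative assumptions):
```{r boot-batches-sketch, eval=FALSE}
set.seed(42)
model_pool = paste0("model_", 1:3000) # the 3000 models from the 1000-simulation run
batches = lapply(1:50, function(i) sample(model_pool, size = 50))
# each batch of 50 models is then scored with Drabme and its normalized
# (calibrated - random) ROC and PR AUC computed as in the sections above
```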
Load the bootstrap results and tidy up the data:
```{r load-bootstrap-data}
# add the bootstrapped results of the curated topology to the scrambled results
scrambled_topo_res = readRDS(file = 'data/scrambled_topo_res_cascade1.rds')
boot_cascade1_res = readRDS(file = 'data/boot_cascade1_res.rds')
scrambled_topo_res = dplyr::bind_rows(scrambled_topo_res, boot_cascade1_res)
# group by similarity score
scrambled_topo_res =
scrambled_topo_res %>% mutate(grp = factor(x =
case_when(sim >= 0 & sim < 0.25 ~ '0 - 0.25',
sim >= 0.25 & sim < 0.5 ~ '0.25 - 0.5',
sim >= 0.5 & sim < 0.75 ~ '0.5 - 0.75',
sim >= 0.75 & sim < 0.85 ~ '0.75 - 0.85',
sim >= 0.85 & sim < 0.95 ~ '0.85 - 0.95',
sim >= 0.95 & sim < 1 ~ '0.95 - 1',
sim == 1 ~ 'Curated'),
levels = c('0 - 0.25', '0.25 - 0.5', '0.5 - 0.75',
'0.75 - 0.85', '0.85 - 0.95','0.95 - 1', 'Curated')))
```
```{r src-scrambling-figs-2, warning=FALSE, fig.width=7, fig.height=5, dpi=300, cache=TRUE, fig.show='hold', out.width='50%', fig.cap='Source node scrambling topologies + curated CASCADE 1.0 topology bootstrapped results (ROC and PR AUC)', fig.align='center'}
# ROC results
scrambled_topo_res %>%
filter(scramble_type == 'source' | scramble_type == 'none', !is.na(roc_auc)) %>%
ggplot(aes(x = grp, y = roc_auc, fill = grp)) +
geom_boxplot(show.legend = FALSE) +
scale_fill_brewer(palette = 'Set1') +
geom_jitter(shape = 20, position = position_jitter(0.2), show.legend = FALSE) +
ylim(c(0,1)) +
labs(x = 'Similarity Score to CASCADE 1.0 Topology', y = 'ROC AUC',
title = "Source Scrambling vs Performance (ROC)") +
theme_classic(base_size = 14) +
geom_hline(yintercept = 0.5, linetype = 'dashed', color = "red") +
geom_text(aes(x = 6.7, y = 0.45, label = "Random (AUC = 0.5)")) +
theme(plot.title = element_text(hjust = 0.5))
# PR results
scrambled_topo_res %>%
filter(scramble_type == 'source' | scramble_type == 'none', !is.na(pr_auc)) %>%
ggplot(aes(x = grp, y = pr_auc, fill = grp)) +
geom_boxplot(show.legend = FALSE) +
scale_fill_brewer(palette = 'Set1') +
geom_jitter(shape = 20, position = position_jitter(0.2), show.legend = FALSE) +
ylim(c(0,1)) +
labs(x = 'Similarity Score to CASCADE 1.0 Topology', y = 'PR AUC',
title = "Source Scrambling vs Performance (Precision-Recall)") +
theme_classic(base_size = 14) +
geom_hline(yintercept = sum(observed)/length(observed), linetype = 'dashed', color = "red") +
geom_text(aes(x = 6.7, y = 0.15, label = "Random (AUC = 0.2)")) +
theme(plot.title = element_text(hjust = 0.5))
```