-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.Rmd
262 lines (210 loc) · 12.9 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
---
title: "Supplementary material for \"Incorporating statistical thinking into visualisation practices for decision-making in operational management\""
author:
- name: Emi Tanaka
affiliation: Department of Econometrics and Business Statistics, Monash University, Melbourne, VIC 3800
email: emi.tanaka@monash.edu
- name: Jessica Wai Yin Leung
affiliation: Department of Econometrics and Business Statistics, Monash University, Melbourne, VIC 3800
email: jessica.leung@monash.edu
- name: Dianne Cook
affiliation: Department of Econometrics and Business Statistics, Monash University, Melbourne, VIC 3800
email: dicook@monash.edu
bibliography: references.bib
output:
bookdown::html_document2:
code_folding: "hide"
theme: "paper"
code_download: true
number_sections: false
---
This is the supplementary material to an invited commentary for @basole2021. We provide all code that are used to generate the figures in the commentary in addition to other supplementary figures (and its code).
<ul class="fa-ul">
<li><span class="fa-li"><i class="fas fa-code"></i></span> To see the code, click on the CODE button. </li>
<li><span class="fa-li"><i class="fas fa-download"></i></span> You can also download the whole R Markdown file from the drop down menu on the top right corner.</li>
<li><span class="fa-li"><i class="fab fa-github"></i></span> The GitHub repo for this material can be found at https://github.com/emitanaka/supp-visOM. </li>
</ul>
**List of figures**
* [Figure S1](#fig:mimic-original): Recreating Figure 3 of @basole2021 using `ggplot2`.
* [Figure S2](#fig:fig3-alt): An alternative design to Figure 3 of @basole2021.
* [Figure S3](#fig:lineup-theirs): Lineup for the tile grid plot used in Figure 3 of @basole2021.
* [Figure S4](#fig:lineup-ours): Lineup for the scatter plot.
* [Figure S5](#fig:lineup-theirs-false): Lineup for the tile grid plot on data with purposely high association.
* [Figure S6](#fig:lineup-ours-false): Lineup for the scatter plot on data with purposely high association.
<details>
<summary>Code to load library and data</summary>
```{r setup, message = FALSE, warning = FALSE, class.source = 'fold-show'}
library(tidyverse)
library(ggtext)
library(patchwork)
library(readxl)
library(nullabor)
library(here)
library(janitor)
library(scales)
```
```{r knitr-setup, include = FALSE}
htmltools::tagList(rmarkdown::html_dependency_font_awesome())
knitr::opts_chunk$set(fig.path = "images/",
dev = c("png", "pdf", "svg"),
cache = FALSE,
cache.path = "cache/",
message = FALSE,
warning = FALSE)
options(knitr.duplicate.label = "allow")
```
```{r data, class.source = 'fold-show'}
df_full <- read_xlsx(here("data/MaskedCoverage-Fig3.xlsx")) %>%
clean_names() %>%
add_row(state = c("OR", "WY", "SD", "WV", "DC", "AL")) %>%
mutate(row = case_when(
state %in% c("ME") ~ 1L,
state %in% c("VT", "NH") ~ 2L,
state %in% c("WA", "ID", "MT", "ND", "MN", "IL", "WI", "MI", "NY", "RI", "MA") ~ 3L,
state %in% c("OR", "NV", "WY", "SD", "IA", "IN", "OH", "PA", "NJ", "CT") ~ 4L,
state %in% c("CA", "UT", "CO", "NE", "MO", "KY", "WV", "VA", "MD", "DE") ~ 5L,
state %in% c("AZ", "NM", "KS", "AR", "TN", "NC", "SC", "DC") ~ 6L,
state %in% c("OK", "LA", "MS", "AL", "GA") ~ 7L,
state %in% c("TX", "FL") ~ 8L,
TRUE ~ 0L),
col = case_when(
state %in% c("WA", "OR", "CA") ~ 1L,
state %in% c("ID", "NV", "UT", "AZ") ~ 2L,
state %in% c("MT", "WY", "CO", "NM") ~ 3L,
state %in% c("ND", "SD", "NE", "KS", "OK", "TX") ~ 4L,
state %in% c("MN", "IA", "MO", "AR", "LA") ~ 5L,
state %in% c("IL", "IN", "KY", "TN", "MS") ~ 6L,
state %in% c("WI", "OH", "WV", "NC", "AL") ~ 7L,
state %in% c("MI", "PA", "VA", "SC", "GA") ~ 8L,
state %in% c("NY", "NJ", "MD", "DC", "FL") ~ 9L,
state %in% c("VT", "RI", "CT", "DE") ~ 10L,
state %in% c("ME", "NH", "MA") ~ 11L,
TRUE ~ 0L
))
df_miss <- df_full %>%
filter(!is.na(readmission_rate))
```
</details>
(ref:mimicary) This figure recreates Figure 3 in @basole2021 using the `ggplot2` R-package [@ggplot2]. The code is displayed above by clicking on the CODE button just above the right corner of this plot.
```{r mimic-original, fig.height = 8, fig.width = 18, fig.cap = "(ref:mimicary)"}
g1 <- df_miss %>%
mutate(y = readmission_rate * 100) %>%
ggplot(aes(col, row)) +
geom_point(aes(size = coverage_obscured, color = y), alpha = 0.8) +
geom_text(aes(label = percent(y/100, 0.01)), nudge_y = -0.1, size = 2.5) +
labs(color = "Readmission Rate", size = "Coverage") +
scale_color_gradient2(low = "#3F6E9A", high = "#AB4C30", midpoint = median(df_miss$readmission_rate * 100), mid = "#ffffe0") +
theme_void() +
geom_text(data = df_full, aes(label = state), color = "black", nudge_y = 0.05) +
scale_size(range = c(3, 30)) +
scale_y_reverse() +
theme(plot.margin = margin(r = 30))
g2 <- g1 %+% mutate(df_miss, y = colorectal_cancer_screenings) +
scale_color_gradient2(low = "#3F6E9A", high = "#AB4C30", midpoint = median(df_miss$colorectal_cancer_screenings), mid = "#ffffe0") +
labs(color = "Cancer Screening Rate")
g1 + g2 + plot_layout(guides = "collect")
```
(ref:fig3-alt) The above figure show an alternative plot design to display the information in [Figure S1](#fig:mimic-original) and is the same figure as Figure 1 in the main paper. The plot shows a scatter plot of percentage of readmission and coverage on the left and a scatter plot of percentage of cancer screening and coverage on the right. Both plots are superimposed by a local polynomial regression (displayed as a blue line) with confidence interval for the line (displayed in gray). The code is displayed above by clicking on the CODE button just above the right corner of this plot.
```{r fig3-alt, fig.height = 4, fig.width = 8, fig.cap = "(ref:fig3-alt)"}
theme_set(theme_classic())
g1 <- ggplot(df_miss, aes(coverage_obscured * 100, readmission_rate * 100)) +
geom_point() +
labs(x = "Coverage (%)", y = "Readmission (%)") +
geom_smooth(method = loess, formula = y ~ x) +
annotate("richtext", x = 80, y = 15.3, label.color = NA, fill = "transparent", label = glue::glue("R<sup>2</sup> = {scales::comma(cor(df_miss$coverage_obscured, df_miss$readmission_rate)^2, 0.001)}"))
g2 <- ggplot(df_miss, aes(coverage_obscured * 100, colorectal_cancer_screenings)) +
geom_point() +
labs(x = "Coverage (%)", y = "Cancer Screening (%)") +
geom_smooth(method = loess, formula = y ~ x) +
annotate("richtext", x = 80, y = 73, label.color = NA, fill = "transparent", label = glue::glue("R<sup>2</sup> = {scales::comma(cor(df_miss$coverage_obscured, df_miss$colorectal_cancer_screenings)^2, 0.001)}"))
g1 + g2
```
```{r lineup-data}
set.seed(2021)
lineup_data <- null_permute("colorectal_cancer_screenings") %>%
lineup(true = df_miss, n = 20, pos = 3)
```
(ref:lineup-theirs) The above figure shows a lineup for this tile grid plot where one of the plots is made using the data, and the other nineteen plots are constructed after first permuting the percentage of cancer screening across different states with the missing value structure is preserved. Text and legends have been removed to minimise the bias in reading plots due to the reader being aware of the context. **Which plot strikes the most different to you?** The code is displayed above by clicking on the CODE button just above the right corner of this plot. Try the other lineups before finding the data plot position at the bottom of this document.
```{r lineup-theirs, fig.height = 10, fig.width = 10, fig.cap = "(ref:lineup-theirs)"}
plot_lineup_theirs <- ggplot(lineup_data, aes(col, row)) +
geom_point(aes(size = coverage_obscured, color = colorectal_cancer_screenings), alpha = 0.8) +
theme_void() +
scale_color_gradient2(low = "#3F6E9A", high = "#AB4C30", midpoint = median(df_miss$colorectal_cancer_screenings), mid = "#ffffe0") +
scale_size(range = c(1, 5)) +
scale_y_reverse(expand = c(0.1, 0.2)) +
guides(color = "none", size = "none") +
facet_wrap(~.sample, ncol = 5) +
scale_x_continuous(expand = c(0.1, 0.1)) +
theme(legend.position = "bottom",
strip.text = element_text(size = 18, margin = margin(t = 3, b = 3)),
strip.background = element_rect(color = "black", size = 1.5))
plot_lineup_theirs
```
(ref:lineup-ours) The above figure shows a lineup for the scatter plot design used in [Figure S2](#fig:fig3-alt). The same data used to create [Figure S3](#fig:lineup-theirs) (including the null data) is used to create this lineup. The code is displayed above by clicking on the CODE button just above the right corner of this plot. When you are ready, find the position of the data plot is revealed at the [bottom of this document](#position).
```{r lineup-ours, fig.height = 10, fig.width = 10, fig.cap = "(ref:lineup-ours)"}
plot_lineup_ours <- ggplot(lineup_data, aes(coverage_obscured * 100, colorectal_cancer_screenings)) +
geom_point() +
geom_smooth(method = loess, formula = y ~ x) +
facet_wrap(~.sample, ncol = 5) +
scale_x_continuous(expand = c(0.1, 0.1)) +
theme(legend.position = "bottom",
strip.text = element_text(size = 18, margin = margin(t = 3, b = 3)),
strip.background = element_rect(color = "black", size = 1.5),
axis.text = element_blank(),
axis.title = element_blank(),
axis.line = element_blank(),
axis.ticks.length = unit(0, "pt"))
plot_lineup_ours
```
# Same plots with higher associations between variables
The following are plots based on data that purposely modifies cancer screening to induce a higher association with the coverage. This higher association is induced (as shown in the code below) by rearranging data by the coverage and modifying the cancer screening percentage so that it is ordered from low to high.
```{r data-false, class.source="fold-show"}
df_false <- df_miss %>%
arrange(coverage_obscured) %>%
mutate(colorectal_cancer_screenings = sort(colorectal_cancer_screenings))
lineup_false_data <- null_permute("colorectal_cancer_screenings") %>%
lineup(true = df_false, n = 20, pos = 5)
```
(ref:lineup-theirs-false) The above shows a lineup for the tile grid plot design where the data was purposely manipulated to induce a higher association between the variables of interest. Which plot looks the most strikingly different to you? Try the next lineup to see if you can find the data plot before finding the position of the data plot [here](#position).
```{r lineup-theirs-false, fig.height = 10, fig.width = 10, fig.cap = "(ref:lineup-theirs-false)"}
plot_lineup_theirs %+% lineup_false_data
```
(ref:lineup-ours-false) The above shows a lineup for the scatter plot design with the data that was purposely manipulated so that two variables mapped to the $x$-axis and $y$-axis have a higher association. How easy was it to spot the data plot compared to [Figure S5](#fig:lineup-theirs-false)? You can find the code to generate the above lineup plot by collapsing all codes.
```{r lineup-ours-false, fig.height = 10, fig.width = 10, fig.cap = "(ref:lineup-ours-false)"}
plot_lineup_ours %+% lineup_false_data
```
# Positions of the data plot {#position}
The positions of the data plot for the lineup are as follows:
* [Figure S3](#fig:lineup-theirs): position 3.
* [Figure S4](#fig:lineup-ours): position 3.
* [Figure S5](#fig:lineup-theirs-false): position 5.
* [Figure S6](#fig:lineup-ours-false): position 5.
We expect that you would have struggled to find the data plto in [Figure S3](#fig:lineup-theirs) and [Figure S4](#fig:lineup-ours) as we do not observe strong association between the cancer screening rate and coverage. Additionally, we expect that most would spot the data plot in [Figure S5](#fig:lineup-theirs-false) and all would spot the data plot in [Figure S6](#fig:lineup-ours-false). For those that spot the data plot in [Figure S5](#fig:lineup-theirs-false), we suspect it took longer than spotting the data plot in [Figure S6](#fig:lineup-ours-false).
# Acknowledgement
We thank @basole2021 for supplying us the synthetic data to draw the above plots.
# Reference
<details>
<summary>Session Information</summary>
```{r session-info}
sessioninfo::session_info()
```
</details>
<br>
```{r copy-for-paper, include = FALSE}
# this chunk must be the last one
fs::file_copy("images/fig3-alt-1.pdf", "paper/", overwrite = TRUE)
fs::file_copy("images/lineup-theirs-1.pdf", "paper/", overwrite = TRUE)
# extract all the R code
knitr::purl("index.Rmd",
documentation = 1)
fs::file_move("index.R", "code/plots.R")
# remove this chunk and the knitr setup from the output R code
f <- readLines("code/plots.R")
i1 <- which(str_detect(f, "^## ----knitr-setup"))
chunk_indexes <- which(str_detect(f, "^## ----"))
j1 <- chunk_indexes[chunk_indexes > i1][1] - 1
i2 <- max(which(str_detect(f, "^## ----copy-for-paper")))
j2 <- length(f)
out <- f[-c(i1:j1, i2:j2)]
writeLines(out, "code/plots.R")
```