README.Rmd

---
title: "Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect"
author: "Jouni Helske, Satu Helske, Matthew Cooper, Anders Ynnerman, Lonni Besançon"
date: "13/4/2020"
output: 
  github_document:
    fig_width: 12
    fig_height: 12
    toc: true
    toc_depth: 3
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, width = 150)
```


# What is this

This repository contains data and scripts for reproducing the analysis of the paper *Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect* by Jouni Helske, Satu Helske, Matthew Cooper, Anders Ynnerman, and Lonni Besançon. 

This Rmd file contains full scripts (and extra stuff) to reproduce figures in the paper.

The raw data for both experiments can be found in folders `experiment1/data/` and `experiment2/data/` respectively, which also contains the R data frames used in the analysis (`exp1_data.rds` and `exp2_data.rds`). The web pages for the surveys are in folder `web`, with some screenshots in folder `screenshots`.


## One-sample experiment

### Creating the dataset

First, we load some packages:
```{r load_packages}
suppressPackageStartupMessages({
  library(brms)
  library(modelr)
  library(ggplot2)
  library(dplyr)
  library(jsonlite)
  library(loo)
  library(ggthemes)
})
```

Then we transform the raw data to suitable format for analysis:

```{r create_data_exp1, eval = FALSE}
path <- "experiment1/data"
answers <- list.files(path, pattern="answers", full.names = TRUE)

# fetch number of participants
n <- length(answers)

# create a data frame for the results
data_raw <- data.frame(id = rep(1:n, each = 32), viz = NA, replication = NA, value = NA,
  expertise = NA, degree = NA, age = NA, experience = NA, tools = NA)

# read in answers, not optimal way will do
for(i in 1:n){
  x <- strsplit(fromJSON(answers[i]), ",")
  dem <- fromJSON(paste0(path,  "/demography", x[[1]][1], ".txt"))
  for(j in 1:32) {
    data_raw[32*(i-1) + j, c("id", "viz", "replication", "value")] <- x[[j]]
    data_raw[32*(i-1) + j, c("expertise", "degree", "age", "experience", "tools")] <- 
      dem[c("expertise", "level", "age", "experience", "tools")]
  }
}
saveRDS(data_raw, file = "experiment1/data/data_raw.rds")
# remove person who didn't answer reasonably on the demography part
# Degree is None and more importantly expertise is 1..?
data <- data_raw[data_raw$degree != "None",]

# true p-values
true_p <- c(0.001, 0.01, 0.04, 0.05, 0.06, 0.1, 0.5, 0.8)

# convert to factors and numeric
data <- data %>% mutate(n = factor(ifelse(as.numeric(id) %% 8 < 4, 50, 200)),
  id = factor(id),
  viz = relevel(factor(viz, labels = c("CI", "gradient", "p", "violin")), "p"),
  replication = as.numeric(replication),
  value = as.numeric(value),
  p = true_p[replication],
  true_p = factor(p), # for monotonic but non-linear effect on confidence
  confidence = (value - 1) / 99,
  expertise = factor(expertise)) %>% arrange(id, viz)


# Classify expertise
data$expertise <- recode_factor(data$expertise, 
  
  "Statistics" = "Stats/ML",
  "statistics" = "Stats/ML",
  "statistics/machine learning" = "Stats/ML",
  "Analytics" = "Stats/ML",
  "Statistics/Medicine" = "Stats/ML",
  "Data science" = "Stats/ML",
  "Biostatistics" = "Stats/ML",
  "IT & Business Data Science" = "Stats/ML",
  "methods" = "Stats/ML",
  "AI" = "Stats/ML",
  "Neuroscience and Statistics" = "Stats/ML",
  "Computer vision" = "Stats/ML",
  "Psychometric" = "Stats/ML",
  
  "HCI, Visualization" = "VIS/HCI",
  "HCI/Visualization" = "VIS/HCI",
  "interaction design and evaluation" = "VIS/HCI",
  "Human-Computer Interaction" = "VIS/HCI",
  "HCI" = "VIS/HCI",
  "Vis" = "VIS/HCI",
  "Visualization" = "VIS/HCI",
  "Data Visualization" = "VIS/HCI",
  "CS, Visualization, HCI" = "VIS/HCI",
  "Infovis" = "VIS/HCI",
  "Visualization / Computer Science" = "VIS/HCI",
  "Virtual Reality" = "VIS/HCI",
  "Visualisation" = "VIS/HCI",
  "research in HCI" = "VIS/HCI",
  "Computer science" = "VIS/HCI",
  "Computer Science" = "VIS/HCI",
  
  "Social science" = "Social science and humanities",
  "Political science" = "Social science and humanities",
  "sociology" = "Social science and humanities",
  "Sociology" = "Social science and humanities",
  "Analytical Sociology" = "Social science and humanities",
  "Education research" = "Social science and humanities",
  "Economics" = "Social science and humanities", 
  "market research" = "Social science and humanities",
  "Politics" = "Social science and humanities",
  "Finance" = "Social science and humanities",
  "Linguistics" = "Social science and humanities",
  "Education Poliy" = "Social science and humanities",
  "Political Science" = "Social science and humanities",
  "Psychology" =  "Social science and humanities",
  "psychology" =  "Social science and humanities",
  "segregation" = "Social science and humanities",
  "Philosophy" = "Social science and humanities",
  "organizational science" = "Social science and humanities",
  "Strategic Management" = "Social science and humanities",
  "network analysis" = "Social science and humanities",
  "CSS" = "Social science and humanities",
  "Management" = "Social science and humanities",
  
  "Animal science" = "Physical and life sciences",
  "Biology" = "Physical and life sciences",
  "Botany" = "Physical and life sciences",
  "ecology" = "Physical and life sciences",
  "Zoology" = "Physical and life sciences",
  "Physics" = "Physical and life sciences",
  "cognitive neuroscience" = "Physical and life sciences",
  "Neuroscience" = "Physical and life sciences",
  "neuroscience/motor control" = "Physical and life sciences",
  "Biomechanics" = "Physical and life sciences",
  "Neurocognitive Psychology" = "Physical and life sciences",
  "pharma" =  "Physical and life sciences",
  "Public health" = "Physical and life sciences",
  "neurobiology" = "Physical and life sciences",
  "medicine" = "Physical and life sciences",
  "Molcular Biology" = "Physical and life sciences",
  "Wind Energy" = "Physical and life sciences",
  "Mathematical Biology" = "Physical and life sciences",
  "Pain" = "Physical and life sciences",
  "genomics" = "Physical and life sciences",
  "Medicine" = "Physical and life sciences",
  "Water engineering" = "Physical and life sciences")
data$expertise <- relevel(data$expertise, "Stats/ML")
```


### Descriptive statistics

```{r, cache = TRUE, echo = FALSE}
data <- readRDS("experiment1/data/exp1_data.rds")
```

Let's first look at some descriptive statistic:
```{r, cache = TRUE}
ids <- which(!duplicated(data$id))
barplot(table(data$expertise[ids]))
barplot(table(data$degree[ids]))
hist(as.numeric(data$age[ids]))
```

Let us now focus on the cliff effect as difference between confidence when $p$-value=0.04 versus $p$-value=0.06:
```{r cliff_effect_exp1, cache = TRUE}
data %>% group_by(id, viz) %>% 
  summarize(difference = confidence[true_p==0.04] - confidence[true_p==0.06])  %>% 
  group_by(viz) %>%
  summarise(
    mean = mean(difference), 
    median = median(difference),
    sd = sd(difference), 
    se = sd(difference) / sqrt(length(difference)),
    "2.5%" = quantile(difference, 0.025), 
    "97.5%" = quantile(difference, 0.975))

data %>% group_by(id, viz) %>% 
  summarize(difference = confidence[true_p==0.04] - confidence[true_p==0.06]) %>%
  ggplot(aes(x = viz, y = difference)) + 
  geom_violin() +
  geom_point(alpha = 0.5, position = position_jitter(0.1)) +
  scale_y_continuous("Difference in confidence when p-value is 0.06 vs 0.04") +
  scale_x_discrete("Representation") +
  theme_classic() 
```

The cliff effect seems to be largest when information is presented as traditional CI or $p$-value which behave similarly. Gradient CI and Violin CI plots are pretty close to each other.

Now same but with subgrouping using sample size:
```{r cliff_effect_n_exp1, cache = TRUE}
data %>% group_by(id, viz, n) %>% 
  summarize(difference = confidence[true_p==0.04] - confidence[true_p==0.06])  %>% 
  group_by(viz, n) %>%
  summarise(
    mean = mean(difference), 
    median = median(difference),
    sd = sd(difference), 
    se = sd(difference) / sqrt(length(difference)),
    "2.5%" = quantile(difference, 0.025), 
    "97.5%" = quantile(difference, 0.975))
```
and expertise:
```{r cliff_effect_expertise_exp1, cache = TRUE}
data %>% group_by(id, viz, expertise) %>% 
  summarize(difference = confidence[true_p==0.04] - confidence[true_p==0.06])  %>% 
  group_by(viz, expertise) %>%
  summarise(
    mean = mean(difference), 
    median = median(difference),
    sd = sd(difference), 
    se = sd(difference) / sqrt(length(difference)),
    "2.5%" = quantile(difference, 0.025), 
    "97.5%" = quantile(difference, 0.975))
```

In terms of sample size, there doesn't seem to be clear differences in cliff effect especially when considering medians. In terms of expertise, there seems to be some differences especially in terms of variability (most notably the Violin plot for VIS/HCI), but the differences are likely due to few very extreme cases:
```{r cliff_effect_n_exp1_plot, cache = TRUE}
data %>% group_by(id, viz, expertise) %>% 
  summarize(
    difference = confidence[true_p==0.04] - confidence[true_p==0.06]) %>%
  ggplot(aes(x=viz, y = difference)) + geom_violin() + theme_classic() + 
  scale_y_continuous("Difference in confidence when p-value is 0.04 vs 0.06") +
  scale_x_discrete("Representation") +
  geom_point(aes(colour = expertise), position=position_jitter(0.1))
```

Let's check how the much extreme answers (full or zero confidence) there are in different groups:
```{r extreme_exp1, cache = TRUE}
data %>% group_by(id, viz, n) %>% 
  mutate(extreme = confidence %in% c(0, 1))  %>% 
  group_by(viz, n) %>%
  summarise(
    mean = mean(extreme),
    sd = sd(extreme), 
    se = sd(extreme) / sqrt(length(extreme)))

data %>% group_by(id, viz, expertise) %>% 
  mutate(extreme = confidence %in% c(0, 1))  %>% 
  group_by(viz, expertise) %>%
  summarise(
    mean = mean(extreme),
    sd = sd(extreme), 
    se = sd(extreme) / sqrt(length(extreme)))
```

Stats/ML and VIS/HCI groups tend to give slightly more extreme answers, but differences are quite small.


### Model

For modelling the data and the potential cliff effect we use piece-wise logit-normal model with following pdf:

$$
p(x)=\begin{cases}
\alpha (1 - \gamma), & \text{if $x = 0$},\\
\alpha \gamma, & \text{if $x = 1$},\\
(1 - \alpha) \phi(logit(x), \mu, \sigma), & \text{otherwise}.\\
\end{cases}
$$

Here $\alpha = P(x \in \{0, 1\})$ is the probability of answering one of the extreme values (not at all confident or fully confident), and $\gamma = P(x = 1 \mid x \in \{0, 1\})$, is the conditional probability of full confidence given that the answer is one of the extremes.

For $\mu$,$\alpha$,$\gamma$, and $\sigma$, we define following linear predictors:

$$
\begin{align}
\begin{split}
\mu        &\sim viz \cdot I(p < 0.05) \cdot logit(p) + 
viz \cdot I(p = 0.05) \\ 
& + (viz + I(p < 0.05) \cdot logit(p) + I(p = 0.05) \mid id),\\
\alpha     &\sim  p \cdot viz + (1 \mid id),\\
\gamma     &\sim mo(p),\\
\sigma     &\sim viz + (1 \mid id),
\end{split}
\end{align}
$$
where $p$ is a categorical variable defining the true $p$-value, logit($p$) is a continuous variable of the logit-transformed $p$-value, $mo(p)$ denotes a monotonic effect of the $p$-value, the dot corresponds to interaction (\ie $I(p = 0.05) \cdot viz$ \rev{implies} both the main and two-way interaction terms) and $(z \mid id)$ denotes participant-level random effect for variable $z$. As priors we used the relatively uninformative defaults of the \texttt{brms} package.

Now in a presence of a cliff effect we should observe a discontinuity in an otherwise linear relationship (in logit-logit scale) between the true $p$-value and participants' confidence. 

We also tested submodels of this model (omitting some of the interactions or random effects), and all of these models gave very similar results. However, this encompassing model integrates over the uncertainty regarding the parameter estimates (with coefficient zero corresponding to simpler model where the variable is omitted) and is that sense "more Bayesian" than selecting some of the simpler models (note that we are not particularly interested in predictive performance).

Now we create the necessary functions for our model:
```{r create_model, eval = TRUE, cache = TRUE}
stan_funs <- "
 real logit_p_gaussian_lpdf(real y, real mu, real sigma,
                            real zoi, real coi) {
     if (y == 0) { 
       return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(0 | coi); 
     } else if (y == 1) {
       return bernoulli_lpmf(1 | zoi) + bernoulli_lpmf(1 | coi);
     } else { 
       return bernoulli_lpmf(0 | zoi) + normal_lpdf(logit(y) | mu, sigma);
     } 
                                    }
  real logit_p_gaussian_rng(real y, real mu, real sigma,
                            real zoi, real coi) {
    // 0 or 1
    int zero_one = bernoulli_rng(zoi);
    if (zero_one == 1) {
      // casting to real
      int one = bernoulli_rng(coi);
      if (one == 1) {
        return 1.0;
      } else {
        return 0.0;
      }
    } else {
      return inv_logit(normal_rng(mu, sigma));
    }
  }
"

log_lik_logit_p_gaussian <- function(i, draws) {
  # mu <- draws$dpars$mu[, i]
  # zoi <- draws$dpars$zoi[, i]
  # coi <- draws$dpars$coi[, i]
  # sigma <- draws$dpars$sigma
  # y <- draws$data$Y[i]
  mu <- brms:::get_dpar(draws, "mu", i = i)
  zoi <- brms:::get_dpar(draws, "zoi", i = i)
  coi <- brms:::get_dpar(draws, "coi", i = i)
  sigma <- brms:::get_dpar(draws, "sigma", i = i)
  y <- draws$data$Y[i]
  if (y == 0) { 
    dbinom(1, 1, zoi, TRUE) + dbinom(0, 1, coi, TRUE)
  } else if (y == 1) {
    dbinom(1, 1, zoi, TRUE) + dbinom(1, 1, coi, TRUE)
  } else { 
    dbinom(0, 1, zoi, TRUE) + dnorm(qlogis(y), mu, sigma, TRUE)
  } 
}

predict_logit_p_gaussian <- function(i, draws, ...) {
  mu <- brms:::get_dpar(draws, "mu", i = i)
  zoi <- brms:::get_dpar(draws, "zoi", i = i)
  coi <- brms:::get_dpar(draws, "coi", i = i)
  sigma <- brms:::get_dpar(draws, "sigma", i = i)
  zero_one <- rbinom(length(zoi), 1, zoi)
  ifelse(zero_one, rbinom(length(coi), 1, coi), plogis(rnorm(length(mu), mu, sigma)))
}

fitted_logit_p_gaussian <- function(draws) {
  mu <- draws$dpars$mu
  zoi <- draws$dpars$zoi
  coi <- draws$dpars$coi
  sigma <- draws$dpars$sigma
  # no analytical solution for the mean of logistic normal distribution, rely on simulation
  for (i in 1:ncol(mu)) {
    for(j in 1:nrow(mu)) {
      mu[j, i] <- mean(plogis(rnorm(1000, mu[j, i], sigma[j])))
    }
  }
  zoi * coi + (1 - zoi) * mu
}


logit_p_gaussian <- custom_family(
  "logit_p_gaussian", 
  dpars = c("mu", "sigma", "zoi", "coi"),
  links = c("identity", "log", "logit", "logit"),
  lb = c(NA, 0, 0, 0), ub = c(NA, NA, 1, 1),
  type = "real", 
  log_lik = log_lik_logit_p_gaussian,
  predict = predict_logit_p_gaussian,
  fitted = fitted_logit_p_gaussian)
```

And create few additional variables:
```{r additional_vars_exp1, cache = TRUE}
data <- data %>% 
  mutate(
    logit_p = qlogis(p),
    p_lt0.05 = factor(p < 0.05, levels = c(TRUE, FALSE), labels = c("Yes", "No")),
    p_eq0.05 = factor(p == 0.05, levels = c(TRUE, FALSE), labels = c("Yes", "No")),
    cat_p = recode_factor(true_p, 
      "0.06" = ">0.05", "0.1" = ">0.05", "0.5" = ">0.05", "0.8" = ">0.05",
      .ordered = TRUE))
```

```{r, eval = FALSE}
fit_exp1 <- brm(bf(
  confidence ~ 
    viz * p_lt0.05 * logit_p + 
    viz * p_eq0.05 +
    (viz + p_lt0.05 * logit_p + p_eq0.05 | id),
  zoi ~ 
    viz * true_p + (viz | id),
  coi ~ mo(cat_p),
  sigma ~ viz + (1 | id)),
  data = data,
  family = logit_p_gaussian,
  stanvars = stanvar(scode = stan_funs, block = "functions"),
  chains = 4, cores = 4, iter = 2000, init = 0, 
  save_warmup = FALSE, save_all_pars = TRUE, refresh = 0)
```

### Results

First, let us check the parameter estimates of the model:

```{r fit_exp1_read, cache = TRUE, echo = FALSE}
fit_exp1 <- readRDS("experiment1/results/fit_noexpertise.rds")
```
```{r fit_exp1, cache = TRUE}
fit_exp1
```

Now we look at some figures. First we draw some samples from posterior predictive distribution and see how well our simulated replications match with our data:
```{r pp_check_exp1_a, cache = TRUE}
pp_check(fit_exp1, type = "hist", nsamples = 11)
```

We see that the histograms of the replicated datasets are similar to observed one, perhaps slight exaggeration of the tails. Next, we look the median confidence of replicated datasets grouped with underlying $p$-value:

```{r pp_check_exp1_b, cache = TRUE}
pp_check(fit_exp1, type = "stat_grouped", group = "true_p", stat = "median")
```

Now grouping based on visualization:
```{r pp_check_exp1_c, cache = TRUE}
pp_check(fit_exp1, type = "stat_grouped", group = "viz", stat = "mean")
```

Noting the scale on the x-axis, our histograms look reasonable given our data, although there are some subgroups where our model is slightly over- or underestimating compared to our data, especially in the violin CI group (reasonable changes to our model, such as dropping some interaction terms, did not improve this). Same posterior checks for average participants (with random effects zeroed out) we get very good results:

```{r pp_check_exp1_b_norandom, cache = TRUE}
pp_check(fit_exp1, type = "stat_grouped", group = "true_p", stat = "median", re_formula = NA)
pp_check(fit_exp1, type = "stat_grouped", group = "viz", stat = "mean", re_formula = NA)
```

Now we are ready to analyze the results. First, the posterior curves of the confidence given the underlying $p$-value:

```{r, cache = TRUE}
comb_exp1 <- fit_exp1$data %>% 
  data_grid(viz, logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %>% 
  filter(interaction(logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %in% 
      unique(interaction( 
        fit_exp1$data$logit_p, fit_exp1$data$p_lt0.05, 
        fit_exp1$data$p_eq0.05, fit_exp1$data$cat_p, 
        fit_exp1$data$true_p)))

f_mu_exp1 <- posterior_epred(fit_exp1, newdata = comb_exp1, re_formula = NA)

d <- data.frame(value = c(f_mu_exp1), 
  p = rep(comb_exp1$true_p, each = nrow(f_mu_exp1)),
  viz = rep(comb_exp1$viz, each = nrow(f_mu_exp1)),
  iter = 1:nrow(f_mu_exp1))
levels(d$viz) <- c("Textual", "Classic CI", "Gradient CI", "Violin CI")
```

```{r posterior_curves_exp1, cache = TRUE}
sumr <- d %>% group_by(viz, p) %>%
  summarise(Estimate = mean(value), 
    Q2.5 = quantile(value, 0.025), 
    Q97.5 = quantile(value, 0.975)) %>%
  mutate(p = as.numeric(levels(p))[p])

cols <- c("Textual" = "#D55E00", "Classic CI" = "#0072B2", 
  "Gradient CI" = "#009E73", "Violin CI" = "#CC79A7")
x_ticks <- c(0.001, 0.01, 0.04, 0.06, 0.1, 0.5, 0.8)
y_ticks <- c(0.05, seq(0.1, 0.9, by = 0.1), 0.95)
dodge <- 0.19

p1 <- sumr %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, 
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, 
      margin = margin(t = -0.1, r = 0, b = -0.1, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, 
      margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5)))


p2 <- sumr %>% filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p <- p1 + coord_cartesian(xlim = c(0.001, 0.9), ylim = c(0.045, 0.95)) + 
  annotation_custom(
    ggplotGrob(p2), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))
p
```
```{r, echo = FALSE, eval = FALSE}
ggsave(p, filename = "experiment1/results/exp1_confidence.pdf", 
  width = 2*8.5, height = 12.5, 
  unit = "cm", device = "pdf")
```

The confidence level with traditional CI is most constant of all techniques when are within "statistically significant region" i.e. $p<0.05$, but there is a large drop when moving to $p>0.05$, even larger than with textual information with $p$-value, which behaves nearly identically with the Violin CI plot until $p=0.05$, when the confidence in $p$-value representation drops below all other techniques. The Gradient CI plot and Violin CI plot behave similarly, except the confidence level in case of Gradient CI plot is constantly below the Violin CI plot.

The probability curves of extreme answer show that traditional CI produces more easily extreme answers when $p<0.05$ (so the extreme answer is likely of full confidence), whereas $p$-value is more likely to lead extreme answer (zero confidence) when $p>0.05$. Differences between techniques seem nevertheless quite small compared to overall variation in the estimates.

```{r extreme_exp1_plot, cache = TRUE}
f_zoi_exp1_sumr <- fitted(fit_exp1, newdata = comb_exp1, 
  re_formula = NA, dpar = "zoi")
df_01_exp1 <- data.frame(
  p = plogis(comb_exp1$logit_p), 
  viz = comb_exp1$viz, 
  f_zoi_exp1_sumr)
levels(df_01_exp1$viz) <- 
  c("Textual", "Classic CI", "Gradient CI", "Violin CI")
y_ticks <- c(0.0001, 0.01, seq(0.1,0.9,by=0.2))

p <- df_01_exp1 %>% 
  ggplot(aes(x = p, y = Estimate, colour = viz)) +
  geom_linerange(aes(ymin = Q2.5, ymax = Q97.5),
    position = position_dodge(width=0.19)) + 
  geom_line(alpha=0.5, position = position_dodge(width=0.19))  + 
  ylab("Probability of all-or-none answer") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  theme_classic() + 
  scale_y_continuous(trans = "logit",
    breaks = y_ticks, labels = y_ticks, minor_breaks = NULL) + 
  scale_x_continuous(trans = "logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 10), legend.position = "bottom",   
    axis.title.x = element_text(size = 12),
    axis.text.y = element_text(size = 10), axis.title.y = element_text(size = 12),
    legend.text=element_text(size = 10), strip.text.x = element_text(size = 10)) 
p
```

Finally, we can compute the average drop in perceived confidence when moving from $p = 0.04$ to $p=0.06$:

```{r drop1, cache = TRUE}
d %>% group_by(viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>%
  summarise(mean = mean(difference), sd = sd(difference),
    "2.5%" = quantile(difference, 0.025), 
    "97.5" = quantile(difference, 0.975))
```

Let's also visualize this:
```{r, cache = TRUE}
p <- d %>% group_by(viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>% 
  ggplot(aes(x = difference, fill = viz, colour = viz)) + 
  geom_density(bw = 0.01, alpha = 0.6) +
  theme_classic() + 
  scale_fill_manual("Representation", values =  cols) + 
  scale_colour_manual("Representation", values =  cols) + 
  ylab("Posterior density") + 
  xlab("E[confidence(p=0.04) - confidence(p=0.06)]") +
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14),
    legend.text = element_text(size = 14)) 
p
```
```{r, echo = FALSE, eval = FALSE}
ggsave(p, filename = "experiment1/results/exp1_cliff.pdf", 
  width = 2*8.5, height = 6.5, 
  unit = "cm", device = "pdf")
```

Note that the cliff effect between viz styles are not independent, i.e. if there is a large cliff effect with Violin CI then the cliff effect with $p$-value is likely larger as well. This can be seen from the posterior probabilities that cliff effect is larger with viz 1 (row variable) than with viz 2 (column variable):

```{r 'postprob1', cache = TRUE}
postprob <- d %>% group_by(viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>%
  group_by(iter) %>% 
  mutate(p_vs_ci = difference[viz == "Textual"] - difference[viz == "Classic CI"],
    p_vs_gradient = difference[viz == "Textual"] - difference[viz == "Gradient CI"],
    p_vs_violin = difference[viz == "Textual"] - difference[viz == "Violin CI"],
    ci_vs_gradient = difference[viz == "Classic CI"] - difference[viz == "Gradient CI"],
    ci_vs_violin = difference[viz == "Classic CI"] - difference[viz == "Violin CI"],
    gradient_vs_violin = difference[viz == "Gradient CI"] - 
      difference[viz == "Violin CI"]) %>%
  ungroup() %>% summarise(
    "P(p > CI)" = mean(p_vs_ci > 0),
    "P(p > gradient)" = mean(p_vs_gradient > 0),
    "P(p > violin)" = mean(p_vs_violin > 0),
    "P(CI > gradient)" = mean(ci_vs_gradient > 0),
    "P(CI > violin)" = mean(ci_vs_violin > 0),
    "P(gradient > violin)" = mean(gradient_vs_violin > 0),
    "P(p > CI)" = mean(p_vs_ci > 0))
round(t(as.data.frame(postprob)), 2)
```

### Results for the model with expertise

Now we consider expanded model with with expertise as predictor:

```{r, eval = FALSE, cache = TRUE}
fit_expertise <- brm(bf(
  confidence ~ 
    expertise * viz * p_lt0.05 * logit_p + 
    expertise * viz * p_eq0.05 +
    (viz + p_lt0.05 * logit_p + p_eq0.05 | id),
  zoi ~ 
    expertise * viz + viz * true_p + (viz | id),
  coi ~ mo(cat_p),
  sigma ~ expertise * viz + (1 | id)),
  data = data,
  family = logit_p_gaussian,
  stanvars = stanvar(scode = stan_funs, block = "functions"),
  chains = 4, cores = 4, iter = 2000, init = 0, 
  save_warmup = FALSE, save_all_pars = TRUE, refresh = 0)
```
```{r, eval = TRUE, cache = TRUE, echo = FALSE}
fit_expertise <- readRDS("experiment1/results/fit_expertise.rds")
```
```{r, cache = TRUE}
comb_exp1 <- fit_expertise$data %>% 
  data_grid(expertise, viz, logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %>% 
  filter(interaction(expertise, logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %in% 
      unique(interaction(fit_expertise$data$expertise, 
        fit_expertise$data$logit_p, fit_expertise$data$p_lt0.05, 
        fit_expertise$data$p_eq0.05, fit_expertise$data$cat_p, 
        fit_expertise$data$true_p)))

f_mu_exp1 <- posterior_epred(fit_expertise, newdata = comb_exp1, re_formula = NA)

d <- data.frame(value = c(f_mu_exp1), 
  p = rep(comb_exp1$true_p, each = nrow(f_mu_exp1)),
  viz = rep(comb_exp1$viz, each = nrow(f_mu_exp1)),
  expertise = rep(comb_exp1$expertise, each = nrow(f_mu_exp1)),
  iter = 1:nrow(f_mu_exp1))

levels(d$viz) <- c("Textual", "Classic CI", "Gradient CI", "Violin CI")
```

Here are posterior curves for the four different groups:
```{r 'posterior_curves_exp1_expertise', cache = TRUE}
sumr <- d %>% group_by(viz, p, expertise) %>%
  summarise(Estimate = mean(value), 
    Q2.5 = quantile(value, 0.025), 
    Q97.5 = quantile(value, 0.975)) %>%
  mutate(p = as.numeric(levels(p))[p])

x_ticks <- c(0.001, 0.01, 0.04, 0.06, 0.1, 0.5, 0.8)
y_ticks <- c(0.05, seq(0.1, 0.9, by = 0.1), 0.95)
dodge <- 0.19

p11 <- sumr %>% filter(expertise == "Stats/ML") %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, 
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.margin = margin(t = -0.1, b = 0, unit = "cm"),
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, 
      margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, 
      margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), 
    ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5))) 


p21 <- sumr %>% filter(expertise == "Stats/ML") %>% 
  filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

yrange <- c(min(sumr$Q2.5)-0.001, max(sumr$Q97.5) +0.001)
p1 <- p11 + coord_cartesian(xlim = c(0.001, 0.9), 
  ylim = yrange) + 
  annotation_custom(
    ggplotGrob(p21), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))

p12 <- sumr %>% filter(expertise == "VIS/HCI") %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5))) 


p22 <- sumr %>% filter(expertise == "VIS/HCI") %>% 
  filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
   scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p2 <- p12 + coord_cartesian(xlim = c(0.001, 0.9), ylim = yrange) + 
  annotation_custom(
    ggplotGrob(p22), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))

p13 <- sumr %>% filter(expertise == "Social science and humanities") %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5))) 


p23 <- sumr %>% filter(expertise == "Social science and humanities") %>% 
  filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p3 <- p13 + coord_cartesian(xlim = c(0.001, 0.9), ylim = yrange) + 
  annotation_custom(
    ggplotGrob(p23), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))


p14 <- sumr %>% filter(expertise == "Physical and life sciences") %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5))) 


p24 <- sumr %>% filter(expertise == "Physical and life sciences") %>% 
  filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p4 <- p14 + coord_cartesian(xlim = c(0.001, 0.9), ylim = yrange) + 
  annotation_custom(
    ggplotGrob(p24), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))


library(patchwork)
p <- (p1 + ggtitle("Stats/ML")) + (p2 + ggtitle("VIS/HCI")) + 
  (p3 + ggtitle("Social sciences and humanities")) + 
  (p4 + ggtitle("Physical and life sciences"))
p
```


There are some differences between confidence curves between groups: In Physical and life sciences the visualization affects only little on the confidence curves; gradient CI and violin CI produce very linear curves in VIS/HCI group; and there is very large drop in confidence in case of classic CI in Stats/ML group. However, the ordering in terms of cliff effect is same in all groups.

We can also draw same figure when averaging over the groups:

```{r posterior_curves_exp1_marginal, cache = TRUE}
sumr <- d %>% group_by(viz, p) %>%
  summarise(Estimate = mean(value), 
    Q2.5 = quantile(value, 0.025), 
    Q97.5 = quantile(value, 0.975)) %>%
  mutate(p = as.numeric(levels(p))[p])

p1 <- sumr %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, 
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, 
      margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, 
      margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5)))


p2 <- sumr %>% filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p <- p1 + coord_cartesian(xlim = c(0.001, 0.9), ylim = yrange) + 
  annotation_custom(
    ggplotGrob(p2), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))
p

```
```{r, echo = FALSE, eval = FALSE}
ggsave(p, filename = "experiment1/results/exp1_confidence_marginal.pdf", 
  width = 2*8.5, height = 12.5, 
  unit = "cm", device = "pdf")
```

We see that the results are very similar to the model without the expertise variable, except naturally the credible intervals in the above figure are somewhat wider when we average over the expertise groups with different overall levels (in model without expertise, these differences are captured by the participant-level effects which are then zeroed out when considering average participant).

Now the potential cliff effect:
```{r, cache = TRUE}
d %>% group_by(viz, iter,expertise) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>% 
  ggplot(aes(x = difference, fill = viz, colour = viz)) + 
  geom_density(bw = 0.01, alpha = 0.6) +
  theme_classic() + 
  scale_fill_manual("Representation", values =  cols) + 
  scale_color_manual("Representation", values =  cols) + 
  ylab("Posterior density") + 
  xlab("Cliff effect") +
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, 
      margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, 
      margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14)) +facet_wrap(~expertise)

d %>% group_by(expertise, viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>%
  summarise(mean = mean(difference), sd = sd(difference),
    "2.5%" = quantile(difference, 0.025), 
    "97.5" = quantile(difference, 0.975))
```

We see some differences between the group-wise estimates (but note the standard deviation). For example the drop in confidence seems to be smallest in the VIS/HCI group with gradient and violin CIs (and $p$-value and classic CI perform relatively similar) and somewhat surprisingly the drops are largest in the Stats/ML group for all representation values. However, in all groups the classic CI has largest drop, with $p$-value following second (expect there is a virtually tie with the gradient CI in Phys./life sciences group), and relatively equal drops with gradient and violin CIs.

When we average over these groups to obtain marginal means we see almost identical results compared to the model without expertise:

```{r, cache = TRUE}
d %>% group_by(expertise, viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>%
  group_by(viz) %>%
  summarise(mean = mean(difference), sd = sd(difference),
    "2.5%" = quantile(difference, 0.025), 
    "97.5" = quantile(difference, 0.975))
```

Density plots of course show multimodality due to group differences:
```{r, cache = TRUE}
d %>% group_by(viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>% 
  ggplot(aes(x = difference, fill = viz, colour = viz)) + 
  geom_density(bw = 0.01, alpha = 0.6) +
  theme_classic() + 
  scale_fill_manual("Representation", 
    values = cols[1:4]) + 
  scale_colour_manual("Representation", 
    values = cols[1:4]) + 
  ylab("Posterior density") + 
  xlab("Cliff effect") +
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, 
      margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, 
      margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14)) 
```

When reflecting these results to the descriptive statistics of the data, especially the cliff effect for each expertise group and visualization style, it should be noted that especially in VIS/HCI case there are large proportion of answers with a clear "negative drop" of confidence around $p=0.05$, which could be considered "equally wrong interpretation" as the cliff effect itself. These cases also negate the big changes to other direction making the overall drop for these groups smaller. 

<!-- Thus it could be interesting to study absolute changes of confidence, or alternatively clean these "outliers" (most of them are not really outliers as due to the experiment's design there is natural random variation in the answers) and reanalyse the data (which we did not do here as we wanted to stary true to our preregistration where we did not consider this). -->

For example, here is the proportion of curves where the the change in confidence $\delta < -0.2$ (average drop over all viz and expertise groups was estimated as $0.2$: 

```{r, cache = TRUE}
data %>% group_by(id, viz, expertise) %>% 
    summarize(difference = confidence[true_p==0.04] - confidence[true_p==0.06])  %>% 
    group_by(viz, expertise) %>%
    summarise(
        negative_cliff = round(mean(difference < -0.2), 2))
data %>% group_by(id, viz, expertise) %>% 
    summarize(difference = confidence[true_p==0.04] - confidence[true_p==0.06])  %>% 
    group_by(expertise) %>%
    summarise(
        negative_cliff = round(mean(difference < -0.2), 2))
data %>% group_by(id, viz, expertise) %>% 
    summarize(difference = confidence[true_p==0.04] - confidence[true_p==0.06])  %>% 
    group_by(viz) %>%
    summarise(
        negative_cliff = round(mean(difference < -0.2), 2))
```


### Reanalysis with cleaned data

Finally we test how does the results depend on those confidence curves which have clearly positive slope or large increases in confidence when $p$-value increases.

We use simple linear model with logit-transformed $p$-value and trimmed logit-transformed confidence, and check whether the corresponding coefficient is clearly positive (arbitrarily chose as 0.1), and in addition to this we remove those curves where the difference between two consecutive confidence values are larger than 0.2 (this should of course be negative):

```{r cleaned1, cache = TRUE, eval = FALSE}
data <- readRDS("experiment1/data/exp1_data.rds")
outliers_slope <- data %>% group_by(id, viz) %>%
  mutate(p_logit = qlogis(p), c_logit = qlogis(
    ifelse(confidence == 0, 0.001, ifelse(confidence == 1, 0.999, confidence)))) %>%
  summarize(
    slope = coef(lm(c_logit ~ p_logit))[2]) %>%
  filter(slope > 0.1)

outliers_diff <- data %>% group_by(id, viz) %>% 
  arrange(p, .by_group = TRUE) %>%
  mutate(diff = c(0,diff(confidence)), neg_diff = any(diff > 0.2)) %>%
  filter(neg_diff)

data_cleaned <- data %>%
  filter(!(interaction(id,viz) %in% 
      interaction(outliers_slope$id, outliers_slope$viz))) %>%
  filter(!(interaction(id,viz) %in% 
      interaction(outliers_diff$id, outliers_diff$viz))) 

data_cleaned <- data_cleaned %>% 
  mutate(
    logit_p = qlogis(p),
    p_lt0.05 = factor(p < 0.05, levels = c(TRUE, FALSE), labels = c("Yes", "No")),
    p_eq0.05 = factor(p == 0.05, levels = c(TRUE, FALSE), labels = c("Yes", "No")),
    cat_p = recode_factor(true_p, 
      "0.06" = ">0.05", "0.1" = ">0.05", "0.5" = ">0.05", "0.8" = ">0.05",
      .ordered = TRUE))

fit_exp1 <- brm(bf(
  confidence ~ 
    viz * p_lt0.05 * logit_p + 
    viz * p_eq0.05 +
    (viz + p_lt0.05 * logit_p + p_eq0.05 | id),
  zoi ~ 
    viz * true_p + (viz | id),
  coi ~ mo(cat_p),
  sigma ~ viz + (1 | id)),
  data = data_cleaned,
  family = logit_p_gaussian,
  stanvars = stanvar(scode = stan_funs, block = "functions"),
  chains = 4, cores = 4, iter = 2000, init = 0, 
  save_warmup = FALSE, refresh = 0)
```

```{r, echo = FALSE}
fit_exp1 <- readRDS("experiment1/results/fit_cleaned.rds")
```

```{r cleaned_curves, cache = TRUE}
comb_exp1 <- fit_exp1$data %>% 
  data_grid(viz, logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %>% 
  filter(interaction(logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %in% 
      unique(interaction( 
        fit_exp1$data$logit_p, fit_exp1$data$p_lt0.05, 
        fit_exp1$data$p_eq0.05, fit_exp1$data$cat_p, 
        fit_exp1$data$true_p)))

f_mu_exp1 <- posterior_epred(fit_exp1, newdata = comb_exp1, re_formula = NA)

d <- data.frame(value = c(f_mu_exp1), 
  p = rep(comb_exp1$true_p, each = nrow(f_mu_exp1)),
  viz = rep(comb_exp1$viz, each = nrow(f_mu_exp1)),
  iter = 1:nrow(f_mu_exp1))
levels(d$viz) <- c("Textual", "Classic CI", "Gradient CI", "Violin CI")

sumr <- d %>% group_by(viz, p) %>%
  summarise(Estimate = mean(value), 
    Q2.5 = quantile(value, 0.025), 
    Q97.5 = quantile(value, 0.975)) %>%
  mutate(p = as.numeric(levels(p))[p])

cols <- c("Textual" = "#D55E00", "Classic CI" = "#0072B2", 
  "Gradient CI" = "#009E73", "Violin CI" = "#CC79A7")
x_ticks <- c(0.001, 0.01, 0.04, 0.06, 0.1, 0.5, 0.8)
y_ticks <- c(0.05, seq(0.1, 0.9, by = 0.1), 0.95)
dodge <- 0.19

p1 <- sumr %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, 
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, 
      margin = margin(t = -0.1, r = 0, b = -0.1, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, 
      margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5)))


p2 <- sumr %>% filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p <- p1 + coord_cartesian(xlim = c(0.001, 0.9), ylim = c(0.045, 0.95)) + 
  annotation_custom(
    ggplotGrob(p2), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))
p
```
```{r, cache=TRUE}
d %>% group_by(viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>%
  summarise(mean = mean(difference), sd = sd(difference),
    "2.5%" = quantile(difference, 0.025), 
    "97.5" = quantile(difference, 0.975))
```

Overall, the results are in line with the analysis of to full data, except that the average $\delta$ is larger in all groups.
    
### Subjective rankings of the representation styles

Now we focus on analysis the subjective rankings of the technique. Read the feedback data and merge it with the previous data which contains the expertise information: 

```{r create_rankdata_exp1, eval = FALSE, cache = TRUE}
files <- list.files(path, pattern = "subjective", full.names = TRUE)
n <- length(files)

rankdata <- data.frame(id = rep(1:n, each=4),
  viz = factor(rep(c("p", "ci", "violin", "gradient")), 
    levels=c("p", "ci", "violin", "gradient")),
  rank = factor(NA, levels=1:4))

for(i in 1:n) {
  fb <- fromJSON(files[i])
  rankdata$id[4*(i-1) + 1:4] <- strsplit(strsplit(files[i], "subjective")[[1]], ".txt")[[2]]
  rankdata$rank[4*(i-1) + 1:4] <- factor(fb$rank)
}

rankdata$viz <- recode_factor(rankdata$viz, "p" = "p", "ci" = "CI",
  "gradient" = "gradient", "violin" = "violin")
rankdata$rank <- factor(rankdata$rank, ordered = TRUE)
rankdata$id <- factor(rankdata$id, levels = levels(data$id))
ranks_exp1 <- distinct(inner_join(rankdata, data[,c("id", "viz", "expertise")]))
```
```{r, echo=FALSE, cache = TRUE}
ranks_exp1 <- readRDS("experiment1/data/exp1_rankdata.rds")
fit_rank1 <- readRDS("experiment1/results/fit_rank1.rds")
```

For analysing the subjective rankings of the representation styles, we use a Bayesian ordinal regression model. We test two models, one with expertise and another without it:
```{r fit_models_rank_exp1, eval=FALSE, cache = TRUE}
fit_rank1 <- brm(rank ~ viz + (1 | id), family=cumulative, 
  data = ranks_exp1, refresh = 0)
saveRDS(fit_rank1, file = "experiment1/results/fit_rank1.rds")
```

Plot ranking probabilities:
```{r rank_exp1_plot, cache = TRUE}
colsrank <- scales::brewer_pal(palette = "PiYG",
  direction = -1)(4)

effects_exp1 <- conditional_effects(fit_rank1, effects = "viz", 
  plot = FALSE, categorical = TRUE, reformula = NA)

p <- ggplot(effects_exp1[[1]], aes(x = viz, y = estimate__, colour = cats__)) + 
  geom_point(position=position_dodge(0.5)) + 
  geom_errorbar(width=0.25, aes(ymin=lower__, ymax = upper__), 
    position = position_dodge(0.5)) + 
  theme_classic() + 
  ylab("Ranking \n probability") + 
  xlab("Representation") +
  scale_x_discrete(labels =c("Textual", "Classic CI", "Gradient CI", "Violin CI")) +
  scale_color_manual("Rank", 
    values = colsrank,
    labels = c("1 (best)", "2", "3", "4 (worst)")) + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12,hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14),
    legend.text = element_text(size = 14)) 
p
```
```{r, eval = FALSE, echo = FALSE}
ggsave(p, file="experiment1/results/ranks1.pdf",  
  width = 2*8.5, height = 6, 
  unit = "cm", device = "pdf")
```

We see that the $p$-values are likely to be ranked very low, while violin CI and classic CI are the most preferred options, and gradient CI seems to divide opinions most.

## Two-sample experiment

Let us turn our attention to the second experiment, for which we essentially use the same workflow as for the first experiment.

### Creating the dataset

```{r create_data_exp2, eval = FALSE}
path <- "experiment2/data"
answers <- list.files(path, pattern="answers", full.names = TRUE)
n <- length(answers)

# create a data frame for the results
data_raw <- data.frame(id = rep(1:n, each = 32), viz = NA, replication = NA, value = NA,
  expertise = NA, degree = NA, age = NA, experience = NA, tools = NA)

# read in answers, not optimal way will do
for(i in 1:n){
  x <- strsplit(fromJSON(answers[i]), ",")
  dem <- fromJSON(paste0(path,  "/demography", x[[1]][1], ".txt"))
  for(j in 1:32) {
    data_raw[32*(i-1) + j, c("id", "viz", "replication", "value")] <- x[[j]]
    data_raw[32*(i-1) + j, c("expertise", "degree", "age", "experience", "tools")] <- 
      dem[c("expertise", "level", "age", "experience", "tools")]
  }
}
saveRDS(data_raw, file = "experiment2/data/data_raw.rds")
# remove person who didn't answer on the demography part
data_raw <- data_raw[data_raw$expertise != "",]

true_p <- c(0.001, 0.01, 0.04, 0.05, 0.06, 0.1, 0.5, 0.8)

data2 <- data_raw %>% mutate(n = factor(ifelse(as.numeric(id) %% 8 < 4, 50, 200)),
  id = factor(id),
  viz = relevel(factor(viz, labels = c("CI", 
    "Gradient", 
    "Continuous Violin", 
    "Discrete Violin")),
    "CI"),
  replication = as.numeric(replication),
  value = as.numeric(value),
  p = true_p[replication],
  true_p = factor(p), # for monotonic but non-linear effect on confidence
  confidence = (value - 1) / 99,
  expertise = factor(expertise)) %>% arrange(id, viz)

# Classify the expertise
data2$expertise <- recode_factor(data2$expertise, 
  "Statistics" = "Stats/ML",
  "machine learning, statistics" = "Stats/ML",
  "Human Factors, experiment design" = "Stats/ML",
  "Consulting" = "Stats/ML",
  "Computer vision" = "Stats/ML",
  "Meta-research"  = "Stats/ML",
  "Epidemiology" = "Stats/ML",
  
  "infovis" = "VIS/HCI",
  "HCI and VIS" = "VIS/HCI",
  "HCI" = "VIS/HCI",
  "vis" = "VIS/HCI",
  "Vis and HCI" = "VIS/HCI", 
  "Visualisation" = "VIS/HCI",
  "Visualization" = "VIS/HCI",
  
  "sociology" = "Social science and humanities",
  "Sociology" = "Social science and humanities",
  "Psychology" = "Social science and humanities",
  "psychology" = "Social science and humanities",
  "health economics" = "Social science and humanities",
  "Sport psychology" = "Social science and humanities",
  "economics" = "Social science and humanities",
  
  "Psychology / Neuroscience" = "Physical and life sciences",
  "Ecology" = "Physical and life sciences",
  "Biology" = "Physical and life sciences",
  "Biology " = "Physical and life sciences",
  "Developmental Biology" = "Physical and life sciences",
  "Microbiology" = "Physical and life sciences"
)
data2$expertise <- relevel(data2$expertise, "Stats/ML")
```
```{r, echo=FALSE, cache = TRUE}
data2 <- readRDS("experiment2/data/exp2_data.rds")
```

### Descriptive statistics

As in first experiment, we first look at some descriptive statistics. 
```{r desc2, cache = TRUE}
ids <- which(!duplicated(data2$id))
barplot(table(data2$expertise[ids]))
barplot(table(data2$degree[ids]))
hist(as.numeric(data2$age[ids]))
```

Again we now focus on the cliff effect as difference between confidence when $p$-value=0.04 versus $p$-value=0.06:
```{r cliff_effect_exp2, cache = TRUE}
data2 %>% group_by(id, viz) %>% 
  summarize(difference = confidence[true_p==0.04] - confidence[true_p==0.06])  %>% 
  group_by(viz) %>%
  summarise(
    mean = mean(difference), 
    median = median(difference),
    sd = sd(difference), 
    se = sd(difference) / sqrt(length(difference)),
    "2.5%" = quantile(difference, 0.025), 
    "97.5%" = quantile(difference, 0.975))

data2 %>% group_by(id, viz) %>% 
  summarize(difference = confidence[true_p==0.04] - confidence[true_p==0.06]) %>%
  ggplot(aes(x = viz, y = difference)) + 
  geom_violin() +
  geom_point(alpha = 0.5, position = position_jitter(0.1)) +
  scale_y_continuous("Difference in confidence when p-value is 0.06 vs 0.04") +
  scale_x_discrete("Representation") +
  theme_classic() 
```

Interestingly, while the cliff effect is again largest with classic CI, there are some cases where the discrete Violin CI has lead to very large drop in confidence. Overall the cliff effect seems to be much smaller than in the one-sample case (there the average drop was around 0.1-0.3 depending on the technique).

Now same but with subgrouping using sample size:
```{r cliff_effect_n_exp2, cache=TRUE}
data2 %>% group_by(id, viz, n) %>% 
  summarize(diff = confidence[true_p==0.04] - confidence[true_p==0.06])  %>% 
  group_by(viz, n) %>%
  summarise(
    mean = mean(diff), 
    sd = sd(diff), 
    se = sd(diff) / sqrt(length(diff)),
    "2.5%" = quantile(diff, 0.025), 
    "97.5%" = quantile(diff, 0.975))
```
and expertise:
```{r cliff_effect_expertise_exp2, cache=TRUE}
data2 %>% group_by(id, viz, expertise) %>% 
  summarize(diff = confidence[true_p==0.04] - confidence[true_p==0.06])  %>% 
  group_by(viz, expertise) %>%
  summarise(
    mean = mean(diff), 
    sd = sd(diff), 
    se = sd(diff) / sqrt(length(diff)),
    "2.5%" = quantile(diff, 0.025), 
    "97.5%" = quantile(diff, 0.975))

data2 %>% group_by(id, viz,expertise) %>% 
  summarize(
    difference = confidence[true_p==0.04] - confidence[true_p==0.06]) %>%
  ggplot(aes(x=viz, y = difference)) + geom_violin() + theme_classic() + 
  scale_y_continuous("Difference in confidence when p-value is 0.04 vs 0.06") +
  scale_x_discrete("Representation") +
  geom_point(aes(colour = expertise), position=position_jitter(0.1))
```

It is difficult to say anything definite but there doesn't seem to be clear differences between samples sizes or expertise, although again it is VIS/HCI group which can be "blamed" for extreme changes in violin cases.

Let's check how the much extreme answers (full or zero confidence) there are in different groups:
```{r extreme_exp2, cache=TRUE}
data2 %>% group_by(id, viz, n) %>% 
  mutate(extreme = confidence %in% c(0, 1))  %>% 
  group_by(viz, n) %>%
  summarise(
    mean = mean(extreme),
    sd = sd(extreme), 
    se = sd(extreme) / sqrt(length(extreme)))

data2 %>% group_by(id, viz, expertise) %>% 
  mutate(extreme = confidence %in% c(0, 1))  %>% 
  group_by(viz, expertise) %>%
  summarise(
    mean = mean(extreme),
    sd = sd(extreme), 
    se = sd(extreme) / sqrt(length(extreme)))
```
Compared to first experiment, here Stats/ML group performs best.

### Model

Again, create some additional variables:
```{r additional_vars_exp2, cache=TRUE}
data2 <- data2 %>% 
  mutate(
    logit_p = qlogis(p),
    p_lt0.05 = factor(p < 0.05, levels = c(TRUE, FALSE), labels = c("Yes", "No")),
    p_eq0.05 = factor(p == 0.05, levels = c(TRUE, FALSE), labels = c("Yes", "No")),
    cat_p = recode_factor(true_p, "0.06" = ">0.05", "0.1" = ">0.05", "0.5" = ">0.05", "0.8" = ">0.05",
      .ordered = TRUE))
```

And fit the same models as in first experiment:
```{r eval = FALSE, cache = TRUE}
fit_exp2 <- brm(bf(
  confidence ~ 
    viz * p_lt0.05 * logit_p + 
    viz * p_eq0.05 +
    (viz + p_lt0.05 * logit_p + p_eq0.05 | id),
  zoi ~ 
    viz * true_p + (viz | id),
  coi ~ mo(cat_p),
  sigma ~ viz + (1 | id)),
  data = data2,
  family = logit_p_gaussian,
  stanvars = stanvar(scode = stan_funs, block = "functions"),
  chains = 4, cores = 4, iter = 2000, init = 0, 
  save_warmup = FALSE, save_all_pars = TRUE, refresh = 0)
```
```{r, eval = TRUE, cache = TRUE, echo = FALSE}
fit_exp2 <- readRDS("experiment2/results/fit_noexpertise.rds")
```
And same with expertise:
```{r, eval = FALSE, cache = TRUE}
fit_expertise <- brm(bf(
  confidence ~ 
    expertise * viz * p_lt0.05 * logit_p + 
    expertise * viz * p_eq0.05 +
    (viz + p_lt0.05 * logit_p + p_eq0.05 | id),
  zoi ~ 
    expertise * viz + expertise * true_p + viz * true_p + (viz | id),
  coi ~ mo(cat_p),
  sigma ~ expertise * viz + (1 | id)),
  data = data,
  family = logit_p_gaussian,
  stanvars = stanvar(scode = stan_funs, block = "functions"),
  chains = 4, cores = 4, iter = 2000, init = 0, 
  save_warmup = FALSE, save_all_pars = TRUE, refresh = 0)
```
```{r, eval = TRUE, cache = TRUE, echo = FALSE}
fit_expertise <- readRDS("experiment2/results/fit_expertise.rds")
```


### Results

```{r fit_exp2, cache=TRUE}
fit_exp2
```

Now we look at some figures. First we draw some samples from posterior predictive distribution and see how well our simulated replications match with our data:
```{r pp_check_exp2, cache=TRUE}
pp_check(fit_exp2, type = "hist", nsamples = 11)
pp_check(fit_exp2, type = "stat_grouped", group = "true_p", stat = "median")
pp_check(fit_exp2, type = "stat_grouped", group = "viz", stat = "mean")
```

Note the difference compared to the first experiment: There is much less extremely confident answers than in the first case which was pretty symmetrical between the zero and full confidence answers. These posterior predictive samples are nicely in line with the data (it is feasible that the data could have been generated by this model).

Now we are ready to analyze the results. First, the posterior curves of the confidence given the underlying $p$-value:
```{r, cache = TRUE}
comb_exp2 <- fit_exp2$data %>% 
  data_grid(viz, logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %>% 
  filter(interaction(logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %in% 
      unique(interaction( 
        fit_exp2$data$logit_p, fit_exp2$data$p_lt0.05, 
        fit_exp2$data$p_eq0.05, fit_exp2$data$cat_p, 
        fit_exp2$data$true_p)))

f_mu_exp2 <- posterior_epred(fit_exp2, newdata = comb_exp2, re_formula = NA)

d <- data.frame(value = c(f_mu_exp2), 
  p = rep(comb_exp2$true_p, each = nrow(f_mu_exp2)),
  viz = rep(comb_exp2$viz, each = nrow(f_mu_exp2)),
  iter = 1:nrow(f_mu_exp2))

levels(d$viz) <- c("Classic CI", "Gradient CI", "Cont. violin CI", "Disc. violin CI")
cols  <- c("Classic CI" = "#0072B2", 
  "Gradient CI" = "#009E73", "Cont. violin CI" = "#CC79A7",
  "Disc. violin CI" = "#E69F00")
```

```{r, cache = TRUE}
sumr <- d %>% group_by(viz, p) %>%
  summarise(Estimate = mean(value), 
    Q2.5 = quantile(value, 0.025), 
    Q97.5 = quantile(value, 0.975)) %>%
  mutate(p = as.numeric(levels(p))[p])

x_ticks <- c(0.001, 0.01, 0.04, 0.06, 0.1, 0.5, 0.8)
y_ticks <- c(0.05, seq(0.1, 0.9, by = 0.1), 0.95)
dodge <- 0.19
p1 <- sumr %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values = cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, 
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, 
      margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, 
      margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5)))


p2 <- sumr %>% filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values = cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p <- p1 + coord_cartesian(xlim = c(0.001, 0.9), ylim = c(0.045, 0.95)) + 
  annotation_custom(
    ggplotGrob(p2), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))
p

```
```{r, echo = FALSE, eval = FALSE}
ggsave(p, filename = "experiment2/results/exp2_confidence.pdf", 
  width = 2*8.5, height = 12.5, 
  unit = "cm", device = "pdf")
```


And the probability of extreme answer:
```{r extreme_exp2_plot, cache=TRUE}

f_zoi_exp2 <- fitted(fit_exp2, newdata = comb_exp2, re_formula = NA, dpar = "zoi")
df_01_exp2 <- data.frame(
  p = plogis(comb_exp2$logit_p),
  viz = comb_exp2$viz,
  f_zoi_exp2)
levels(df_01_exp2$viz) <- levels(d$viz)
y_ticks <- c(0.001, 0.01, seq(0.1,0.9,by=0.2))

p <- df_01_exp2 %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) +
  geom_linerange(aes(ymin = Q2.5, ymax = Q97.5),
    position = position_dodge(width=0.19)) +
  geom_line(alpha=0.5, position = position_dodge(width=0.19))  +
  ylab("Probability of all-or-none answer") + xlab("p-value") +
  scale_color_manual("Representation", values = cols) + 
  theme_classic() +
  scale_y_continuous(trans = "logit",
    breaks = y_ticks, labels = y_ticks, minor_breaks = NULL) +
  scale_x_continuous(trans = "logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 10), legend.position = "bottom",
    axis.title.x = element_text(size = 12),
    axis.text.y = element_text(size = 10), axis.title.y = element_text(size = 12),
    legend.text=element_text(size = 10), strip.text.x = element_text(size = 10))
p

```


Again we can compute the average drop in perceived confidence when moving from $p = 0.04$ to $p=0.06$:

```{r drop2, cache=TRUE}
d %>% group_by(viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>%
  summarise(mean = mean(difference), sd = sd(difference),
    "2.5%" = quantile(difference, 0.025), 
    "97.5" = quantile(difference, 0.975))
```

There is a peculiar rise in confidence level in case of continuous Violin CI when the underlying $p$-value is 0.05, but overall, compared to the first experiment the results here do not show strong differences in cliff effect or dichotomous thinking, and actually is no clear signs of these phenomena in this experiment:

```{r, cache=TRUE}
p <- d %>% group_by(viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>% 
  ggplot(aes(x = difference, fill = viz, colour = viz)) + 
  geom_density(bw = 0.01, alpha = 0.6) +
  theme_classic() + 
  scale_fill_manual("Representation", values = cols) + 
  scale_colour_manual("Representation", values = cols) + 
  ylab("Posterior density") + 
  xlab("E[confidence(p=0.04) - confidence(p=0.06)]") +
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14),
    legend.text = element_text(size = 14)) 
p 

```

```{r, echo = FALSE, eval = FALSE}
ggsave(p, filename = "experiment2/results/exp2_cliff.pdf", 
  width = 2*8.5, height = 6.5, 
  unit = "cm", device = "pdf")
```

```{r, cache=TRUE}
postprob <- d %>% group_by(viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>%
  group_by(iter) %>% 
  mutate(ci_vs_gradient = difference[viz == "Classic CI"] - 
      difference[viz == "Gradient CI"],
    ci_vs_cviolin = difference[viz == "Classic CI"] - 
      difference[viz == "Cont. violin CI"],
    ci_vs_dviolin = difference[viz == "Classic CI"] - 
      difference[viz == "Disc. violin CI"],
    gradient_vs_cviolin = difference[viz == "Gradient CI"] - 
      difference[viz == "Cont. violin CI"],
    gradient_vs_dviolin = difference[viz == "Gradient CI"] - 
      difference[viz == "Disc. violin CI"],
    cviolin_vs_dviolin = difference[viz == "Cont. violin CI"] - 
      difference[viz == "Disc. violin CI"]) %>%
  ungroup() %>% summarise(
    "P(CI > gradient)" = mean(ci_vs_gradient > 0),
    "P(CI > cviolin)" = mean(ci_vs_cviolin > 0),
    "P(CI > dviolin)" = mean(ci_vs_dviolin > 0),
    "P(gradient > cont violin)" = mean(gradient_vs_cviolin > 0),
    "P(gradient > disc violin)" = mean(gradient_vs_dviolin > 0),
    "P(cont violin > disc violin)" = mean(cviolin_vs_dviolin > 0))
round(t(as.data.frame(postprob)), 2)
```

### Results for the model with expertise

Now we consider expanded model with with expertise as predictor:

```{r, eval = FALSE, cache = TRUE}
fit_expertise <- brm(bf(
  confidence ~ 
    expertise * viz * p_lt0.05 * logit_p + 
    expertise * viz * p_eq0.05 +
    (viz + p_lt0.05 * logit_p + p_eq0.05 | id),
  zoi ~ 
    expertise * viz + viz * true_p + (viz | id),
  coi ~ mo(cat_p),
  sigma ~ expertise * viz + (1 | id)),
  data = data2,
  family = logit_p_gaussian,
  stanvars = stanvar(scode = stan_funs, block = "functions"),
  chains = 4, cores = 4, iter = 2000, init = 0, 
  save_warmup = FALSE, save_all_pars = TRUE, refresh = 0)
```
```{r, eval = TRUE, cache = TRUE, echo = FALSE}
fit_expertise <- readRDS("experiment2/results/fit_expertise.rds")
```
```{r, cache = TRUE}
comb_exp2 <- fit_expertise$data %>% 
  data_grid(expertise, viz, logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %>% 
  filter(interaction(expertise, logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %in% 
      unique(interaction(fit_expertise$data$expertise, 
        fit_expertise$data$logit_p, fit_expertise$data$p_lt0.05, 
        fit_expertise$data$p_eq0.05, fit_expertise$data$cat_p, 
        fit_expertise$data$true_p)))

f_mu_exp2 <- posterior_epred(fit_expertise, newdata = comb_exp2, re_formula = NA)

d <- data.frame(value = c(f_mu_exp2), 
  p = rep(comb_exp2$true_p, each = nrow(f_mu_exp2)),
  viz = rep(comb_exp2$viz, each = nrow(f_mu_exp2)),
  expertise = rep(comb_exp2$expertise, each = nrow(f_mu_exp2)),
  iter = 1:nrow(f_mu_exp2))

levels(d$viz) <- c("Classic CI", "Gradient CI", "Cont. violin CI", "Disc. violin CI")
cols  <- c("Classic CI" = "#0072B2", 
  "Gradient CI" = "#009E73", "Cont. violin CI" = "#CC79A7",
  "Disc. violin CI" = "#E69F00")
```

Here are posterior curves for the four different groups:
```{r 'posterior_curves_exp2_expertise', cache = TRUE}
sumr <- d %>% group_by(viz, p, expertise) %>%
  summarise(Estimate = mean(value), 
    Q2.5 = quantile(value, 0.025), 
    Q97.5 = quantile(value, 0.975)) %>%
  mutate(p = as.numeric(levels(p))[p])

x_ticks <- c(0.001, 0.01, 0.04, 0.06, 0.1, 0.5, 0.8)
y_ticks <- c(0.05, seq(0.1, 0.9, by = 0.1), 0.95)
dodge <- 0.19

p11 <- sumr %>% filter(expertise == "Stats/ML") %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, 
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.margin = margin(t = -0.1, b = 0, unit = "cm"),
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, 
      margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, 
      margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), 
    ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5))) 


p21 <- sumr %>% filter(expertise == "Stats/ML") %>% 
  filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

yrange <- c(min(sumr$Q2.5)-0.001, max(sumr$Q97.5) +0.001)
p1 <- p11 + coord_cartesian(xlim = c(0.001, 0.9), 
  ylim = yrange) + 
  annotation_custom(
    ggplotGrob(p21), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))

p12 <- sumr %>% filter(expertise == "VIS/HCI") %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5))) 


p22 <- sumr %>% filter(expertise == "VIS/HCI") %>% 
  filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
   scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p2 <- p12 + coord_cartesian(xlim = c(0.001, 0.9), ylim = yrange) + 
  annotation_custom(
    ggplotGrob(p22), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))

p13 <- sumr %>% filter(expertise == "Social science and humanities") %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5))) 


p23 <- sumr %>% filter(expertise == "Social science and humanities") %>% 
  filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p3 <- p13 + coord_cartesian(xlim = c(0.001, 0.9), ylim = yrange) + 
  annotation_custom(
    ggplotGrob(p23), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))


p14 <- sumr %>% filter(expertise == "Physical and life sciences") %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, margin = margin(t = -0.1, r = 0, b = -0.3, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5))) 


p24 <- sumr %>% filter(expertise == "Physical and life sciences") %>% 
  filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p4 <- p14 + coord_cartesian(xlim = c(0.001, 0.9), ylim = yrange) + 
  annotation_custom(
    ggplotGrob(p24), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))


library(patchwork)
p <- (p1 + ggtitle("Stats/ML")) + (p2 + ggtitle("VIS/HCI")) + 
  (p3 + ggtitle("Social sciences and humanities")) + 
  (p4 + ggtitle("Physical and life sciences"))
p
```

And the expertise-specific cliff effects:
```{r, cache = TRUE}
d %>% group_by(expertise, viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>%
  summarise(mean = mean(difference), sd = sd(difference),
    "2.5%" = quantile(difference, 0.025), 
    "97.5" = quantile(difference, 0.975))
```

As with the model without expertise, we see no clear signs of cliff effect here.

### Reanalysis with cleaned data

Finally we again test how does the results depend on those confidence curves which have clearly positive slope or large increases in confidence when $p$-value increases.

We use simple linear model with logit-transformed $p$-value and trimmed logit-transformed confidence, and check whether the corresponding coefficient is clearly positive (arbitrarily chose as 0.1), and in addition to this we remove those curves where the difference between two consecutive confidence values are larger than 0.2 (this should of course be negative):

```{r cleaned2, cache = TRUE, eval = FALSE}
data <- readRDS("experiment2/data/exp2_data.rds")
outliers_slope <- data %>% group_by(id, viz) %>%
  mutate(p_logit = qlogis(p), c_logit = qlogis(
    ifelse(confidence == 0, 0.001, ifelse(confidence == 1, 0.999, confidence)))) %>%
  summarize(
    slope = coef(lm(c_logit ~ p_logit))[2]) %>%
  filter(slope > 0.1)

outliers_diff <- data %>% group_by(id, viz) %>% 
  arrange(p, .by_group = TRUE) %>%
  mutate(diff = c(0,diff(confidence)), neg_diff = any(diff > 0.2)) %>%
  filter(neg_diff)

data_cleaned <- data %>%
  filter(!(interaction(id,viz) %in% 
      interaction(outliers_slope$id, outliers_slope$viz))) %>%
  filter(!(interaction(id,viz) %in% 
      interaction(outliers_diff$id, outliers_diff$viz))) 

data_cleaned <- data_cleaned %>% 
  mutate(
    logit_p = qlogis(p),
    p_lt0.05 = factor(p < 0.05, levels = c(TRUE, FALSE), labels = c("Yes", "No")),
    p_eq0.05 = factor(p == 0.05, levels = c(TRUE, FALSE), labels = c("Yes", "No")),
    cat_p = recode_factor(true_p, 
      "0.06" = ">0.05", "0.1" = ">0.05", "0.5" = ">0.05", "0.8" = ">0.05",
      .ordered = TRUE))

fit_exp2 <- brm(bf(
  confidence ~ 
    viz * p_lt0.05 * logit_p + 
    viz * p_eq0.05 +
    (viz + p_lt0.05 * logit_p + p_eq0.05 | id),
  zoi ~ 
    viz * true_p + (viz | id),
  coi ~ mo(cat_p),
  sigma ~ viz + (1 | id)),
  data = data_cleaned,
  family = logit_p_gaussian,
  stanvars = stanvar(scode = stan_funs, block = "functions"),
  chains = 4, cores = 4, iter = 2000, init = 0, 
  save_warmup = FALSE, refresh = 0)
```
```{r, echo = FALSE}
fit_exp2 <- readRDS("experiment2/results/fit_cleaned.rds")
```

```{r cleaned_curves2, cache = TRUE}
comb_exp2 <- fit_exp2$data %>% 
  data_grid(viz, logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %>% 
  filter(interaction(logit_p, p_lt0.05, p_eq0.05, cat_p, true_p) %in% 
      unique(interaction( 
        fit_exp2$data$logit_p, fit_exp2$data$p_lt0.05, 
        fit_exp2$data$p_eq0.05, fit_exp2$data$cat_p, 
        fit_exp2$data$true_p)))

f_mu_exp2 <- posterior_epred(fit_exp2, newdata = comb_exp2, re_formula = NA)

d <- data.frame(value = c(f_mu_exp2), 
  p = rep(comb_exp2$true_p, each = nrow(f_mu_exp2)),
  viz = rep(comb_exp2$viz, each = nrow(f_mu_exp2)),
  iter = 1:nrow(f_mu_exp2))

levels(d$viz) <- c("Classic CI", "Gradient CI", "Cont. violin CI", "Disc. violin CI")

sumr <- d %>% group_by(viz, p) %>%
  summarise(Estimate = mean(value), 
    Q2.5 = quantile(value, 0.025), 
    Q97.5 = quantile(value, 0.975)) %>%
  mutate(p = as.numeric(levels(p))[p])

cols  <- c("Classic CI" = "#0072B2", 
  "Gradient CI" = "#009E73", "Cont. violin CI" = "#CC79A7",
  "Disc. violin CI" = "#E69F00")
x_ticks <- c(0.001, 0.01, 0.04, 0.06, 0.1, 0.5, 0.8)
y_ticks <- c(0.05, seq(0.1, 0.9, by = 0.1), 0.95)
dodge <- 0.19

p1 <- sumr %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(dodge), size = 0.1) +
  geom_linerange(data = sumr %>% filter(p < 0.03 | p > 0.07),
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(dodge), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(dodge), size = 0.7, show.legend = FALSE) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks, 
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = x_ticks, labels = x_ticks, minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14, 
      margin = margin(t = -0.1, r = 0, b = -0.1, l = 0, unit = "cm")),
    axis.title.y = element_text(size = 14, 
      margin = margin(t = 0, r = -0.1, b = 0, l = -0.1, unit = "cm")),
    legend.text = element_text(size = 14))  + 
  geom_rect(xmin = qlogis(0.03), xmax = qlogis(0.07), ymin = qlogis(0.31), ymax = qlogis(0.82), 
    color = "grey70", alpha = 0, linetype = "dashed", size = 0.1) + 
  guides(colour = guide_legend(override.aes = list(size = 1.5)))


p2 <- sumr %>% filter(p > 0.02 & p < 0.09) %>%
  ggplot(aes(x = p, y = Estimate, colour = viz)) + 
  geom_line(position = position_dodge(0.1), size = 0.1) +
  geom_linerange(
    aes(ymin = Q2.5, ymax = Q97.5), 
    position = position_dodge(0.1), size = 0.3,
    show.legend = FALSE) + 
  geom_point(position = position_dodge(0.1), size = 0.7) +
  ylab("Confidence") + xlab("p-value") + 
  scale_color_manual("Representation", values =  cols) + 
  scale_y_continuous(trans="logit", breaks = y_ticks,
    minor_breaks = NULL, labels = y_ticks) + 
  scale_x_continuous(trans="logit",
    breaks = c(0.04, 0.05, 0.06), 
    labels = c(0.04, 0.05, 0.06), 
    minor_breaks = NULL) + 
  theme_classic() + 
  theme(legend.position = "none",  
    axis.title.x = element_blank(), axis.title.y = element_blank(),
    plot.background = element_blank(),
    plot.margin=unit(c(-4,-9,0,0), "mm"),
    axis.text.x = element_text(size = 12), 
    axis.text.y = element_text(size = 12)) 

p <- p1 + coord_cartesian(xlim = c(0.001, 0.9), ylim = c(0.045, 0.95)) + 
  annotation_custom(
    ggplotGrob(p2), 
    xmin = qlogis(0.2), xmax = qlogis(0.9), ymin = qlogis(0.3), ymax = qlogis(0.95))
p
```
```{r, cache=TRUE}
d %>% group_by(viz, iter) %>% 
  summarise(difference = value[p == "0.04"] - value[p == "0.06"]) %>%
  summarise(mean = mean(difference), sd = sd(difference),
    "2.5%" = quantile(difference, 0.025), 
    "97.5" = quantile(difference, 0.975))
```

Overall, the results are very similar to the analysis with the full data.
    
    
### Subjective rankings for second experiment

Read the data:
```{r create_rankdata_exp2, eval = FALSE, cache=TRUE}
path <- "experiment2/data"
files <- list.files(path, pattern = "subjective", full.names = TRUE)
n <- length(files)
rankdata2 <- data.frame(id = rep(1:n, each = 4),
  viz = factor(rep(c("violin2", "ci", "violin", "gradient")), 
    levels = c("violin2", "ci", "violin", "gradient")),
  rank = factor(NA, levels = 1:4))
for(i in 1:n) {
  fb <- fromJSON(files[i])
  rankdata2$id[4*(i-1) + 1:4] <- strsplit(strsplit(files[i], "subjective")[[1]], ".txt")[[2]]
  rankdata2$rank[4*(i-1) + 1:4] <- factor(fb$rank)
}
rankdata2$viz <- recode_factor(rankdata2$viz, "ci" = "CI", 
  "gradient" = "Gradient", 
  "violin" = "Continuous Violin", 
  "violin2" = "Discrete Violin")
rankdata2$viz <- relevel(rankdata2$viz, "CI")
rankdata2$rank <- factor(rankdata2$rank, ordered = TRUE)
rankdata2$id <- factor(rankdata2$id, levels = levels(data2$id))
ranks_exp2 <- distinct(inner_join(rankdata2, data2[, c("id", "viz", "expertise")]))
```
```{r, echo=FALSE, cache = TRUE}
ranks_exp2 <- readRDS("experiment2/data/exp2_rankdata.rds")
fit_rank2 <- readRDS("experiment2/results/fit_rank2.rds")
```

And fit the same model as in the first experiment:
```{r fit_rankmodel_exp2, cache = TRUE, eval = FALSE}
fit_rank2 <- brm(rank ~ viz + (1 | id), family = cumulative, 
  data = ranks_exp2, refresh = 0)
saveRDS(fit_rank2, file = "experiment2/results/fit_rank2.rds")
```

The ranking probabilities:
```{r rankmodel_exp2_plot, cache=TRUE}
effects_exp2 <- conditional_effects(fit_rank2, effects = "viz", 
  plot = FALSE, categorical = TRUE, 
  reformula=NA)

p <- ggplot(effects_exp2[[1]], aes(x = viz, y = estimate__, colour = cats__)) + 
  geom_point(position=position_dodge(0.5)) + 
  geom_errorbar(width=0.25, aes(ymin=lower__, ymax = upper__), 
    position = position_dodge(0.5)) + 
  theme_classic() + 
  ylab("Ranking \n probability") + xlab("Representation") +
  scale_x_discrete(labels = 
      c("Classic CI", "Gradient CI", "Cont. violin CI", "Disc. violin CI")) +
  scale_color_manual("Rank", 
    values = colsrank,
    labels = c("1 (best)", "2", "3", "4 (worst)")) + 
  theme(legend.position = "bottom", 
    legend.title = element_blank(),
    axis.text.x = element_text(size = 12,hjust = 1, vjust = 1), 
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14),
    legend.text = element_text(size = 14)) 
p
```
```{r, eval = FALSE, echo = FALSE}
ggsave(p, file="experiment2/results/ranks2.pdf",  
  width = 2*8.5, height = 6, 
  unit = "cm", device = "pdf")
```

Preferences between different techniques seem to be quite similar, except there seems to be preferences towards discrete violin CI plot.