Merck · LittleBeannie · Aug 26, 2024 · Jul 30, 2024
diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/wpgsd_corr_example.Rmd
@@ -0,0 +1,262 @@
+---
+title: "Correlation Matrix Calculation"
+author: "Chenguang Zhang"
+date: "2024-05-14"
+output: html_document
+---
+
+The weighted parametric group sequential design (WPGSD) approach (Anderson et al., 2022) allows one to take advantage of the known correlation structure in constructing efficacy bounds to control the family-wise error rate (FWER) for a group sequential design. Here, correlation may arise from common observations in nested populations, in overlapping populations, or in the control arm.
+
+## Notation
+
+Suppose that in a group sequential trial there are $m$ elementary null hypotheses $H_i$, $i \in I=\{1,...,m\}$, and there are $K$ analyses. Let $k$ be the index for the interim and final analyses, $k=1,2,...,K$. For any nonempty set $J \subseteq I$, we denote the intersection hypothesis $H_J=\cap_{j \in J}H_j$. We note that $H_I$ is the global null hypothesis.
+
+We assume the plan is for all hypotheses to be tested at each of the $K$ planned analyses if the trial continues to the end for all hypotheses. We further assume that the distribution of the $m \times K$ test statistics for the $m$ individual hypotheses at all $K$ analyses is multivariate normal with a completely known correlation matrix.
+
+Let $Z_{ik}$ be the standardized normal test statistic for hypothesis $i \in I$ at analysis $1 \le k \le K$. Let $n_{ik}$ be the number of events collected cumulatively through stage $k$ for hypothesis $i$. Then $n_{i \wedge i',k \wedge k'}$ is the number of events included in both $Z_{ik}$ and $Z_{i'k'}$, for $i, i' \in I$ and $1 \le k, k' \le K$. The key to the parametric tests is to utilize the correlation among the test statistics. The correlation between $Z_{ik}$ and $Z_{i'k'}$ is
+$$Corr(Z_{ik},Z_{i'k'})=\frac{n_{i \wedge i',k \wedge k'}}{\sqrt{n_{ik}*n_{i'k'}}}.$$
+
+## Examples
+
+In a 2-arm controlled clinical trial example with one primary endpoint, there are 3 patient populations defined by the status of two biomarkers A and B:
+
+* Biomarker A positive, the population 1,
+* Biomarker B positive, the population 2,
+* Overall population.
+
+The 3 primary elementary hypotheses are:
+
+* H1: the experimental treatment is superior to the control in the population 1
+* H2: the experimental treatment is superior to the control in the population 2
+* H3: the experimental treatment is superior to the control in the overall population
+
+Assume an interim analysis (IA) and a final analysis (FA) are planned for the study. The numbers of events are listed below.
+```{r}
+library(dplyr)
+library(tibble)
+library(gt)
+event_tb <- tribble(
+  ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
+  "Population 1", 100, 200,
+  "Population 2", 110, 220,
+  "Overlap of Population 1 and 2", 80, 160,
+  "Overall Population", 225, 450
+)
+event_tb %>%
+  gt() %>%
+  tab_header(title = "Number of events at each population")
+```
+
+### Example 1 - Same Analysis, Different Populations
+Consider a simple case first: the correlation between the test statistics for population 1 and population 2 at the interim analysis only. Here $k=k'=1$, and $H_{1}$ and $H_{2}$ correspond to $i=1$ and $i'=2$.
+The correlation is
+$$Corr(Z_{11},Z_{21})=\frac{n_{1 \wedge 2,1 \wedge 1}}{\sqrt{n_{11}*n_{21}}}$$
+The relevant numbers of events are
+```{r}
+event_tbl <- tribble(
+  ~Population, ~"Number of Event in IA",
+  "Population 1", 100,
+  "Population 2", 110,
+  "Overlap in population 1 and 2", 80
+)
+event_tbl %>%
+  gt() %>%
+  tab_header(title = "Number of events at each population in example 1")
+```
+The correlation can then be calculated as
+$$Corr(Z_{11},Z_{21})=\frac{80}{\sqrt{100*110}} \approx 0.76$$
+```{r}
+Corr1 <- 80 / sqrt(100 * 110)
+round(Corr1, 2)
+```
+
+### Example 2 - Same Population, Different Analyses
+Next, consider a single population, say population 1, at two different analyses, the interim and the final. Here $i=i'=1$, and IA and FA correspond to $k=1$ and $k'=2$.
+The correlation is
+$$Corr(Z_{11},Z_{12})=\frac{n_{1 \wedge 1,1 \wedge 2}}{\sqrt{n_{11}*n_{12}}}$$
+The relevant numbers of events are
+```{r}
+event_tb2 <- tribble(
+  ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
+  "Population 1", 100, 200
+)
+event_tb2 %>%
+  gt() %>%
+  tab_header(title = "Number of events at each analyses in example 2")
+```
+Every event observed at IA is also counted at FA, so the shared count is $n_{11}=100$ and the correlation is
+$$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}} \approx 0.71$$
+```{r}
+Corr2 <- 100 / sqrt(100 * 200)
+round(Corr2, 2)
+```
+
+### Example 3 - Different Populations, Different Analyses
+Finally, consider population 1 at the interim analysis and population 2 at the final analysis. Here the populations give $i=1$ and $i'=2$, and IA and FA give $k=1$ and $k'=2$.
+The correlation is
+$$Corr(Z_{11},Z_{22})=\frac{n_{1 \wedge 2,1 \wedge 2}}{\sqrt{n_{11}*n_{22}}}$$
+The relevant numbers of events are
+```{r}
+event_tb3 <- tribble(
+  ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
+  "Population 1", 100, 200,
+  "Population 2", 110, 220,
+  "Overlap in population 1 and 2", 80, 160
+)
+event_tb3 %>%
+  gt() %>%
+  tab_header(title = "Number of events at each population & analyses in example 3")
+```
+The events shared by the two statistics are those in the overlap of populations 1 and 2 at IA, so the correlation is
+$$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}} \approx 0.54$$
+```{r}
+Corr3 <- 80 / sqrt(100 * 220)
+round(Corr3, 2)
+```
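+
+The three calculations above share one formula, so they can be wrapped in a small helper. The sketch below is only illustrative: `corr_events()` is a hypothetical name introduced here, not a function from the wpgsd package, and it assumes all event counts are positive.
+
+```{r}
+# Hypothetical helper (not part of wpgsd): correlation between two test
+# statistics, given the events they share and the events behind each one
+corr_events <- function(n_common, n_a, n_b) {
+  n_common / sqrt(n_a * n_b)
+}
+
+examples <- c(
+  "Z11 vs Z21" = corr_events(80, 100, 110),  # Example 1
+  "Z11 vs Z12" = corr_events(100, 100, 200), # Example 2
+  "Z11 vs Z22" = corr_events(80, 100, 220)   # Example 3
+)
+round(examples, 2)
+```
+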
+Now that we know how to calculate the correlation in each situation, we can use the `generate_corr()` function, which was built on this logic, to compute the correlations for all combinations directly.
+
+First, we need an event table that describes the cohort.
+
+
+```{r}
+library(wpgsd)
+# The event table
+event <- tibble::tribble(
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, 80,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, 200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, 160,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
+event %>%
+  gt() %>%
+  tab_header(title = "Number of events at each population & analyses")
+```
+`H1` and `H2` hold the indices of a pair of hypotheses (populations or experimental arms). When `H1` and `H2` are equal, the row gives the number of events for that hypothesis; when they differ, it gives the number of events common to both. `Analysis` is the index of the analysis, such as 1 for the interim analysis and 2 for the final analysis. `Event` is the number of events for that combination.
+
+For example, `H1=1, H2=1, Analysis=1, Event=100` indicates that population 1 has 100 events at the interim analysis.
+
+As another example, `H1=1, H2=2, Analysis=2, Event=160` indicates that populations 1 and 2 have 160 events in common at the final analysis.
+
+Note that the column names for this function are fixed as `H1`, `H2`, `Analysis`, and `Event`.
+
+Once we have the event table, we can use the `generate_corr()` function to calculate the correlations.
+
+```{r}
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("P1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("P1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+all_corr
+```
+* P1/P2: Population 1/2; P3: Overall population; IA: Interim analysis; FA: Final analysis
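+
+As a cross-check, a matrix in the same row and column order can be assembled in base R directly from the formula in the Notation section. This is only a sketch: `n_single`, `n_common`, `shared()`, and `manual_corr` are names introduced here for illustration, and it assumes that $k \wedge k'$ refers to the earlier of the two analyses, as in Example 3. It does not rely on how `generate_corr()` is implemented.
+
+```{r}
+# Events for each hypothesis (rows 1-3) at each analysis (columns: IA, FA)
+n_single <- rbind(c(100, 200), c(110, 220), c(225, 450))
+
+# Illustrative helper: symmetric matrix of events shared by each pair
+shared <- function(n11, n22, n33, n12, n13, n23) {
+  matrix(c(n11, n12, n13,
+           n12, n22, n23,
+           n13, n23, n33), nrow = 3, byrow = TRUE)
+}
+n_common <- list(
+  shared(100, 110, 225, 80, 100, 110),  # IA
+  shared(200, 220, 450, 160, 200, 220)  # FA
+)
+
+# Row/column order: (P1, IA), (P2, IA), (P3, IA), (P1, FA), (P2, FA), (P3, FA)
+idx <- expand.grid(i = 1:3, k = 1:2)
+manual_corr <- matrix(NA_real_, nrow = 6, ncol = 6)
+for (a in seq_len(6)) {
+  for (b in seq_len(6)) {
+    # Assumption: shared events are counted at the earlier analysis
+    k_min <- min(idx$k[a], idx$k[b])
+    manual_corr[a, b] <- n_common[[k_min]][idx$i[a], idx$i[b]] /
+      sqrt(n_single[idx$i[a], idx$k[a]] * n_single[idx$i[b], idx$k[b]])
+  }
+}
+round(manual_corr, 2)
+```
+
+The entries in the first row for (P2, IA), (P1, FA), and (P2, FA) correspond to Examples 1, 2, and 3.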
+
+### Situations to Consider
+Situation 1: The number of events in one of the populations is extremely small.
+
+For example, the number of events in population 1 is very small.
+
+The function still returns a correlation matrix.
+
+```{r}
+event <- tibble::tribble(
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 5,
+  2, 2, 1, 1100,
+  3, 3, 1, 2250,
+  1, 2, 1, 4,
+  1, 3, 1, 2,
+  2, 3, 1, 1100,
+  1, 1, 2, 8,
+  2, 2, 2, 2200,
+  3, 3, 2, 4500,
+  1, 2, 2, 6,
+  1, 3, 2, 7,
+  2, 3, 2, 2200
+)
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+all_corr
+```
+
+Situation 2: The overlap between populations 1 and 2 is 0.
+
+The function still returns a result, but some of the correlations are 0.
+
+```{r}
+event <- tibble::tribble(
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, 0,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, 200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, 0,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+all_corr
+```
+
+Situation 3-1: The number of events is mistakenly recorded as negative.
+
+A warning message is displayed, and `NA` values are generated.
+```{r}
+event <- tibble::tribble(
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, -100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, 80,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, -200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, 160,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+all_corr
+```
+
+Situation 3-2: The number of overlapping events is mistakenly recorded as negative.
+
+No warning or error message is generated, but the correlation can be negative, which is misleading. Check the data carefully before moving on to the next step.
+```{r}
+event <- tibble::tribble(
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, -80,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, 200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, -160,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+all_corr
+```
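+
+Because Situations 3-1 and 3-2 are not stopped with an error, one option is to validate the event table before computing any correlation. The following is a minimal sketch under stated assumptions: `check_event_table()` and `bad_event` are hypothetical names introduced here, and the check is not part of the wpgsd package.
+
+```{r}
+# Hypothetical pre-check (not part of wpgsd): stop early on malformed input
+check_event_table <- function(event) {
+  required <- c("H1", "H2", "Analysis", "Event")
+  missing_cols <- setdiff(required, names(event))
+  if (length(missing_cols) > 0) {
+    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
+  }
+  if (anyNA(event$Event) || any(event$Event < 0)) {
+    stop("Event counts must be non-missing and non-negative.")
+  }
+  invisible(event)
+}
+
+# Small illustrative table with a negative overlap count, as in Situation 3-2
+bad_event <- data.frame(
+  H1 = c(1, 2, 1),
+  H2 = c(1, 2, 2),
+  Analysis = c(1, 1, 1),
+  Event = c(100, 110, -80)
+)
+
+# Capture the error message instead of halting the document
+check_result <- tryCatch(
+  check_event_table(bad_event),
+  error = function(e) conditionMessage(e)
+)
+check_result
+```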