From 798f70987a570ed01782af0fc56a5a17d3074152 Mon Sep 17 00:00:00 2001
From: guangguangzai <cgzhang19@gmail.com>
Date: Tue, 30 Jul 2024 15:18:30 -0400
Subject: [PATCH 1/8] add wpgsd correlation example file

---
 vignettes/wpgsd_corr_example.Rmd | 263 +++++++++++++++++++++++++++++++
 1 file changed, 263 insertions(+)
 create mode 100644 vignettes/wpgsd_corr_example.Rmd
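
Reviewer note: the correlation formula in the Notation section can be cross-checked without the package. The sketch below is a minimal base-R version, assuming the same `H1`, `H2`, `Analysis`, `Event` layout the vignette passes to `generate_corr()`. The helpers `shared_events()` and `corr_from_events()` are hypothetical names introduced here for illustration; they are not part of wpgsd and are not claimed to match its implementation. For statistics at different analyses, the shared count is taken at the earlier analysis.

```r
# Event counts keyed by hypothesis pair and analysis (values copied from the vignette).
event <- data.frame(
  H1 = c(1, 2, 3, 1, 1, 2, 1, 2, 3, 1, 1, 2),
  H2 = c(1, 2, 3, 2, 3, 3, 1, 2, 3, 2, 3, 3),
  Analysis = rep(1:2, each = 6),
  Event = c(100, 110, 225, 80, 100, 110, 200, 220, 450, 160, 200, 220)
)

# Hypothetical helper: events shared by hypotheses i and j at analysis k.
shared_events <- function(event, i, j, k) {
  lo <- min(i, j)
  hi <- max(i, j)
  event$Event[event$H1 == lo & event$H2 == hi & event$Analysis == k]
}

# Hypothetical helper: correlation matrix ordered by analysis, then hypothesis.
corr_from_events <- function(event) {
  idx <- expand.grid(
    H = sort(unique(c(event$H1, event$H2))),
    A = sort(unique(event$Analysis))
  )
  n <- nrow(idx)
  corr <- matrix(NA_real_, n, n)
  for (a in seq_len(n)) {
    for (b in seq_len(n)) {
      # Shared events are counted at the earlier of the two analyses.
      k_min <- min(idx$A[a], idx$A[b])
      num <- shared_events(event, idx$H[a], idx$H[b], k_min)
      den <- sqrt(
        shared_events(event, idx$H[a], idx$H[a], idx$A[a]) *
          shared_events(event, idx$H[b], idx$H[b], idx$A[b])
      )
      corr[a, b] <- num / den
    }
  }
  corr
}

corr <- corr_from_events(event)
round(corr, 2)
```

Rows and columns follow the order P1 IA, P2 IA, P3 IA, P1 FA, P2 FA, P3 FA, so entries `[1, 2]`, `[1, 4]`, and `[1, 5]` correspond to Examples 1, 2, and 3.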

diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/wpgsd_corr_example.Rmd
new file mode 100644
index 0000000..3fd22f0
--- /dev/null
+++ b/vignettes/wpgsd_corr_example.Rmd
@@ -0,0 +1,263 @@
+---
+title: "Correlation Matrix Calculation"
+author: "Chenguang Zhang"
+date: "2024-05-14"
+output: html_document
+---
+
+The weighted parametric group sequential design (WPGSD) (Anderson et al. (2022)) approach allows one to take advantage of the known correlation structure in constructing efficacy bounds to control family-wise error rate (FWER) for a group sequential design. Here correlation may be due to common observations in nested populations, due to common observations in overlapping populations, or due to common observations in the control arm. 
+
+## Notation
+
+Suppose that in a group sequential trial there are $m$ elementary null hypotheses $H_i$, $i \in I=\{1,...,m\}$, and there are $K$ analyses. Let $k$ be the index for the interim analyses and final analyses, $k=1,2,...,K$. For any nonempty set $J \subseteq I$, we denote the intersection hypothesis $H_J=\cap_{j \in J}H_j$. We note that $H_I$ is the global null hypothesis.
+
+We assume the plan is for all hypotheses to be tested at each of the $K$ planned analyses if the trial continues to the end for all hypotheses. We further assume that the distribution of the $m \times K$ tests of $m$ individual hypotheses at all $K$ analyses is multivariate normal with a completely known correlation matrix.
+
+Let $Z_{ik}$ be the standardized normal test statistic for hypothesis $i \in I$, analysis $1 \le k \le K$. Let $n_{ik}$ be the number of events collected cumulatively through stage $k$ for hypothesis $i$. Then $n_{i \wedge i',k \wedge k'}$ is the number of events included in both $Z_{ik}$ and $Z_{i'k'}$, for $i, i' \in I$ and $1 \le k, k' \le K$. The key to the parametric tests is to utilize the correlation among the test statistics. The correlation between $Z_{ik}$ and $Z_{i'k'}$ is
+$$Corr(Z_{ik},Z_{i'k'})=\frac{n_{i \wedge i',k \wedge k'}}{\sqrt{n_{ik}*n_{i'k'}}}.$$
+
+## Examples
+
+In a 2-arm controlled clinical trial example with one primary endpoint, there are 3 patient populations defined by the status of two biomarkers A and B:
+
+* Biomarker A positive, the population 1,
+* Biomarker B positive, the population 2,
+* Overall population.
+
+The 3 primary elementary hypotheses are:
+
+* H1: the experimental treatment is superior to the control in the population 1
+* H2: the experimental treatment is superior to the control in the population 2
+* H3: the experimental treatment is superior to the control in the overall population
+  
+Assume an interim analysis and a final analysis are planned for the study. The number of events are listed as
+```{r}
+library(dplyr)
+library(tibble)
+library(gt)
+event_tb <- tribble(
+  ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
+  "Population 1", 100,200,
+  "Population 2",  110,220,
+  "Overlap of Population 1 and 2", 80,160,
+  "Overall Population", 225, 450
+)
+event_tb %>%
+  gt() %>%
+  tab_header(title = "Number of events at each population")
+```
+
+### Example 1 - Same Analyses Different Population
+Let's consider a simple situation, we want to compare the population 1 and population 2 in only interim analyses. Then $k=1$, and to compare $H_{1}$ and $H_{2}$, the $i$ will be $i=1$ and $i=2$. 
+The correlation matrix will be
+$$Corr(Z_{11},Z_{21})=\frac{n_{1 \wedge 2,1 \wedge 1}}{\sqrt{n_{11}*n_{21}}}$$
+The number of events are listed as
+```{r}
+event_tbl <- tribble(
+  ~Population, ~"Number of Event in IA",
+  "Population 1", 100,
+  "Population 2",  110,
+  "Overlap in population 1 and 2", 80
+)
+event_tbl %>%
+  gt() %>%
+  tab_header(title = "Number of events at each population in example 1")
+```
+The correlation could be simply calculated as
+$$Corr(Z_{11},Z_{21})=\frac{80}{\sqrt{100*110}}=0.76$$
+```{r}
+Corr1=80/sqrt(100*110)
+round(Corr1,2)
+```
+
+### Example 2 - Same Population Different Analyses
+Let's consider another simple situation, we want to compare single population, for example population 1, but in different analyses, interim and final analyses. Then  $i=1$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
+The correlation matrix will be
+$$Corr(Z_{11},Z_{12})=\frac{n_{1 \wedge 1,1 \wedge 2}}{\sqrt{n_{11}*n_{12}}}$$
+The number of events are listed as
+```{r}
+event_tb2 <- tribble(
+  ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
+  "Population 1", 100,200
+)
+event_tb2 %>%
+  gt() %>%
+  tab_header(title = "Number of events at each analyses in example 2")
+```
+The correlation could be simply calculated as
+$$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71$$
+```{r}
+Corr1=100/sqrt(100*200)
+round(Corr1,2)
+```
+### Example 3 - Cross Population Cross Analyses
+Let's consider the situation that we want to compare population 1 in interim analyses and population 2 in final analyses. Then for different population, $i=1$ and $i=2$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
+The correlation matrix will be
+$$Corr(Z_{11},Z_{22})=\frac{n_{1 \wedge 2,1 \wedge 2}}{\sqrt{n_{11}*n_{22}}}$$
+The number of events are listed as
+```{r}
+event_tb3 <- tribble(
+  ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
+  "Population 1", 100,200,
+  "Population 2", 110, 220,
+  "Overlap in population 1 and 2", 80,160
+
+)
+event_tb3 %>%
+  gt() %>%
+  tab_header(title = "Number of events at each population & analyses in example 3")
+```
+The the corrleation could be simply calculated as 
+$$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54$$
+```{r}
+Corr1=80/sqrt(100*220)
+round(Corr1,2)
+```
+Now we know how to calculate the correlation values under different situations, and the generate_corr function was built based on this logic. We can directly calculate the results for each cross situation via the function. 
+
+First, we need a event table including the information of the cohort.
+
+
+```{r}
+library(wpgsd)
+#The event table
+event <- tibble::tribble(
+   ~ H1, ~H2, ~Analysis, ~Event,
+   1, 1, 1, 100,
+   2, 2, 1, 110,
+   3, 3, 1, 225,
+   1, 2, 1, 80,
+   1, 3, 1, 100,
+   2, 3, 1, 110,
+   1, 1, 2, 200,
+   2, 2, 2, 220,
+   3, 3, 2, 450,
+   1, 2, 2, 160,
+   1, 3, 2, 200,
+   2, 3, 2, 220
+ )
+event %>%
+  gt() %>%
+  tab_header(title = "Number of events at each population & analyses")
+```
+"H1" means the experimental treatment is superior to the control in the population 1/experimental arm 1; "H2" means the experimental treatment is superior to the control in the population 2/experimental arm 2; "Analysis" means different analysis stages, for example, 1 means the interim analysis, and 2 means the final analysis; and the "Event" means the number of events in this condition. 
+
+For example: H1=1, H2=1, Analysis=1, Event=100 means in the first population, there are 100 cases of experimental treatment is superior to the control in the interim analysis.
+
+Another example: H1=1, H2=2, Analysis=2, Event=160 means the overlap number of experimental treatment superior to the control in population 1 and 2 in the final analysis is 160.
+
+*To be noticed, the column names in this function are fixed to be 'H1, H2, Analysis, Event'.                                                                                                                                                                                                                                                                                                                                                                                                                                                          
+After we have the event table, we can use generate_corr function to calculate correlation.
+
+```{r}
+all_corr=round(generate_corr(event),2)
+colnames(all_corr)=c("P1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+rownames(all_corr)=c("P1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+all_corr 
+```
+* P1/P2: Population 1/2; IA: Interim analysis; FA: Final analysis
+
+### Some situations could be considered:
+Situation 1: The number of events in one of the population is extremely small.
+
+For example, the number of events in population 1 is very small. 
+
+The code will still give you the results
+
+```{r}
+event <- tibble::tribble(
+   ~H1, ~H2, ~Analysis, ~Event,
+   1, 1, 1, 5,
+   2, 2, 1, 1100,
+   3, 3, 1, 2250,
+   1, 2, 1, 4,
+   1, 3, 1, 2,
+   2, 3, 1, 1100,
+   1, 1, 2, 8,
+   2, 2, 2, 2200,
+   3, 3, 2, 4500,
+   1, 2, 2, 6,
+   1, 3, 2, 7,
+   2, 3, 2, 2200
+ )
+all_corr=round(generate_corr(event),2)
+colnames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+rownames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+all_corr
+```
+
+Situation 2: The overlap between population 1&2 is 0
+
+The code will still give you results but with some correlations are 0
+
+```{r}
+event <- tibble::tribble(
+   ~H1, ~H2, ~Analysis, ~Event,
+   1, 1, 1, 100,
+   2, 2, 1, 110,
+   3, 3, 1, 225,
+   1, 2, 1, 0,
+   1, 3, 1, 100,
+   2, 3, 1, 110,
+   1, 1, 2, 200,
+   2, 2, 2, 220,
+   3, 3, 2, 450,
+   1, 2, 2, 0,
+   1, 3, 2, 200,
+   2, 3, 2, 220
+ )
+all_corr=round(generate_corr(event),2)
+colnames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+rownames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+all_corr
+```
+
+Situation 3-1: The number of events is mistakenly recorded as negative
+
+The warning message will be displayed, and NA's have been generated.
+```{r}
+event <- tibble::tribble(
+   ~H1, ~H2, ~Analysis, ~Event,
+   1, 1, 1, -100,
+   2, 2, 1, 110,
+   3, 3, 1, 225,
+   1, 2, 1, 80,
+   1, 3, 1, 100,
+   2, 3, 1, 110,
+   1, 1, 2, -200,
+   2, 2, 2, 220,
+   3, 3, 2, 450,
+   1, 2, 2, 160,
+   1, 3, 2, 200,
+   2, 3, 2, 220
+ )
+all_corr=round(generate_corr(event),2)
+colnames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+rownames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+all_corr
+```
+
+Situation 3-2: The number of overlapping events is mistakenly recorded as negative
+
+No warning or error message generated. But the correlation could be negative, which is misleading information. Please be careful and check data before go to the next step.
+```{r}
+event <- tibble::tribble(
+   ~H1, ~H2, ~Analysis, ~Event,
+   1, 1, 1, 100,
+   2, 2, 1, 110,
+   3, 3, 1, 225,
+   1, 2, 1, -80,
+   1, 3, 1, 100,
+   2, 3, 1, 110,
+   1, 1, 2, 200,
+   2, 2, 2, 220,
+   3, 3, 2, 450,
+   1, 2, 2, -160,
+   1, 3, 2, 200,
+   2, 3, 2, 220
+ )
+all_corr=round(generate_corr(event),2)
+colnames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+rownames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+all_corr
+```
\ No newline at end of file

From 2f10d16809169cb3f2fcb6cd8d6ac44c139d4928 Mon Sep 17 00:00:00 2001
From: guangguangzai <guangguangzai@users.noreply.github.com>
Date: Tue, 30 Jul 2024 19:22:00 +0000
Subject: [PATCH 2/8] Style code (GHA)

---
 vignettes/wpgsd_corr_example.Rmd | 203 +++++++++++++++----------------
 1 file changed, 101 insertions(+), 102 deletions(-)

diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/wpgsd_corr_example.Rmd
index 3fd22f0..22771ba 100644
--- a/vignettes/wpgsd_corr_example.Rmd
+++ b/vignettes/wpgsd_corr_example.Rmd
@@ -37,9 +37,9 @@ library(tibble)
 library(gt)
 event_tb <- tribble(
   ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
-  "Population 1", 100,200,
-  "Population 2",  110,220,
-  "Overlap of Population 1 and 2", 80,160,
+  "Population 1", 100, 200,
+  "Population 2", 110, 220,
+  "Overlap of Population 1 and 2", 80, 160,
   "Overall Population", 225, 450
 )
 event_tb %>%
@@ -56,7 +56,7 @@ The number of events are listed as
 event_tbl <- tribble(
   ~Population, ~"Number of Event in IA",
   "Population 1", 100,
-  "Population 2",  110,
+  "Population 2", 110,
   "Overlap in population 1 and 2", 80
 )
 event_tbl %>%
@@ -66,8 +66,8 @@ event_tbl %>%
 The correlation could be simply calculated as
 $$Corr(Z_{11},Z_{21})=\frac{80}{\sqrt{100*110}}=0.76$$
 ```{r}
-Corr1=80/sqrt(100*110)
-round(Corr1,2)
+Corr1 <- 80 / sqrt(100 * 110)
+round(Corr1, 2)
 ```
 
 ### Example 2 - Same Population Different Analyses
@@ -78,7 +78,7 @@ The number of events are listed as
 ```{r}
 event_tb2 <- tribble(
   ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
-  "Population 1", 100,200
+  "Population 1", 100, 200
 )
 event_tb2 %>%
   gt() %>%
@@ -87,8 +87,8 @@ event_tb2 %>%
 The correlation could be simply calculated as
 $$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71$$
 ```{r}
-Corr1=100/sqrt(100*200)
-round(Corr1,2)
+Corr1 <- 100 / sqrt(100 * 200)
+round(Corr1, 2)
 ```
 ### Example 3 - Cross Population Cross Analyses
 Let's consider the situation that we want to compare population 1 in interim analyses and population 2 in final analyses. Then for different population, $i=1$ and $i=2$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
@@ -98,10 +98,9 @@ The number of events are listed as
 ```{r}
 event_tb3 <- tribble(
   ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
-  "Population 1", 100,200,
+  "Population 1", 100, 200,
   "Population 2", 110, 220,
-  "Overlap in population 1 and 2", 80,160
-
+  "Overlap in population 1 and 2", 80, 160
 )
 event_tb3 %>%
   gt() %>%
@@ -110,8 +109,8 @@ event_tb3 %>%
 The the corrleation could be simply calculated as 
 $$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54$$
 ```{r}
-Corr1=80/sqrt(100*220)
-round(Corr1,2)
+Corr1 <- 80 / sqrt(100 * 220)
+round(Corr1, 2)
 ```
 Now we know how to calculate the correlation values under different situations, and the generate_corr function was built based on this logic. We can directly calculate the results for each cross situation via the function. 
 
@@ -120,22 +119,22 @@ First, we need a event table including the information of the cohort.
 
 ```{r}
 library(wpgsd)
-#The event table
+# The event table
 event <- tibble::tribble(
-   ~ H1, ~H2, ~Analysis, ~Event,
-   1, 1, 1, 100,
-   2, 2, 1, 110,
-   3, 3, 1, 225,
-   1, 2, 1, 80,
-   1, 3, 1, 100,
-   2, 3, 1, 110,
-   1, 1, 2, 200,
-   2, 2, 2, 220,
-   3, 3, 2, 450,
-   1, 2, 2, 160,
-   1, 3, 2, 200,
-   2, 3, 2, 220
- )
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, 80,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, 200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, 160,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
 event %>%
   gt() %>%
   tab_header(title = "Number of events at each population & analyses")
@@ -150,10 +149,10 @@ Another example: H1=1, H2=2, Analysis=2, Event=160 means the overlap number of e
 After we have the event table, we can use generate_corr function to calculate correlation.
 
 ```{r}
-all_corr=round(generate_corr(event),2)
-colnames(all_corr)=c("P1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
-rownames(all_corr)=c("P1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
-all_corr 
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("P1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("P1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+all_corr
 ```
 * P1/P2: Population 1/2; IA: Interim analysis; FA: Final analysis
 
@@ -166,23 +165,23 @@ The code will still give you the results
 
 ```{r}
 event <- tibble::tribble(
-   ~H1, ~H2, ~Analysis, ~Event,
-   1, 1, 1, 5,
-   2, 2, 1, 1100,
-   3, 3, 1, 2250,
-   1, 2, 1, 4,
-   1, 3, 1, 2,
-   2, 3, 1, 1100,
-   1, 1, 2, 8,
-   2, 2, 2, 2200,
-   3, 3, 2, 4500,
-   1, 2, 2, 6,
-   1, 3, 2, 7,
-   2, 3, 2, 2200
- )
-all_corr=round(generate_corr(event),2)
-colnames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
-rownames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 5,
+  2, 2, 1, 1100,
+  3, 3, 1, 2250,
+  1, 2, 1, 4,
+  1, 3, 1, 2,
+  2, 3, 1, 1100,
+  1, 1, 2, 8,
+  2, 2, 2, 2200,
+  3, 3, 2, 4500,
+  1, 2, 2, 6,
+  1, 3, 2, 7,
+  2, 3, 2, 2200
+)
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
 all_corr
 ```
 
@@ -192,23 +191,23 @@ The code will still give you results but with some correlations are 0
 
 ```{r}
 event <- tibble::tribble(
-   ~H1, ~H2, ~Analysis, ~Event,
-   1, 1, 1, 100,
-   2, 2, 1, 110,
-   3, 3, 1, 225,
-   1, 2, 1, 0,
-   1, 3, 1, 100,
-   2, 3, 1, 110,
-   1, 1, 2, 200,
-   2, 2, 2, 220,
-   3, 3, 2, 450,
-   1, 2, 2, 0,
-   1, 3, 2, 200,
-   2, 3, 2, 220
- )
-all_corr=round(generate_corr(event),2)
-colnames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
-rownames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, 0,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, 200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, 0,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
 all_corr
 ```
 
@@ -217,23 +216,23 @@ Situation 3-1: The number of events number mistakenly been recorded as negative
 The warning message will be displayed, and NA's have been generated.
 ```{r}
 event <- tibble::tribble(
-   ~H1, ~H2, ~Analysis, ~Event,
-   1, 1, 1, -100,
-   2, 2, 1, 110,
-   3, 3, 1, 225,
-   1, 2, 1, 80,
-   1, 3, 1, 100,
-   2, 3, 1, 110,
-   1, 1, 2, -200,
-   2, 2, 2, 220,
-   3, 3, 2, 450,
-   1, 2, 2, 160,
-   1, 3, 2, 200,
-   2, 3, 2, 220
- )
-all_corr=round(generate_corr(event),2)
-colnames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
-rownames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, -100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, 80,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, -200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, 160,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
 all_corr
 ```
 
@@ -242,22 +241,22 @@ Situation 3-2: The number of overlap events number mistakenly been recorded as n
 No warning or error message generated. But the correlation could be negative, which is misleading information. Please be careful and check data before go to the next step.
 ```{r}
 event <- tibble::tribble(
-   ~H1, ~H2, ~Analysis, ~Event,
-   1, 1, 1, 100,
-   2, 2, 1, 110,
-   3, 3, 1, 225,
-   1, 2, 1, -80,
-   1, 3, 1, 100,
-   2, 3, 1, 110,
-   1, 1, 2, 200,
-   2, 2, 2, 220,
-   3, 3, 2, 450,
-   1, 2, 2, -160,
-   1, 3, 2, 200,
-   2, 3, 2, 220
- )
-all_corr=round(generate_corr(event),2)
-colnames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
-rownames(all_corr)=c("Population 1, IA", "P2, IA", "P3, IA","P1, FA","P2, FA", "P3, FA")
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, -80,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, 200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, -160,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
+all_corr <- round(generate_corr(event), 2)
+colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
+rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
 all_corr
-```
\ No newline at end of file
+```

From a24a96c748decef2805839c6c587248a8fc2e904 Mon Sep 17 00:00:00 2001
From: guangguangzai <cgzhang19@gmail.com>
Date: Tue, 30 Jul 2024 15:46:36 -0400
Subject: [PATCH 3/8] add

---
 vignettes/wpgsd_corr_example.Rmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
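
Reviewer note: since the column names are fixed and the vignette asks readers to check their data before computing correlations, a small pre-check may be useful. The sketch below is only an illustration under stated assumptions: `check_event_table()` is a hypothetical helper, not a wpgsd function, and the rule that an overlap count cannot exceed either population's own count at the same analysis is a sanity check added here, not something the package is known to enforce.

```r
# Hypothetical helper (not part of wpgsd): basic sanity checks on an event table.
check_event_table <- function(event) {
  required <- c("H1", "H2", "Analysis", "Event")
  missing_cols <- setdiff(required, names(event))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (any(is.na(event$Event)) || any(event$Event < 0)) {
    stop("Event counts must be non-missing and non-negative.")
  }
  # Rows with H1 == H2 hold a population's own count; the rest hold overlaps.
  own <- event[event$H1 == event$H2, ]
  pairs <- event[event$H1 != event$H2, ]
  for (r in seq_len(nrow(pairs))) {
    n1 <- own$Event[own$H1 == pairs$H1[r] & own$Analysis == pairs$Analysis[r]]
    n2 <- own$Event[own$H1 == pairs$H2[r] & own$Analysis == pairs$Analysis[r]]
    if (length(n1) != 1 || length(n2) != 1) {
      stop("Each hypothesis needs exactly one own-count row per analysis.")
    }
    # An overlap is a subset of both populations, so it cannot exceed either count.
    if (pairs$Event[r] > min(n1, n2)) {
      stop("Overlap exceeds a population count at analysis ", pairs$Analysis[r], ".")
    }
  }
  invisible(TRUE)
}

# Event table with the values used in the vignette.
event <- data.frame(
  H1 = c(1, 2, 3, 1, 1, 2, 1, 2, 3, 1, 1, 2),
  H2 = c(1, 2, 3, 2, 3, 3, 1, 2, 3, 2, 3, 3),
  Analysis = rep(1:2, each = 6),
  Event = c(100, 110, 225, 80, 100, 110, 200, 220, 450, 160, 200, 220)
)
check_event_table(event)
```

Calling the helper before `generate_corr()` would stop on the negative-count inputs shown in Situations 3-1 and 3-2 instead of letting them propagate into the matrix.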

diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/wpgsd_corr_example.Rmd
index 22771ba..63efeec 100644
--- a/vignettes/wpgsd_corr_example.Rmd
+++ b/vignettes/wpgsd_corr_example.Rmd
@@ -139,13 +139,13 @@ event %>%
   gt() %>%
   tab_header(title = "Number of events at each population & analyses")
 ```
-"H1" means the experimental treatment is superior to the control in the population 1/experimental arm 1; "H2" means the experimental treatment is superior to the control in the population 2/experimental arm 2; "Analysis" means different analysis stages, for example, 1 means the interim analysis, and 2 means the final analysis; and the "Event" means the number of events in this condition. 
+"H1" indicates that the experimental treatment is superior to the control in population 1/experimental arm 1. "H2" indicates that the experimental treatment is superior to the control in population 2/experimental arm 2. "Analysis" refers to different stages of analysis, such as 1 for interim analysis and 2 for final analysis. "Event" represents the number of events in this condition.
 
-For example: H1=1, H2=1, Analysis=1, Event=100 means in the first population, there are 100 cases of experimental treatment is superior to the control in the interim analysis.
+For example: H1=1, H2=1, Analysis=1, Event=100 indicates that 100 events are observed in the first population at the interim analysis.
 
-Another example: H1=1, H2=2, Analysis=2, Event=160 means the overlap number of experimental treatment superior to the control in population 1 and 2 in the final analysis is 160.
+Another example: H1=1, H2=2, Analysis=2, Event=160 indicates that 160 events are common to populations 1 and 2 at the final analysis.
 
-*To be noticed, the column names in this function are fixed to be 'H1, H2, Analysis, Event'.                                                                                                                                                                                                                                                                                                                                                                                                                                                          
+Note that the column names in this function are fixed to be 'H1, H2, Analysis, Event'.
 After we have the event table, we can use generate_corr function to calculate correlation.
 
 ```{r}

From 028e0a7232f55e28a4333360b7f652ebdb550454 Mon Sep 17 00:00:00 2001
From: guangguangzai <cgzhang19@gmail.com>
Date: Tue, 30 Jul 2024 15:46:36 -0400
Subject: [PATCH 4/8] Updated R markdown based on reviewer's comments 20Aug2024

---
 vignettes/wpgsd_corr_example.Rmd | 203 ++++++++-----------------------
 1 file changed, 50 insertions(+), 153 deletions(-)
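
Reviewer note: the three worked examples differ only in which event counts enter the numerator and denominator, so they can be reproduced with one small function. The sketch below assumes nothing beyond base R; `corr_pair()` is a hypothetical helper named here for illustration and is not part of wpgsd.

```r
# Hypothetical helper: correlation of two test statistics from event counts.
# n_shared: events common to both statistics; n_a, n_b: events in each statistic.
corr_pair <- function(n_shared, n_a, n_b) {
  n_shared / sqrt(n_a * n_b)
}

ex1 <- corr_pair(80, 100, 110)  # Example 1: population 1 vs population 2, both at IA
ex2 <- corr_pair(100, 100, 200) # Example 2: population 1 at IA vs population 1 at FA
ex3 <- corr_pair(80, 100, 220)  # Example 3: population 1 at IA vs population 2 at FA

round(c(ex1, ex2, ex3), 2)
```

The rounded values agree with the 0.76, 0.71, and 0.54 stated in the vignette. In Examples 2 and 3 the shared count is the one observed at the interim analysis, because only events present at the earlier analysis can contribute to both statistics.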

diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/wpgsd_corr_example.Rmd
index 22771ba..80c2641 100644
--- a/vignettes/wpgsd_corr_example.Rmd
+++ b/vignettes/wpgsd_corr_example.Rmd
@@ -3,9 +3,10 @@ title: "Correlation Matrix Calculation"
 author: "Chenguang Zhang"
 date: "2024-05-14"
 output: html_document
+bibliography: citations.bib
 ---
 
-The weighted parametric group sequential design (WPGSD) (Anderson et al. (2022)) approach allows one to take advantage of the known correlation structure in constructing efficacy bounds to control family-wise error rate (FWER) for a group sequential design. Here correlation may be due to common observations in nested populations, due to common observations in overlapping populations, or due to common observations in the control arm. 
+The weighted parametric group sequential design (WPGSD) [@anderson_unified_2022] approach allows one to take advantage of the known correlation structure in constructing efficacy bounds to control the family-wise error rate (FWER) for a group sequential design. Here correlation may be due to common observations in nested populations, due to common observations in overlapping populations, or due to common observations in the control arm.
 
 ## Notation
 
@@ -18,7 +19,7 @@ $$Corr(Z_{ik},Z_{i'k'})=\frac{n_{i \wedge i',k \wedge k'}}{\sqrt{n_{ik}*n_{i'k'}
 
 ## Examples
 
-In a 2-arm controlled clinical trial example with one primary endpoint, there are 3 patient populations defined by the status of two biomarkers A and B:
+In a 2-arm controlled clinical trial example with one primary endpoint [@anderson_unified_2022], there are 3 patient populations defined by the status of two biomarkers A and B:
 
 * Biomarker A positive, the population 1,
 * Biomarker B positive, the population 2,
@@ -31,14 +32,17 @@ The 3 primary elementary hypotheses are:
 * H3: the experimental treatment is superior to the control in the overall population
   
 Assume an interim analysis and a final analysis are planned for the study. The number of events are listed as
-```{r}
+```{r,message=FALSE}
 library(dplyr)
 library(tibble)
 library(gt)
+```
+
+```{r}
 event_tb <- tribble(
   ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
-  "Population 1", 100, 200,
-  "Population 2", 110, 220,
+  "Population 1", 100,200,
+  "Population 2",  110,220,
   "Overlap of Population 1 and 2", 80, 160,
   "Overall Population", 225, 450
 )
@@ -47,7 +51,7 @@ event_tb %>%
   tab_header(title = "Number of events at each population")
 ```
 
-### Example 1 - Same Analyses Different Population
+### Example 1 - Correlation of different populations within the same analysis
 Let's consider a simple situation, we want to compare the population 1 and population 2 in only interim analyses. Then $k=1$, and to compare $H_{1}$ and $H_{2}$, the $i$ will be $i=1$ and $i=2$. 
 The correlation matrix will be
 $$Corr(Z_{11},Z_{21})=\frac{n_{1 \wedge 2,1 \wedge 1}}{\sqrt{n_{11}*n_{21}}}$$
@@ -56,7 +60,7 @@ The number of events are listed as
 event_tbl <- tribble(
   ~Population, ~"Number of Event in IA",
   "Population 1", 100,
-  "Population 2", 110,
+  "Population 2",  110,
   "Overlap in population 1 and 2", 80
 )
 event_tbl %>%
@@ -66,19 +70,19 @@ event_tbl %>%
 The correlation could be simply calculated as
 $$Corr(Z_{11},Z_{21})=\frac{80}{\sqrt{100*110}}=0.76$$
 ```{r}
-Corr1 <- 80 / sqrt(100 * 110)
-round(Corr1, 2)
+Corr1=80/sqrt(100*110)
+round(Corr1,2)
 ```
 
-### Example 2 - Same Population Different Analyses
-Let's consider another simple situation, we want to compare single population, for example population 1, but in different analyses, interim and final analyses. Then  $i=1$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
+### Example 2 - Correlation of different analyses within the same population
+Let's consider another simple situation: we want to compare a single population, for example population 1, across different analyses, namely the interim and final analyses. Then $i=1$, and to compare IA and FA, $k$ takes the values $k=1$ and $k=2$.
 The correlation matrix will be
 $$Corr(Z_{11},Z_{12})=\frac{n_{1 \wedge 1,1 \wedge 2}}{\sqrt{n_{11}*n_{12}}}$$
 The number of events are listed as
 ```{r}
 event_tb2 <- tribble(
   ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
-  "Population 1", 100, 200
+  "Population 1", 100,200
 )
 event_tb2 %>%
   gt() %>%
@@ -86,11 +90,12 @@ event_tb2 %>%
 ```
 The correlation could be simply calculated as
 $$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71$$
+The 100 in the numerator is the number of events in population 1 that are common to the interim analysis and the final analysis.
 ```{r}
-Corr1 <- 100 / sqrt(100 * 200)
-round(Corr1, 2)
+Corr1=100/sqrt(100*200)
+round(Corr1,2)
 ```
-### Example 3 - Cross Population Cross Analyses
+### Example 3 - Correlation of different analyses and different populations
 Let's consider the situation that we want to compare population 1 in interim analyses and population 2 in final analyses. Then for different population, $i=1$ and $i=2$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
 The correlation matrix will be
 $$Corr(Z_{11},Z_{22})=\frac{n_{1 \wedge 2,1 \wedge 2}}{\sqrt{n_{11}*n_{22}}}$$
@@ -98,165 +103,57 @@ The number of events are listed as
 ```{r}
 event_tb3 <- tribble(
   ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
-  "Population 1", 100, 200,
+  "Population 1", 100,200,
   "Population 2", 110, 220,
-  "Overlap in population 1 and 2", 80, 160
+  "Overlap in population 1 and 2", 80,160
+
 )
 event_tb3 %>%
   gt() %>%
   tab_header(title = "Number of events at each population & analyses in example 3")
 ```
-The the corrleation could be simply calculated as 
+The correlation could be simply calculated as 
 $$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54$$
+The 80 in the numerator is the number of events shared by population 1 at the interim analysis and population 2 at the final analysis.
 ```{r}
-Corr1 <- 80 / sqrt(100 * 220)
-round(Corr1, 2)
+Corr1=80/sqrt(100*220)
+round(Corr1,2)
 ```
-Now we know how to calculate the correlation values under different situations, and the generate_corr function was built based on this logic. We can directly calculate the results for each cross situation via the function. 
-
-First, we need a event table including the information of the cohort.
+Now we know how to calculate the correlation values under different situations, and the `generate_corr()` function was built based on this logic. We can directly calculate the correlation for every combination of population and analysis via the function.
 
+First, we need an event table including the information of the study.
 
 ```{r}
 library(wpgsd)
-# The event table
+#The event table
 event <- tibble::tribble(
-  ~H1, ~H2, ~Analysis, ~Event,
-  1, 1, 1, 100,
-  2, 2, 1, 110,
-  3, 3, 1, 225,
-  1, 2, 1, 80,
-  1, 3, 1, 100,
-  2, 3, 1, 110,
-  1, 1, 2, 200,
-  2, 2, 2, 220,
-  3, 3, 2, 450,
-  1, 2, 2, 160,
-  1, 3, 2, 200,
-  2, 3, 2, 220
-)
+   ~ H1, ~H2, ~Analysis, ~Event,
+   1, 1, 1, 100,
+   2, 2, 1, 110,
+   3, 3, 1, 225,
+   1, 2, 1, 80,
+   1, 3, 1, 100,
+   2, 3, 1, 110,
+   1, 1, 2, 200,
+   2, 2, 2, 220,
+   3, 3, 2, 450,
+   1, 2, 2, 160,
+   1, 3, 2, 200,
+   2, 3, 2, 220
+ )
 event %>%
   gt() %>%
   tab_header(title = "Number of events at each population & analyses")
 ```
-"H1" means the experimental treatment is superior to the control in the population 1/experimental arm 1; "H2" means the experimental treatment is superior to the control in the population 2/experimental arm 2; "Analysis" means different analysis stages, for example, 1 means the interim analysis, and 2 means the final analysis; and the "Event" means the number of events in this condition. 
-
-For example: H1=1, H2=1, Analysis=1, Event=100 means in the first population, there are 100 cases of experimental treatment is superior to the control in the interim analysis.
-
-Another example: H1=1, H2=2, Analysis=2, Event=160 means the overlap number of experimental treatment superior to the control in population 1 and 2 in the final analysis is 160.
-
-*To be noticed, the column names in this function are fixed to be 'H1, H2, Analysis, Event'.                                                                                                                                                                                                                                                                                                                                                                                                                                                          
-After we have the event table, we can use generate_corr function to calculate correlation.
-
-```{r}
-all_corr <- round(generate_corr(event), 2)
-colnames(all_corr) <- c("P1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-rownames(all_corr) <- c("P1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-all_corr
-```
-* P1/P2: Population 1/2; IA: Interim analysis; FA: Final analysis
-
-### Some situations could be considered:
-Situation 1: The number of events in one of the population is extremely small.
-
-For example, the number of events in population 1 is very small. 
-
-The code will still give you the results
-
-```{r}
-event <- tibble::tribble(
-  ~H1, ~H2, ~Analysis, ~Event,
-  1, 1, 1, 5,
-  2, 2, 1, 1100,
-  3, 3, 1, 2250,
-  1, 2, 1, 4,
-  1, 3, 1, 2,
-  2, 3, 1, 1100,
-  1, 1, 2, 8,
-  2, 2, 2, 2200,
-  3, 3, 2, 4500,
-  1, 2, 2, 6,
-  1, 3, 2, 7,
-  2, 3, 2, 2200
-)
-all_corr <- round(generate_corr(event), 2)
-colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-all_corr
-```
+* "H1" refers to one hypothesis, selected depending on the interest, while "H2" refers to the other hypothesis, both of which are listed for multiplicity testing. For example, "H1" means the experimental treatment is superior to the control in the population 1/experimental arm 1; "H2" means the experimental treatment is superior to the control in the population 2/experimental arm 2; 
 
-Situation 2: The overlap between population 1&2 is 0
+* "Analysis" means different analysis stages, for example, 1 means the interim analysis, and 2 means the final analysis;
 
-The code will still give you results but with some correlations are 0
-
-```{r}
-event <- tibble::tribble(
-  ~H1, ~H2, ~Analysis, ~Event,
-  1, 1, 1, 100,
-  2, 2, 1, 110,
-  3, 3, 1, 225,
-  1, 2, 1, 0,
-  1, 3, 1, 100,
-  2, 3, 1, 110,
-  1, 1, 2, 200,
-  2, 2, 2, 220,
-  3, 3, 2, 450,
-  1, 2, 2, 0,
-  1, 3, 2, 200,
-  2, 3, 2, 220
-)
-all_corr <- round(generate_corr(event), 2)
-colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-all_corr
-```
+* "Event" is the common events overlap by H1 and H2.
 
-Situation 3-1: The number of events number mistakenly been recorded as negative
 
-The warning message will be displayed, and NA's have been generated.
-```{r}
-event <- tibble::tribble(
-  ~H1, ~H2, ~Analysis, ~Event,
-  1, 1, 1, -100,
-  2, 2, 1, 110,
-  3, 3, 1, 225,
-  1, 2, 1, 80,
-  1, 3, 1, 100,
-  2, 3, 1, 110,
-  1, 1, 2, -200,
-  2, 2, 2, 220,
-  3, 3, 2, 450,
-  1, 2, 2, 160,
-  1, 3, 2, 200,
-  2, 3, 2, 220
-)
-all_corr <- round(generate_corr(event), 2)
-colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-all_corr
-```
+For example: H1=1, H2=1, Analysis=1, Event=100 means in the first population, there are 100 cases of experimental treatment is superior to the control in the interim analysis.
 
-Situation 3-2: The number of overlap events number mistakenly been recorded as negative
+Another example: H1=1, H2=2, Analysis=2, Event=160 means the overlap number of experimental treatment superior to the control in population 1 and 2 in the final analysis is 160.
 
-No warning or error message generated. But the correlation could be negative, which is misleading information. Please be careful and check data before go to the next step.
-```{r}
-event <- tibble::tribble(
-  ~H1, ~H2, ~Analysis, ~Event,
-  1, 1, 1, 100,
-  2, 2, 1, 110,
-  3, 3, 1, 225,
-  1, 2, 1, -80,
-  1, 3, 1, 100,
-  2, 3, 1, 110,
-  1, 1, 2, 200,
-  2, 2, 2, 220,
-  3, 3, 2, 450,
-  1, 2, 2, -160,
-  1, 3, 2, 200,
-  2, 3, 2, 220
-)
-all_corr <- round(generate_corr(event), 2)
-colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
-all_corr
-```
+*To be noticed, the column names in this function are fixed to be 'H1, H2, Analysis, Event'.

From a16939a86b853b8617433520f989578477e20cab Mon Sep 17 00:00:00 2001
From: guangguangzai <guangguangzai@users.noreply.github.com>
Date: Wed, 21 Aug 2024 18:16:33 +0000
Subject: [PATCH 5/8] Style code (GHA)

---
 vignettes/wpgsd_corr_example.Rmd | 57 ++++++++++++++++----------------
 1 file changed, 28 insertions(+), 29 deletions(-)

diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/wpgsd_corr_example.Rmd
index c9262f0..1782fb0 100644
--- a/vignettes/wpgsd_corr_example.Rmd
+++ b/vignettes/wpgsd_corr_example.Rmd
@@ -41,8 +41,8 @@ library(gt)
 ```{r}
 event_tb <- tribble(
   ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
-  "Population 1", 100,200,
-  "Population 2",  110,220,
+  "Population 1", 100, 200,
+  "Population 2", 110, 220,
   "Overlap of Population 1 and 2", 80, 160,
   "Overall Population", 225, 450
 )
@@ -60,7 +60,7 @@ The number of events are listed as
 event_tbl <- tribble(
   ~Population, ~"Number of Event in IA",
   "Population 1", 100,
-  "Population 2",  110,
+  "Population 2", 110,
   "Overlap in population 1 and 2", 80
 )
 event_tbl %>%
@@ -70,8 +70,8 @@ event_tbl %>%
 The the corrleation could be simply calculated as 
 $$Corr(Z_{11},Z_{21})=\frac{80}{\sqrt{100*110}}=0.76$$
 ```{r}
-Corr1=80/sqrt(100*110)
-round(Corr1,2)
+Corr1 <- 80 / sqrt(100 * 110)
+round(Corr1, 2)
 ```
 
 ### Example 2 - Correlation of different analyses within the same population
@@ -82,7 +82,7 @@ The number of events are listed as
 ```{r}
 event_tb2 <- tribble(
   ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
-  "Population 1", 100,200
+  "Population 1", 100, 200
 )
 event_tb2 %>%
   gt() %>%
@@ -92,8 +92,8 @@ The the corrleation could be simply calculated as
 $$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71$$
 The 100 in the numerator is the overlap number of events of interim analysis and final analysis in population 1.
 ```{r}
-Corr1=100/sqrt(100*200)
-round(Corr1,2)
+Corr1 <- 100 / sqrt(100 * 200)
+round(Corr1, 2)
 ```
 ### Example 3 - Correlation of different analyses and different population
 Let's consider the situation that we want to compare population 1 in interim analyses and population 2 in final analyses. Then for different population, $i=1$ and $i=2$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
@@ -103,10 +103,9 @@ The number of events are listed as
 ```{r}
 event_tb3 <- tribble(
   ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
-  "Population 1", 100,200,
+  "Population 1", 100, 200,
   "Population 2", 110, 220,
-  "Overlap in population 1 and 2", 80,160
-
+  "Overlap in population 1 and 2", 80, 160
 )
 event_tb3 %>%
   gt() %>%
@@ -116,8 +115,8 @@ The correlation could be simply calculated as
 $$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54$$
 The 80 in the numerator is the overlap number of events of population 1 in interim analysis and population 2 in final analysis.
 ```{r}
-Corr1=80/sqrt(100*220)
-round(Corr1,2)
+Corr1 <- 80 / sqrt(100 * 220)
+round(Corr1, 2)
 ```
 Now we know how to calculate the correlation values under different situations, and the generate_corr() function was built based on this logic. We can directly calculate the results for each cross situation via the function. 
 
@@ -125,22 +124,22 @@ First, we need a event table including the information of the study.
 
 ```{r}
 library(wpgsd)
-#The event table
+# The event table
 event <- tibble::tribble(
-   ~ H1, ~H2, ~Analysis, ~Event,
-   1, 1, 1, 100,
-   2, 2, 1, 110,
-   3, 3, 1, 225,
-   1, 2, 1, 80,
-   1, 3, 1, 100,
-   2, 3, 1, 110,
-   1, 1, 2, 200,
-   2, 2, 2, 220,
-   3, 3, 2, 450,
-   1, 2, 2, 160,
-   1, 3, 2, 200,
-   2, 3, 2, 220
- )
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, 80,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, 200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, 160,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
 event %>%
   gt() %>%
   tab_header(title = "Number of events at each population & analyses")
@@ -155,4 +154,4 @@ For example: H1=1, H2=1, Analysis=1, Event=100 indicates that in the first popul
 
 Another example: H1=1, H2=2, Analysis=2, Event=160 indicates that the number of overlapping cases where the experimental treatment is superior to the control in population 1 and 2 in the final analysis is 160.
 
-*To be noticed, the column names in this function are fixed to be 'H1, H2, Analysis, Event'.
\ No newline at end of file
+*To be noticed, the column names in this function are fixed to be 'H1, H2, Analysis, Event'.

From 530bd57734a6d1087797d5870bc0a4d847d70020 Mon Sep 17 00:00:00 2001
From: guangguangzai <guangguangzai@users.noreply.github.com>
Date: Mon, 26 Aug 2024 15:36:38 +0000
Subject: [PATCH 6/8] Style code (GHA)

---
 vignettes/wpgsd_corr_example.Rmd | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/wpgsd_corr_example.Rmd
index db155d2..a024307 100644
--- a/vignettes/wpgsd_corr_example.Rmd
+++ b/vignettes/wpgsd_corr_example.Rmd
@@ -129,22 +129,22 @@ First, we need a event table including the information of the study.
 
 ```{r}
 library(wpgsd)
-#The event table
+# The event table
 event <- tibble::tribble(
-   ~ H1, ~H2, ~Analysis, ~Event,
-   1, 1, 1, 100,
-   2, 2, 1, 110,
-   3, 3, 1, 225,
-   1, 2, 1, 80,
-   1, 3, 1, 100,
-   2, 3, 1, 110,
-   1, 1, 2, 200,
-   2, 2, 2, 220,
-   3, 3, 2, 450,
-   1, 2, 2, 160,
-   1, 3, 2, 200,
-   2, 3, 2, 220
- )
+  ~H1, ~H2, ~Analysis, ~Event,
+  1, 1, 1, 100,
+  2, 2, 1, 110,
+  3, 3, 1, 225,
+  1, 2, 1, 80,
+  1, 3, 1, 100,
+  2, 3, 1, 110,
+  1, 1, 2, 200,
+  2, 2, 2, 220,
+  3, 3, 2, 450,
+  1, 2, 2, 160,
+  1, 3, 2, 200,
+  2, 3, 2, 220
+)
 # The event table
 event <- tibble::tribble(
   ~H1, ~H2, ~Analysis, ~Event,

From 81394051c6866baf47cadfb21558dfc6f133f1e3 Mon Sep 17 00:00:00 2001
From: "Zhao, Yujie" <yujie.zhao@merck.com>
Date: Mon, 26 Aug 2024 13:23:34 -0400
Subject: [PATCH 7/8] some editoral edits

---
 vignettes/wpgsd_corr_example.Rmd | 84 ++++++++++++++++----------------
 1 file changed, 42 insertions(+), 42 deletions(-)
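
Note: the three hand-calculated correlations in the vignette can be cross-checked against the event table without calling the package. The sketch below is only an illustration in base R of the formula Corr(Z_ik, Z_i'k') = n_{i^i',k^k'} / sqrt(n_ik * n_i'k'); the helpers `n_common()` and `corr_by_hand()` are hypothetical names and are not part of wpgsd, and the code assumes the event table lists each pair with the smaller hypothesis index in `H1`.

```r
# Event table from the vignette, as a plain data frame (no extra packages).
event <- data.frame(
  H1       = c(1, 2, 3, 1, 1, 2, 1, 2, 3, 1, 1, 2),
  H2       = c(1, 2, 3, 2, 3, 3, 1, 2, 3, 2, 3, 3),
  Analysis = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Event    = c(100, 110, 225, 80, 100, 110, 200, 220, 450, 160, 200, 220)
)

# Hypothetical helper: number of events shared by hypotheses i and j
# at analysis k (i == j gives the events of a single hypothesis).
n_common <- function(event, i, j, k) {
  lo <- min(i, j)
  hi <- max(i, j)
  event$Event[event$H1 == lo & event$H2 == hi & event$Analysis == k]
}

# Hypothetical helper: correlation between Z_ik and Z_jl.
# The shared events are those of the pair (i, j) at the earlier analysis.
corr_by_hand <- function(event, i, k, j, l) {
  n_common(event, i, j, min(k, l)) /
    sqrt(n_common(event, i, i, k) * n_common(event, j, j, l))
}

corr_11_21 <- corr_by_hand(event, 1, 1, 2, 1) # 80 / sqrt(100 * 110)
corr_11_12 <- corr_by_hand(event, 1, 1, 1, 2) # 100 / sqrt(100 * 200)
corr_11_22 <- corr_by_hand(event, 1, 1, 2, 2) # 80 / sqrt(100 * 220)
round(c(corr_11_21, corr_11_12, corr_11_22), 2)
```

The rounded values should agree with the 0.76, 0.71 and 0.54 worked out in the vignette's three examples, which makes this a quick sanity check on the event table before it is passed to `generate_corr()`.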

diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/wpgsd_corr_example.Rmd
index a024307..df67ae9 100644
--- a/vignettes/wpgsd_corr_example.Rmd
+++ b/vignettes/wpgsd_corr_example.Rmd
@@ -1,14 +1,24 @@
 ---
-title: "Correlation Matrix Calculation"
-author: "Chenguang Zhang"
-date: "2024-05-14"
-output: html_document
+title: "Correlated test statistics"
+author: "Chenguang Zhang, Yujie Zhao"
+output:
+  rmarkdown::html_document:
+    toc: true
+    toc_float: true
+    toc_depth: 2
+    number_sections: true
+    highlight: "textmate"
+    css: "custom.css"
+    code_fold: hide
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Correlated test statistics}
 bibliography: wpgsd.bib
 ---
 
 The weighted parametric group sequential design (WPGSD) (@anderson2022unified) approach allows one to take advantage of the known correlation structure in constructing efficacy bounds to control family-wise error rate (FWER) for a group sequential design. Here correlation may be due to common observations in nested populations, due to common observations in overlapping populations, or due to common observations in the control arm. 
 
-## Notation
+# Methodologies to calculate correlations
 
 Suppose that in a group sequential trial there are $m$ elementary null hypotheses $H_i$, $i \in I={1,...,m}$, and there are $K$ analyses. Let $k$ be the index for the interim analyses and final analyses, $k=1,2,...K$. For any nonempty set $J \subseteq I$, we denote the intersection hypothesis $H_J=\cap_{j \in J}H_j$. We note that $H_I$ is the global null hypothesis.
 
@@ -17,7 +27,7 @@ We assume the plan is for all hypotheses to be tested at each of the $k$ planned
 Let $Z_{ik}$ be the standardized normal test statistic for hypothesis $i \in I$, analysis $1 \le k \le K$. Let $n_{ik}$ be the number of events collected cumulatively through stage $k$ for hypothesis $i$. Then $n_{i \wedge i',k \wedge k'}$ is the number of events included in both $Z_{ik}$ and $i$, $i' \in I$, $1 \le k$, $k' \le K$. The key of the parametric tests to utilize the correlation among the test statistics. The correlation between $Z_{ik}$ and $Z_{i'k'}$ is
 $$Corr(Z_{ik},Z_{i'k'})=\frac{n_{i \wedge i',k \wedge k'}}{\sqrt{n_{ik}*n_{i'k'}}}$$. 
 
-## Examples
+# Examples
 
 We borrow an example from a paper by Anderson et al. (@anderson2022unified), demonstrated in Section 2 - Motivating Examples, we use Example 1 as the basis here. The setting will be:
 
@@ -29,9 +39,9 @@ In a two-arm controlled clinical trial with one primary endpoint, there are thre
 
 The 3 primary elementary hypotheses are:
 
-* H1: the experimental treatment is superior to the control in the population 1
-* H2: the experimental treatment is superior to the control in the population 2
-* H3: the experimental treatment is superior to the control in the overall population
+* **H1**: the experimental treatment is superior to the control in the population 1
+* **H2**: the experimental treatment is superior to the control in the population 2
+* **H3**: the experimental treatment is superior to the control in the overall population
   
 Assume an interim analysis and a final analysis are planned for the study. The number of events are listed as
 ```{r,message=FALSE}
@@ -53,7 +63,7 @@ event_tb %>%
   tab_header(title = "Number of events at each population")
 ```
 
-###  Correlation of different populations within the same analysis
+## Correlation of different populations within the same analysis
 Let's consider a simple situation, we want to compare the population 1 and population 2 in only interim analyses. Then $k=1$, and to compare $H_{1}$ and $H_{2}$, the $i$ will be $i=1$ and $i=2$. 
 The correlation matrix will be
 $$Corr(Z_{11},Z_{21})=\frac{n_{1 \wedge 2,1 \wedge 1}}{\sqrt{n_{11}*n_{21}}}$$
@@ -76,7 +86,7 @@ Corr1 <- 80 / sqrt(100 * 110)
 round(Corr1, 2)
 ```
 
-### Correlation of different analyses within the same population
+## Correlation of different analyses within the same population
 Let's consider another simple situation, we want to compare single population, for example, the population 1, but in different analyses, interim and final analyses. Then  $i=1$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
 The correlation matrix will be
 $$Corr(Z_{11},Z_{12})=\frac{n_{1 \wedge 1,1 \wedge 2}}{\sqrt{n_{11}*n_{12}}}$$
@@ -91,17 +101,17 @@ event_tb2 %>%
   tab_header(title = "Number of events at each analyses in example 2")
 ```
 The the corrleation could be simply calculated as 
-$$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71$$
+$$\text{Corr}(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71$$
 The 100 in the numerator is the overlap number of events of interim analysis and final analysis in population 1.
 ```{r}
 Corr1 <- 100 / sqrt(100 * 200)
 round(Corr1, 2)
 ```
 
-### Correlation of different analyses and different population
+## Correlation of different analyses and different populations
 Let's consider the situation that we want to compare population 1 in interim analyses and population 2 in final analyses. Then for different population, $i=1$ and $i=2$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
 The correlation matrix will be
-$$Corr(Z_{11},Z_{22})=\frac{n_{1 \wedge 1,2 \wedge 2}}{\sqrt{n_{11}*n_{22}}}$$
+$$\text{Corr}(Z_{11},Z_{22})=\frac{n_{1 \wedge 2,1 \wedge 2}}{\sqrt{n_{11}*n_{22}}}$$
 The number of events are listed as
 ```{r}
 event_tb3 <- tribble(
@@ -116,18 +126,28 @@ event_tb3 %>%
 ```
 
 The correlation could be simply calculated as 
-$$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54$$
+$$\text{Corr}(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54$$
 The 80 in the numerator is the overlap number of events of population 1 in interim analysis and population 2 in final analysis.
 ```{r}
 Corr1 <- 80 / sqrt(100 * 220)
 round(Corr1, 2)
 ```
 
-Now we know how to calculate the correlation values under different situations, and the $generate$_$corr()$ function was built based on this logic. We can directly calculate the results for each cross situation via the function. 
+# Generate the correlation matrix with `generate_corr()`
+The examples above show how to calculate the correlation in each situation, and the `generate_corr()` function was built on the same logic. We can use it to calculate the correlation for every combination of hypotheses and analyses directly.
 
 First, we need a event table including the information of the study.
 
-```{r}
+- `H1` and `H2` identify the pair of hypotheses being compared, both of which are included in the multiplicity testing. For example, a value of 1 may denote that the experimental treatment is superior to the control in population 1 (or experimental arm 1), and a value of 2 the same in population 2 (or experimental arm 2).
+- `Analysis` is the analysis stage; for example, 1 is the interim analysis and 2 is the final analysis.
+- `Event` is the number of events common to the hypotheses given in `H1` and `H2`.
+
+For example, `H1=1`, `H2=1`, `Analysis=1`, `Event=100` indicates that population 1 has 100 events at the interim analysis.
+
+Another example: `H1=1`, `H2=2`, `Analysis=2`, `Event=160` indicates that 160 events are common to populations 1 and 2 at the final analysis.
+
+Note that the column names of the event table are fixed: `H1`, `H2`, `Analysis`, and `Event`.
+```{r, message=FALSE}
 library(wpgsd)
 # The event table
 event <- tibble::tribble(
@@ -145,36 +165,16 @@ event <- tibble::tribble(
   1, 3, 2, 200,
   2, 3, 2, 220
 )
-# The event table
-event <- tibble::tribble(
-  ~H1, ~H2, ~Analysis, ~Event,
-  1, 1, 1, 100,
-  2, 2, 1, 110,
-  3, 3, 1, 225,
-  1, 2, 1, 80,
-  1, 3, 1, 100,
-  2, 3, 1, 110,
-  1, 1, 2, 200,
-  2, 2, 2, 220,
-  3, 3, 2, 450,
-  1, 2, 2, 160,
-  1, 3, 2, 200,
-  2, 3, 2, 220
-)
 
 event %>%
   gt() %>%
   tab_header(title = "Number of events at each population & analyses")
 ```
 
-* "H1" refers to one hypothesis, selected depending on the interest, while "H2" refers to the other hypothesis, both of which are listed for multiplicity testing. For example, "H1" means the experimental treatment is superior to the control in the population 1/experimental arm 1; "H2" means the experimental treatment is superior to the control in the population 2/experimental arm 2; 
-* "Analysis" means different analysis stages, for example, 1 means the interim analysis, and 2 means the final analysis;
-
-* "Event" is the common events overlap by H1 and H2.
-
-For example: H1=1, H2=1, Analysis=1, Event=100 indicates that in the first population, there are 100 cases where the experimental treatment is superior to the control in the interim analysis.
-
-Another example: H1=1, H2=2, Analysis=2, Event=160 indicates that the number of overlapping cases where the experimental treatment is superior to the control in population 1 and 2 in the final analysis is 160.
+Then we pass the event table above to `generate_corr()` and obtain the correlation matrix as follows.
+```{r}
+generate_corr(event)
+```
 
-*To be noticed, the column names in this function are fixed to be 'H1, H2, Analysis, Event'.
+# References
 

From 9636269a4b362f49a778663bd6d5ddbceab562f9 Mon Sep 17 00:00:00 2001
From: "Zhao, Yujie" <yujie.zhao@merck.com>
Date: Mon, 26 Aug 2024 13:24:10 -0400
Subject: [PATCH 8/8] rename file

---
 vignettes/{wpgsd_corr_example.Rmd => corr_calculation.Rmd} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename vignettes/{wpgsd_corr_example.Rmd => corr_calculation.Rmd} (100%)

diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/corr_calculation.Rmd
similarity index 100%
rename from vignettes/wpgsd_corr_example.Rmd
rename to vignettes/corr_calculation.Rmd