some editoral edits

Merck · Aug 26, 2024 · 8139405 · 8139405
1 parent 530bd57
commit 8139405
Showing 1 changed file with 42 additions and 42 deletions.
diff --git a/vignettes/wpgsd_corr_example.Rmd b/vignettes/wpgsd_corr_example.Rmd
@@ -1,14 +1,24 @@
 ---
-title: "Correlation Matrix Calculation"
-author: "Chenguang Zhang"
-date: "2024-05-14"
-output: html_document
+title: "Correlated test statistics"
+author: "Chenguang Zhang, Yujie Zhao"
+output:
+  rmarkdown::html_document:
+    toc: true
+    toc_float: true
+    toc_depth: 2
+    number_sections: true
+    highlight: "textmate"
+    css: "custom.css"
+    code_fold: hide
+vignette: >
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteIndexEntry{Correlated test statistics}
 bibliography: wpgsd.bib
 ---
 
 The weighted parametric group sequential design (WPGSD) (@anderson2022unified) approach allows one to take advantage of the known correlation structure in constructing efficacy bounds to control family-wise error rate (FWER) for a group sequential design. Here correlation may be due to common observations in nested populations, due to common observations in overlapping populations, or due to common observations in the control arm. 
 
-## Notation
+# Methodologies to calculate correlations
 
 Suppose that in a group sequential trial there are $m$ elementary null hypotheses $H_i$, $i \in I={1,...,m}$, and there are $K$ analyses. Let $k$ be the index for the interim analyses and final analyses, $k=1,2,...K$. For any nonempty set $J \subseteq I$, we denote the intersection hypothesis $H_J=\cap_{j \in J}H_j$. We note that $H_I$ is the global null hypothesis.
 
@@ -17,7 +27,7 @@ We assume the plan is for all hypotheses to be tested at each of the $k$ planned
 Let $Z_{ik}$ be the standardized normal test statistic for hypothesis $i \in I$, analysis $1 \le k \le K$. Let $n_{ik}$ be the number of events collected cumulatively through stage $k$ for hypothesis $i$. Then $n_{i \wedge i',k \wedge k'}$ is the number of events included in both $Z_{ik}$ and $i$, $i' \in I$, $1 \le k$, $k' \le K$. The key of the parametric tests to utilize the correlation among the test statistics. The correlation between $Z_{ik}$ and $Z_{i'k'}$ is
 $$Corr(Z_{ik},Z_{i'k'})=\frac{n_{i \wedge i',k \wedge k'}}{\sqrt{n_{ik}*n_{i'k'}}}$$. 
 
-## Examples
+# Examples
 
 We borrow an example from a paper by Anderson et al. (@anderson2022unified), demonstrated in Section 2 - Motivating Examples, we use Example 1 as the basis here. The setting will be:
 
@@ -29,9 +39,9 @@ In a two-arm controlled clinical trial with one primary endpoint, there are thre
 
 The 3 primary elementary hypotheses are:
 
-* H1: the experimental treatment is superior to the control in the population 1
-* H2: the experimental treatment is superior to the control in the population 2
-* H3: the experimental treatment is superior to the control in the overall population
+* **H1**: the experimental treatment is superior to the control in the population 1
+* **H2**: the experimental treatment is superior to the control in the population 2
+* **H3**: the experimental treatment is superior to the control in the overall population
 
 Assume an interim analysis and a final analysis are planned for the study. The number of events are listed as
 ```{r,message=FALSE}
@@ -53,7 +63,7 @@ event_tb %>%
   tab_header(title = "Number of events at each population")
 ```
 
-###  Correlation of different populations within the same analysis
+##  Correlation of different populations within the same analysis
 Let's consider a simple situation, we want to compare the population 1 and population 2 in only interim analyses. Then $k=1$, and to compare $H_{1}$ and $H_{2}$, the $i$ will be $i=1$ and $i=2$. 
 The correlation matrix will be
 $$Corr(Z_{11},Z_{21})=\frac{n_{1 \wedge 2,1 \wedge 1}}{\sqrt{n_{11}*n_{21}}}$$
@@ -76,7 +86,7 @@ Corr1 <- 80 / sqrt(100 * 110)
 round(Corr1, 2)
 ```
 
-### Correlation of different analyses within the same population
+## Correlation of different analyses within the same population
 Let's consider another simple situation, we want to compare single population, for example, the population 1, but in different analyses, interim and final analyses. Then  $i=1$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
 The correlation matrix will be
 $$Corr(Z_{11},Z_{12})=\frac{n_{1 \wedge 1,1 \wedge 2}}{\sqrt{n_{11}*n_{12}}}$$
@@ -91,17 +101,17 @@ event_tb2 %>%
   tab_header(title = "Number of events at each analyses in example 2")
 ```
 The the corrleation could be simply calculated as 
-$$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71$$
+$$\text{Corr}(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71$$
 The 100 in the numerator is the overlap number of events of interim analysis and final analysis in population 1.
 ```{r}
 Corr1 <- 100 / sqrt(100 * 200)
 round(Corr1, 2)
 ```
 
-### Correlation of different analyses and different population
+## Correlation of different analyses and different population
 Let's consider the situation that we want to compare population 1 in interim analyses and population 2 in final analyses. Then for different population, $i=1$ and $i=2$, and to compare IA and FA, the $k$ will be $k=1$ and $k=2$. 
 The correlation matrix will be
-$$Corr(Z_{11},Z_{22})=\frac{n_{1 \wedge 1,2 \wedge 2}}{\sqrt{n_{11}*n_{22}}}$$
+$$\text{Corr}(Z_{11},Z_{22})=\frac{n_{1 \wedge 1,2 \wedge 2}}{\sqrt{n_{11}*n_{22}}}$$
 The number of events are listed as
 ```{r}
 event_tb3 <- tribble(
@@ -116,18 +126,28 @@ event_tb3 %>%
 ```
 
 The correlation could be simply calculated as 
-$$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54$$
+$$\text{Corr}(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54$$
 The 80 in the numerator is the overlap number of events of population 1 in interim analysis and population 2 in final analysis.
 ```{r}
 Corr1 <- 80 / sqrt(100 * 220)
 round(Corr1, 2)
 ```
 
-Now we know how to calculate the correlation values under different situations, and the $generate$_$corr()$ function was built based on this logic. We can directly calculate the results for each cross situation via the function. 
+# Generate the correlation matrix by `generate_corr()`    
+Now we know how to calculate the correlation values under different situations, and the `generate_corr()` function was built based on this logic. We can directly calculate the results for each cross situation via the function. 
 
 First, we need a event table including the information of the study.
 
-```{r}
+- `H1` refers to one hypothesis, selected depending on the interest, while `H2` refers to the other hypothesis, both of which are listed for multiplicity testing. For example, `H1` means the experimental treatment is superior to the control in the population 1/experimental arm 1; `H2` means the experimental treatment is superior to the control in the population 2/experimental arm 2; 
+- `Analysis` means different analysis stages, for example, 1 means the interim analysis, and 2 means the final analysis;
+- `Event` is the common events overlap by `H1` and `H2`.
+
+For example: `H1=1`, `H2=1`, `Analysis=1`, `Event=100 `indicates that in the first population, there are 100 cases where the experimental treatment is superior to the control in the interim analysis.
+
+Another example: `H1=1`, `H2=2`, `Analysis=2`, `Event=160` indicates that the number of overlapping cases where the experimental treatment is superior to the control in population 1 and 2 in the final analysis is 160.
+
+To be noticed, the column names in this function are fixed to be `H1`, `H2`, `Analysis`, `Event`.
+```{r, message=FALSE}
 library(wpgsd)
 # The event table
 event <- tibble::tribble(
@@ -145,36 +165,16 @@ event <- tibble::tribble(
   1, 3, 2, 200,
   2, 3, 2, 220
 )
-# The event table
-event <- tibble::tribble(
-  ~H1, ~H2, ~Analysis, ~Event,
-  1, 1, 1, 100,
-  2, 2, 1, 110,
-  3, 3, 1, 225,
-  1, 2, 1, 80,
-  1, 3, 1, 100,
-  2, 3, 1, 110,
-  1, 1, 2, 200,
-  2, 2, 2, 220,
-  3, 3, 2, 450,
-  1, 2, 2, 160,
-  1, 3, 2, 200,
-  2, 3, 2, 220
-)
 
 event %>%
   gt() %>%
   tab_header(title = "Number of events at each population & analyses")
 ```
 
-* "H1" refers to one hypothesis, selected depending on the interest, while "H2" refers to the other hypothesis, both of which are listed for multiplicity testing. For example, "H1" means the experimental treatment is superior to the control in the population 1/experimental arm 1; "H2" means the experimental treatment is superior to the control in the population 2/experimental arm 2; 
-* "Analysis" means different analysis stages, for example, 1 means the interim analysis, and 2 means the final analysis;
-
-* "Event" is the common events overlap by H1 and H2.
-
-For example: H1=1, H2=1, Analysis=1, Event=100 indicates that in the first population, there are 100 cases where the experimental treatment is superior to the control in the interim analysis.
-
-Another example: H1=1, H2=2, Analysis=2, Event=160 indicates that the number of overlapping cases where the experimental treatment is superior to the control in population 1 and 2 in the final analysis is 160.
+Then we input the above event table to the function of `generate_corr()`, and get the correlation matrix as follow.
+```{r}
+generate_corr(event)
+```
 
-*To be noticed, the column names in this function are fixed to be 'H1, H2, Analysis, Event'.
+# References