Cgwpgsd #44
Changes from 3 commits
798f709
2f10d16
a24a96c
028e0a7
6235ec0
a16939a
b15eda0
513708b
530bd57
8139405
9636269
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,262 @@ | ||
--- | ||
title: "Correlation Matrix Calculation" | ||
author: "Chenguang Zhang" | ||
date: "2024-05-14" | ||
output: html_document | ||
--- | ||
|
||
The weighted parametric group sequential design (WPGSD) approach (Anderson et al., 2022) takes advantage of the known correlation structure among test statistics when constructing efficacy bounds that control the family-wise error rate (FWER) in a group sequential design. The correlation may arise from common observations in nested populations, in overlapping populations, or in a shared control arm.
|
||
## Notation | ||
|
||
Suppose that in a group sequential trial there are $m$ elementary null hypotheses $H_i$, $i \in I=\{1,...,m\}$, and there are $K$ analyses. Let $k$ be the index for the interim and final analyses, $k=1,2,...,K$. For any nonempty set $J \subseteq I$, we denote the intersection hypothesis $H_J=\cap_{j \in J}H_j$. We note that $H_I$ is the global null hypothesis.
|
||
We assume the plan is for all hypotheses to be tested at each of the $K$ planned analyses if the trial continues to the end for all hypotheses. We further assume that the distribution of the $m \times K$ test statistics for the $m$ individual hypotheses at all $K$ analyses is multivariate normal with a completely known correlation matrix.
|
||
Let $Z_{ik}$ be the standardized normal test statistic for hypothesis $i \in I$ at analysis $1 \le k \le K$. Let $n_{ik}$ be the number of events collected cumulatively through stage $k$ for hypothesis $i$. Then $n_{i \wedge i',k \wedge k'}$ is the number of events included in both $Z_{ik}$ and $Z_{i'k'}$, for $i, i' \in I$ and $1 \le k, k' \le K$. The key to the parametric tests is to utilize the correlation among the test statistics. The correlation between $Z_{ik}$ and $Z_{i'k'}$ is
$$Corr(Z_{ik},Z_{i'k'})=\frac{n_{i \wedge i',k \wedge k'}}{\sqrt{n_{ik}*n_{i'k'}}}$$
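
As a minimal sketch of the formula above, the helper below computes the correlation from three event counts. The function name `corr_from_events` and its argument names are hypothetical and are not part of the wpgsd package; the counts in the call are illustrative.

```{r}
# Hypothetical helper (not part of wpgsd): correlation between two test
# statistics, given the number of events they share and the number of
# events behind each statistic.
corr_from_events <- function(n_common, n_a, n_b) {
  stopifnot(n_common >= 0, n_a > 0, n_b > 0)
  n_common / sqrt(n_a * n_b)
}

# Two statistics based on 100 and 110 events, 80 of which are shared
corr_from_events(n_common = 80, n_a = 100, n_b = 110)
```

The helper only evaluates the ratio; deciding which events count as shared between two statistics is left to the caller.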
|
||
## Examples | ||
|
||
In a 2-arm controlled clinical trial example with one primary endpoint, there are 3 patient populations defined by the status of two biomarkers A and B: | ||
|
||
* Biomarker A positive, the population 1, | ||
* Biomarker B positive, the population 2, | ||
* Overall population. | ||
|
||
The 3 primary elementary hypotheses are: | ||
|
||
* H1: the experimental treatment is superior to the control in the population 1 | ||
* H2: the experimental treatment is superior to the control in the population 2 | ||
* H3: the experimental treatment is superior to the control in the overall population | ||
|
||
Assume an interim analysis (IA) and a final analysis (FA) are planned for the study. The number of events in each population at each analysis is listed below.
```{r} | ||
library(dplyr) | ||
library(tibble) | ||
library(gt) | ||
event_tb <- tribble( | ||
~Population, ~"Number of Event in IA", ~"Number of Event in FA", | ||
"Population 1", 100, 200, | ||
"Population 2", 110, 220, | ||
"Overlap of Population 1 and 2", 80, 160, | ||
"Overall Population", 225, 450 | ||
) | ||
event_tb %>% | ||
gt() %>% | ||
tab_header(title = "Number of events at each population") | ||
``` | ||
|
||
### Example 1 - Same Analyses Different Population | ||
Shall we call it "Correlation of different populations within the same analysis"?
||
Consider a simple situation first: we want the correlation between the test statistics for population 1 and population 2 at the interim analysis only. Then $k=k'=1$, and the two hypotheses $H_{1}$ and $H_{2}$ correspond to $i=1$ and $i'=2$.
The correlation is
$$Corr(Z_{11},Z_{21})=\frac{n_{1 \wedge 2,1 \wedge 1}}{\sqrt{n_{11}*n_{21}}}$$
The numbers of events are listed below.
```{r} | ||
event_tbl <- tribble( | ||
~Population, ~"Number of Event in IA", | ||
"Population 1", 100, | ||
"Population 2", 110, | ||
"Overlap in population 1 and 2", 80 | ||
) | ||
event_tbl %>% | ||
gt() %>% | ||
tab_header(title = "Number of events at each population in example 1") | ||
``` | ||
The correlation can then be calculated as
$$Corr(Z_{11},Z_{21})=\frac{80}{\sqrt{100*110}}=0.76$$
```{r} | ||
Corr1 <- 80 / sqrt(100 * 110) | ||
round(Corr1, 2) | ||
``` | ||
|
||
### Example 2 - Same Population Different Analyses | ||
Shall we call it "Correlation of different analyses within the same population"?
||
Consider another simple situation: we want the correlation between the test statistics for a single population, for example population 1, at two different analyses, the interim and the final analysis. Then $i=i'=1$, and the IA and FA correspond to $k=1$ and $k'=2$.
The correlation is
$$Corr(Z_{11},Z_{12})=\frac{n_{1 \wedge 1,1 \wedge 2}}{\sqrt{n_{11}*n_{12}}}$$
The numbers of events are listed below.
```{r} | ||
event_tb2 <- tribble( | ||
~Population, ~"Number of Event in IA", ~"Number of Event in FA", | ||
"Population 1", 100, 200 | ||
) | ||
event_tb2 %>% | ||
gt() %>% | ||
tab_header(title = "Number of events at each analyses in example 2") | ||
``` | ||
The numerator $n_{1 \wedge 1,1 \wedge 2}$ is the number of events included in both analyses. Because events accumulate over time, all 100 events observed in population 1 at the IA are also included at the FA, so the numerator is 100. The correlation can then be calculated as
$$Corr(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71$$
Please explain the 100 at the numerator.
||
```{r} | ||
Corr1 <- 100 / sqrt(100 * 200) | ||
round(Corr1, 2) | ||
``` | ||
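
More generally, for a single hypothesis the formula reduces to a closed form. Assuming that the events at the earlier analysis are a subset of those at the later analysis, $n_{i \wedge i,k \wedge k'}=n_{ik}$ for $k \le k'$, and the correlation depends only on the ratio of the two event counts:

$$Corr(Z_{ik},Z_{ik'})=\frac{n_{ik}}{\sqrt{n_{ik}*n_{ik'}}}=\sqrt{\frac{n_{ik}}{n_{ik'}}}, \quad k \le k'$$

With the counts in this example, $\sqrt{100/200}=0.71$, which agrees with the value above.
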
### Example 3 - Cross Population Cross Analyses | ||
Shall we call it "Correlation of different analyses and different population"?
||
Consider the situation where we want the correlation between the test statistic for population 1 at the interim analysis and the test statistic for population 2 at the final analysis. Then the populations correspond to $i=1$ and $i'=2$, and the analyses correspond to $k=1$ and $k'=2$.
The correlation is
$$Corr(Z_{11},Z_{22})=\frac{n_{1 \wedge 2,1 \wedge 2}}{\sqrt{n_{11}*n_{22}}}$$
The numbers of events are listed below.
```{r} | ||
event_tb3 <- tribble( | ||
~Population, ~"Number of Event in IA", ~"Number of Event in FA", | ||
"Population 1", 100, 200, | ||
"Population 2", 110, 220, | ||
"Overlap in population 1 and 2", 80, 160 | ||
) | ||
event_tb3 %>% | ||
gt() %>% | ||
tab_header(title = "Number of events at each population & analyses in example 3") | ||
``` | ||
The numerator $n_{1 \wedge 2,1 \wedge 2}$ counts the events included in both statistics, that is, the events in the overlap of population 1 and population 2 observed by the earlier of the two analyses (the IA), which is 80. The correlation can then be calculated as
$$Corr(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54$$
Please explain the 80 at the numerator.
||
```{r} | ||
Corr1 <- 80 / sqrt(100 * 220) | ||
round(Corr1, 2) | ||
``` | ||
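
The three cases above can be combined into one matrix computed by hand. The sketch below uses base R only and is not necessarily how `generate_corr()` is implemented. It assumes that populations 1 and 2 are nested in the overall population, so their overlap with the overall population equals their own event counts; the object names are hypothetical.

```{r}
# Event counts from the table above; rows/columns are population 1,
# population 2, overall population. Entry [i, j] is the number of events
# common to populations i and j (assumption: populations 1 and 2 are
# nested in the overall population).
n_ia <- matrix(c(100,  80, 100,
                  80, 110, 110,
                 100, 110, 225), nrow = 3, byrow = TRUE)
n_fa <- 2 * n_ia # in this example every count doubles at the final analysis

# Events shared by Z_{ik} and Z_{i'k'}: common population, earlier analysis
n_common <- rbind(cbind(n_ia, n_ia),
                  cbind(n_ia, n_fa))

# The diagonal holds the number of events behind each statistic
n_own <- diag(n_common)
corr_by_hand <- n_common / sqrt(outer(n_own, n_own))

stat_labels <- c("P1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
dimnames(corr_by_hand) <- list(stat_labels, stat_labels)
round(corr_by_hand, 2)
```

The entries for (P1, IA; P2, IA), (P1, IA; P1, FA) and (P1, IA; P2, FA) reproduce the values from Examples 1 to 3.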
Now that we know how to calculate the correlation in each of these situations, note that the `generate_corr()` function is built on the same logic. We can use it to calculate the correlations for all combinations of populations and analyses directly.
generate_corr ->
||
|
||
First, we need an event table that gives the number of events for each pair of hypotheses at each analysis.
The word "cohort" is confusing...
||
|
||
|
||
```{r} | ||
library(wpgsd) | ||
# The event table | ||
event <- tibble::tribble( | ||
~H1, ~H2, ~Analysis, ~Event, | ||
1, 1, 1, 100, | ||
2, 2, 1, 110, | ||
3, 3, 1, 225, | ||
1, 2, 1, 80, | ||
1, 3, 1, 100, | ||
2, 3, 1, 110, | ||
1, 1, 2, 200, | ||
2, 2, 2, 220, | ||
3, 3, 2, 450, | ||
1, 2, 2, 160, | ||
1, 3, 2, 200, | ||
2, 3, 2, 220 | ||
) | ||
event %>% | ||
gt() %>% | ||
tab_header(title = "Number of events at each population & analyses") | ||
```
The "H1" and "H2" columns hold the indices of the two hypotheses being paired, "Analysis" is the analysis index (1 for the interim analysis and 2 for the final analysis), and "Event" is the number of events common to the two hypotheses at that analysis. When "H1" and "H2" are equal, "Event" is simply the number of events for that hypothesis.
This paragraph looks not correct to me... H1 is 1 hypothesis, H2 is the other hypothesis. Event is the common events overlap by H1 and H2.
H1 could be the anyone from the hypotheses, listed in the multiplicity/to be tested, depending on the one interested.
||
|
||
For example, H1=1, H2=1, Analysis=1, Event=100 indicates that both hypotheses in the pair are hypothesis 1 (population 1), so 100 is the number of events in population 1 at the interim analysis.
Echo with my previous comment. We ought to say what is H1=1 means, and then H2 = 1 means first. Then explain what Event is under H1=1 and H2=1.
||
|
||
As another example, H1=1, H2=2, Analysis=2, Event=160 indicates that the number of events common to population 1 and population 2 (their overlap) at the final analysis is 160.
Echo with my previous comment.
||
|
||
Note that the column names expected by this function are fixed as `H1`, `H2`, `Analysis`, and `Event`.
Once we have the event table, we can use the `generate_corr()` function to calculate the correlations.
|
||
I guess we will no longer need the things after line 150, right?

```{r}
all_corr <- round(generate_corr(event), 2)
colnames(all_corr) <- c("P1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
rownames(all_corr) <- c("P1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA")
all_corr
```
* P1/P2/P3: Population 1 / Population 2 / overall population; IA: Interim analysis; FA: Final analysis
|
||
### Some situations to consider
Situation 1: The number of events in one of the populations is extremely small.

For example, the number of events in population 1 is very small.

The code still returns results.
|
||
```{r} | ||
event <- tibble::tribble( | ||
~H1, ~H2, ~Analysis, ~Event, | ||
1, 1, 1, 5, | ||
2, 2, 1, 1100, | ||
3, 3, 1, 2250, | ||
1, 2, 1, 4, | ||
1, 3, 1, 2, | ||
2, 3, 1, 1100, | ||
1, 1, 2, 8, | ||
2, 2, 2, 2200, | ||
3, 3, 2, 4500, | ||
1, 2, 2, 6, | ||
1, 3, 2, 7, | ||
2, 3, 2, 2200 | ||
) | ||
all_corr <- round(generate_corr(event), 2) | ||
colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA") | ||
rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA") | ||
all_corr | ||
``` | ||
|
||
Situation 2: The overlap between populations 1 and 2 is 0.

The code still returns results, but some of the correlations are 0.
|
||
```{r} | ||
event <- tibble::tribble( | ||
~H1, ~H2, ~Analysis, ~Event, | ||
1, 1, 1, 100, | ||
2, 2, 1, 110, | ||
3, 3, 1, 225, | ||
1, 2, 1, 0, | ||
1, 3, 1, 100, | ||
2, 3, 1, 110, | ||
1, 1, 2, 200, | ||
2, 2, 2, 220, | ||
3, 3, 2, 450, | ||
1, 2, 2, 0, | ||
1, 3, 2, 200, | ||
2, 3, 2, 220 | ||
) | ||
all_corr <- round(generate_corr(event), 2) | ||
colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA") | ||
rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA") | ||
all_corr | ||
``` | ||
|
||
Situation 3-1: The number of events in a population is mistakenly recorded as negative.

A warning message is displayed and NA values are generated.
```{r} | ||
event <- tibble::tribble( | ||
~H1, ~H2, ~Analysis, ~Event, | ||
1, 1, 1, -100, | ||
2, 2, 1, 110, | ||
3, 3, 1, 225, | ||
1, 2, 1, 80, | ||
1, 3, 1, 100, | ||
2, 3, 1, 110, | ||
1, 1, 2, -200, | ||
2, 2, 2, 220, | ||
3, 3, 2, 450, | ||
1, 2, 2, 160, | ||
1, 3, 2, 200, | ||
2, 3, 2, 220 | ||
) | ||
all_corr <- round(generate_corr(event), 2) | ||
colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA") | ||
rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA") | ||
all_corr | ||
``` | ||
|
||
Situation 3-2: The number of overlapping events is mistakenly recorded as negative.

No warning or error message is generated, but the resulting correlation can be negative, which is misleading. Please check the data carefully before moving on to the next step.
```{r} | ||
event <- tibble::tribble( | ||
~H1, ~H2, ~Analysis, ~Event, | ||
1, 1, 1, 100, | ||
2, 2, 1, 110, | ||
3, 3, 1, 225, | ||
1, 2, 1, -80, | ||
1, 3, 1, 100, | ||
2, 3, 1, 110, | ||
1, 1, 2, 200, | ||
2, 2, 2, 220, | ||
3, 3, 2, 450, | ||
1, 2, 2, -160, | ||
1, 3, 2, 200, | ||
2, 3, 2, 220 | ||
) | ||
all_corr <- round(generate_corr(event), 2) | ||
colnames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA") | ||
rownames(all_corr) <- c("Population 1, IA", "P2, IA", "P3, IA", "P1, FA", "P2, FA", "P3, FA") | ||
all_corr | ||
``` |
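
Since a negative overlap count passes through silently, a small pre-check on the event table may help. The sketch below is one possible approach in base R; the function name `check_event_table` is hypothetical and not part of wpgsd, and the check covers only the required column names and negative or missing counts.

```{r}
# Hypothetical pre-check (not part of wpgsd): stop on event tables whose
# counts cannot be valid before the table is passed to generate_corr().
check_event_table <- function(event) {
  required <- c("H1", "H2", "Analysis", "Event")
  missing_cols <- setdiff(required, names(event))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  bad <- which(is.na(event$Event) | event$Event < 0)
  if (length(bad) > 0) {
    stop("Negative or missing event counts in row(s): ",
         paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# A small table with one negative overlap count; a plain data.frame is
# used so that the sketch runs without extra packages.
event_bad <- data.frame(
  H1 = c(1, 2, 1),
  H2 = c(1, 2, 2),
  Analysis = c(1, 1, 1),
  Event = c(100, 110, -80)
)
result <- tryCatch(
  check_event_table(event_bad),
  error = function(e) conditionMessage(e)
)
result
```

Running the check before `generate_corr()` turns the silent case in Situation 3-2 into an explicit error that names the offending row.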
Please cite where this example is from.
Cite paper, example 1