chap1.qmd

---
title: "Applied Survival Analysis"
subtitle: "Chapter 1 - Introduction"
css: style_slides.css
author:
  name: Lu Mao
  affiliation: 
   - name: Department of Biostatistics & Medical Informatics
   - University of Wisconsin-Madison
  email: lmao@biostat.wisc.edu
format: 
   revealjs:
     auto-stretch: false
  # beamer: default
editor: visual
include-in-header:
  - text: |
      <style type="text/css">
      ul li ul li {
        font-size: 0.80em;
      }
      </style>
---

## Outline

1.  Time-to-event data and examples
2.  Censoring mechanisms and implications
3.  Summarizing the raw data

$$\newcommand{\indep}{\perp \!\!\! \perp}$$

# Data and Examples

## What are time-to-event data?

- Common outcome type in medical studies
  - **Starting point**: Randomization, study entry, birth, etc.
  - **Endpoint**: Death, hospitalization, disease onset, etc.
  - In engineering: Machine failure times (reliability analysis)
- **Right censoring**  
  - Event does not occur by study end or dropout  
  - Only know event time $>$ censoring time
- **Survival analysis**: Statistical methods for censored data

## Example: Univariate event (I)

- **German Breast Cancer (GBC) Study**
  - **Population**: 686 patients with node-positive breast cancer  
  - **Objective**: Assess if tamoxifen + chemo reduces mortality  
  - **Baseline info**: Age, tumor size, hormone levels, menopausal status, etc.  
  - **Follow-up**: Median 44 months  
    - 171 deaths $\to$ exact times known  
    - 515 censored $\to$ survival time $>$ censoring time  
    
## Example: Univariate event (II)

-   **German Breast Cancer (GBC) Study**

![](images/intro_gbc.png){fig-align="center" width="75%"}

## Example: Recurrent events (I)

- **Chronic Granulomatous Disease (CGD) Study**
  - **Population**: 128 patients in a randomized placebo-controlled trial  
  - **Objective**: Assess gamma interferon effect on recurrent infections  
  - **Follow-up**: Median 293 days  
    - Infections: Min = 0, Max = 7  
  - **Challenge**: Correlated events within individuals
  - **Data** in "long" format (multiple records per patient)

## Example: Recurrent events (II)

-   **Chronic Granulomatous Disease (CGD) Study**

![](images/intro_cgd.png){fig-align="center" width="75%"}


## Example: Multivariate/Clustered Events (I)

- **Diabetic Retinopathy Study**
  - **Population**: 197 high-risk diabetic patients in a randomized controlled trial
  - **Objective**: Determine if photocoagulation (a laser treatment) delays blindness onset
  - **Design**: One eye treated (by either xenon or argon), the other untreated (control)
  - **Challenge**: Correlation between eyes


## Example: Multivariate/Clustered Events (II)

-   **Diabetic Retinopathy Study**

![](images/intro_drs.png){fig-align="center" width="70%"}

## Example: Competing Risks (I)

- **Definition**: Multiple types of events where one prevents the occurrence of the others
    -   Natural example: different causes of death
-   **Competing risk vs censoring**:
    -   Both terminate follow-up
    -   Competing risk: part of the outcome; inference based on its presence
    -   Censoring: irrelevant to outcome; inference based on its absence
-   **Example**: death from prostate cancer as main outcome
    -   Death from other (metastasized) cancers $\to$ competing risk
    -   Death from traffic accidents $\to$ censoring


## Example: Competing Risks (II)

-   **Bone Marrow Transplant Study**

    -   **population**: 864 multiple-myeloma leukemia patients undergoing allogeneic haematopoietic cell transplantation (HCT)
    -   **Objective**: Evaluate risk factors for treatment-related mortality (TRM) and relapse of leukemia
    -   **Competing risks**: TRM defined as death in remission (i.e., before relapse); thus is precluded by relapse
    -   **Risk factors**: cohort indicator (years 1995–2000 or 2001–2005), type of donor (unrelated or identical sibling), history of a prior transplant, time from diagnosis to transplantation (\<24 months, or ≥ 24 months)


## Example: Competing Risks (III)

-   **Bone Marrow Transplant Study**
    -   Why only one record per patient?

![](images/intro_bmt.png){fig-align="center" width="8in" height="3.3in"}

## Example: More Complex Outcomes (Semi-competing risks)

-   **German Breast Cancer (GBC) Study**
    -   Nonfatal event + terminal event (death)

![](images/intro_gbc1.png){fig-align="center" width="70%"}

## Example: More Complex Outcomes (with Longitudinal data)

-   **Anti-Retroviral Drug Trial**
    -   Repeated measures of CD4 cell count + death

![](images/intro_ard.png){fig-align="center" width="75%"}

## Example: More Complex Outcomes (Multistate process)

-   **Breast Cancer Life History Study**
    -   Remission $\to$ relapse $\to$ metastasis $\to$ death (can skip states)

![](images/intro_bc.png){fig-align="center" width="70%"}

## Example: Composite Endpoints(I)

-   **Composite endpoint**: one with multiple components
    -   Recurrent/multivariate events
    -   (Semi-)Competing risks
    -   Longitudinal measurements
    -   Multistate processes
-   Analysis of complex outcomes
    -   **Marginal approach**: models components separately
    -   **Conditional approach**: models components jointly
    -   **Composite approach**: combines components
        -   Progression/relapse-free survival (time to the earlier of progression/relapse or death)


## Example: Composite Endpoints(II)

-   Advantages:
    -   Concentrates information $\to$ Statistical efficiency
    -   No need for multiple testing adjustment
    -   A single measure of overall effect size
-   Preferred for primary analysis of Phase-III clinical trials by
    -   US Food and Drug Administration (FDA)
    -   ICH (International Council for Harmonisation for pharmaceuticals)
-   Challenges:
    -   Statistical efficiency (e.g., beyond first event)
    -   Scientific relevance (e.g., relative importance of components)


# Censoring mechanisms and implications

## Censoring Mechanisms

-   **Two mechanisms**
    -   Study termination (administrative censoring)
    -   Loss to follow-up (LTFU, e.g., withdrawal, death from other causes)

::: callout-caution
## Caution about censoring

-   Event/censoring time $=$ time *from starting point* (e.g., randomization) to event/censoring (as opposed to time on the calendar)

-   LTFU may not be independent of outcome (e.g., sicker patients withdraw early)

-   Collect withdrawal reasons if possible

-   Censoring or competing risk? $\leftarrow$ Domain knowledge
:::

## Censoring Mechanisms: Illustration

-   Calendar time vs time synchronized by starting point

![](images/intro_censor.png){fig-align="center" width="90%"}

## Statistical Implications (I)

-   Censored observation
    -   Not completely missing!
    -   Partial information: event time $>$ censoring time
    - Ignoring partial information $\to$ Bias in inference
- Naive approaches
    - Treat censoring as event $\to$ Underestimates time to event
    - Exclude censored observations $\to$ Underestimates time to event (longer event times more likely censored)


## Statistical Implications (II)

-   **Notation**

    -   $T$: Outcome event time
    -   $C$: Censoring time
    -   Observed data: $X=\min(T, C)$, $\delta = I(T\leq C)$
        -   (𝑋, 𝛿) = (`time`, `status`) in previous data examples

-   **Estimation**

    -   **Independent censoring assumption**

    $$ C \indep T$$

    -   **Estimand**: $S(t)={\rm pr}(T > t)$, i.e., probability of subject “surviving” to time $t$, using a random sample of $(X_i, \delta_i)$ $(i=1,\ldots, n)$


## Statistical Implications (III)

-   Naive methods
    -   Event-imputation empirical survival function: $$\hat S_{\rm imp}(t)=n^{-1}\sum_{i=1}^n I(X_i > t) \to {\rm pr}(X > t)\leq S(t)$$

    -   Complete-case empirical survival function: $$\hat S_{\rm cc}(t)=\frac{\sum_{i=1}^n I(X_i > t, \delta_i = 1)}{\sum_{i=1}^n\delta_i}
          \to {\rm pr}(T > t\mid T\leq C)\leq S(t)$$

    -   Both naïve methods underestimate the true survival function

## Statistical Implications: Example

-   **German Breast Cancer (GBC) Study**

![](images/intro_gbc_naive.png){fig-align="center" width="70%"}

# Summarizing Raw Data

## Importance of Descriptive Analysis

-   Statistical models rely more or less on assumptions
-   Good practice to summarize data descriptively as first step
    -   Get to know the data
    -   Informs subsequent analysis
    -   Check balance of baseline characteristics between randomized arms
    -   “Table 1” in medical research papers
-   Two types of summary statistics
    -   Subject-level characteristics (baseline variables, number of events per subject)
    -   Event rates (over aggregate length of follow-up)


## How to Calculate Event Rate (I)

-   Length of follow-up is **event-specific**
    -   If an event is "non-recurrent", its occurrence means patient is no longer at risk for it

![](images/intro_erate_form.png){fig-align="center" width="75%"}

::: callout-note
Denominator is called person-year (or person-time) of follow-up.
:::

## How to Calculate Event Rate (II)

-   Semi-competing risks

![](images/intro_erate1.png){fig-align="center" width="50%"}

## How to Calculate Event Rate (III)

-   Recurrent events

![](images/intro_erate2.png){fig-align="center" width="70%"}

## Table One: Example

![](images/intro_tab1.png){fig-align="center" width="75%"}

# Conclusion

## Chapter Summary

-   Types of time-to-event outcomes
    -   Univariate, recurrent, multivariate/clustered, (semi-)competing risks, repeated measures, multistate processes, and everything in between…
-   Common feature: censoring
    -   Arises if study ends or patient drops out prior to event
    -   Must be handled with care to avoid false conclusion
-   Importance of descriptive analysis
    -   Event rate $\to$ attention to denominator

## HW1 (Due Feb 5)

-   Choose one
    -   Problem 1.1 (Recommended for PhD in Stats/BDS)
    -   Problem 1.2
-   Problem 1.8 (Attach you annotated code)
-   (Extra credit) Problem 1.3

## Guidelines for HW

-   Present a readable and coherent text to report your methods and results
    -   Include numerical/graphical results only if they contribute to your narrative
    -   All tables and figures should be properly titled/captioned, with informative labels/legends
    -   Use full names instead of abbreviations/acronyms
        -   E.g., “meno” $\to$ “Menopause (yes v no)”; “est” $\to$ “Estrogen (fmol/mg)”
    -   Specify the unit of variable, e.g., "Age (years)"
    -   See Table 1.11 and Fig. 1.2 for examples
-   Append the full code for diagnostic purposes