Skip to content

ehsanx/svyTable1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

25 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

svyTable1: Create Publication-Ready Survey-Weighted Summary Tables

svyTable1 is an R package for creating publication-ready tables from complex analytical results. It streamlines the creation of "Table 1" descriptive summaries from survey package objects and provides tools for formatting regression model outputs from multiply imputed datasets managed by the mice package.

Developed for common tasks in epidemiology and public health, the package integrates seamlessly with the widely-used survey package to correctly account for design features such as weights, strata, and clusters, while following best practices for transparency and readability.


✨ Key Features

  • Survey Design Integration: Works directly with svydesign objects from the R survey package.
  • Automatic Missing Data Handling: Detects and reports missing values for each variable, either as a β€œMissing” category for factors or a separate β€œMissing, n (%)” row for numeric variables.
  • Best Practice Reporting: Defaults to the β€œmixed mode” display, showing unweighted sample sizes (N) alongside weighted percentages (%) or means.
  • Built-in Reliability Checks: Automatically apply NCHS Data Presentation Standards for Proportions to flag or suppress unreliable estimates.
  • Flexible Output Modes: Easily switch between "mixed", "weighted", and "unweighted" summaries.
  • Survival Plotting: Provides svykmplot() to create publication-ready, survey-weighted Kaplan-Meier plots with attached number-at-risk tables.
  • Readability: Option to format large numbers with commas for improved readability.
  • Regression Diagnostics: Includes the svydiag() helper function to assess the reliability of model coefficients.
  • Goodness-of-Fit Testing: Provides svygof() to perform the Archer-Lemeshow goodness-of-fit test for survey logistic regression models.
  • Design-Correct AUC: Offers svyAUC() to calculate the Area Under the Curve (AUC) with proper variance estimation for complex survey designs.
  • Multiple Imputation Support: Includes the svypooled() helper function to create clean, "fallacy-safe" tables from mipo objects (pooled model results) from the mice package.
  • Interaction Analysis: Provides tools to calculate and present interaction effects:
    • Joint Effects: Calculate effects for each combination of two factors relative to a reference group from an interaction model (jointeffects).
    • Simple Effects: Calculate stratum-specific effects from a joint-variable model (inteffects).
    • Additive Interaction: Calculate RERI, AP, and Synergy Index with survey-adjusted confidence intervals from either joint or interaction models (addint, addintlist).

🧩 Installation

You can install the development version of svyTable1 from GitHub with:

# install.packages("devtools")
# In README.md
devtools::install_github("ehsanx/svyTable1", build_vignettes = TRUE, dependencies = TRUE)

🧠 Usage Examples

Below are examples using data from the National Health and Nutrition Examination Survey (NHANES).

1. Data Preparation

library(svyTable1)
library(survey)
library(dplyr)
library(NHANES)

# Load the raw NHANES data (2009-2012)
data(NHANESraw)

# Prepare data for adults, keeping NAs to demonstrate missing data handling
nhanes_adults_with_na <- NHANESraw %>%
  filter(Age >= 20) %>%
  mutate(
    ObeseStatus = factor(ifelse(BMI >= 30, "Obese", "Not Obese"),
                         levels = c("Not Obese", "Obese"))
  )

# Create the survey design object
adult_design_with_na <- svydesign(
  id = ~SDMVPSU,
  strata = ~SDMVSTRA,
  weights = ~WTMEC2YR,
  nest = TRUE,
  data = nhanes_adults_with_na
)

2. Generate Tables

Example A: Handling Variables with Missing Data (Primary Use Case)

# Define variables, some with expected missing values
vars_with_missing <- c("Age", "Race1", "Education", "TotChol", "SmokeNow")

# Generate the summary table
table_with_missing <- svytable1(
  design = adult_design_with_na, 
  strata_var = "ObeseStatus", 
  table_vars = vars_with_missing
)

# Display table
knitr::kable(table_with_missing, caption = "Table 1: Participant Characteristics (with Missing Data)")

Example B: Summarizing Complete Data (No Missing Values)

library(tidyr)

# Define variables for a complete-case analysis
vars_for_complete_table <- c("Age", "Race1", "BPSysAve", "Pulse", "BMI")

# Create complete-case data
nhanes_adults_complete <- nhanes_adults_with_na %>%
  drop_na(all_of(vars_for_complete_table))

# Create a new design object
adult_design_complete <- svydesign(
  id = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTMEC2YR,
  nest = TRUE, data = nhanes_adults_complete
)

# Generate the table
table_without_missing <- svytable1(
  design = adult_design_complete, 
  strata_var = "ObeseStatus", 
  table_vars = c("Age", "Race1", "BPSysAve", "Pulse")
)

# Display table
knitr::kable(table_without_missing, caption = "Table 2: Participant Characteristics (No Missing Data)")

Example C: Checking Estimate Reliability (Advanced)

# Generate the table with reliability checks enabled
results_list <- svytable1(
  design = adult_design_with_na, 
  strata_var = "ObeseStatus", 
  table_vars = vars_with_missing,
  reliability_checks = TRUE,
  return_metrics = TRUE
)

# View the formatted table
knitr::kable(results_list$formatted_table)

# View detailed reliability metrics
knitr::kable(results_list$reliability_metrics)

πŸ“Š Example Output 1

Example svytable1 output table from Example C with the reliability checks applied.

Variable Level Overall Missing Not Obese Obese
n 11,778 547 7,073 4,158
Age Mean (SD) 47.18 (16.89) 56.29 (19.15) 46.45 (17.32) 48.25 (15.87)
Race1 Black 2,577 (11.4%) 108 (12.1%) 1,296 (9.1%) 1,173 (15.8%)
Hispanic 1,210 (5.8%) * 714 (5.7%) 434 (6.0%)
Mexican 1,680 (8.2%) * 920 (7.3%) 685 (9.8%)
White 5,017 (67.2%) 235 (69.6%) 3,114 (68.6%) 1,668 (64.4%)
Other 1,294 (7.4%) 67 (6.0%) 1,029 (9.3%) 198 (4.0%)
Education 8th Grade 1,321 (6.1%) * 770 (5.9%) 472 (6.3%)
9 - 11th Grade 1,787 (11.8%) 84 (16.8%) 1,021 (11.3%) 682 (12.5%)
High School 2,595 (21.5%) 121 (21.1%) 1,496 (20.2%) 978 (23.8%)
Some College 3,399 (31.3%) 144 (33.4%) 1,968 (29.4%) 1,287 (34.5%)
College Grad 2,656 (29.3%) * 1,805 (33.0%) 735 (22.8%)
Missing 20 (0.1%) * * *
TotChol Mean (SD) 5.07 (1.07) 5.00 (1.42) 5.07 (1.08) 5.06 (1.04)
Missing, n (%) 1,169 (5.6%) 426 (15.5%) 480 (5.5%) 263 (5.4%)
SmokeNow No 2,779 (24.2%) 142 (29.1%) 1,580 (23.2%) 1,057 (26.0%)
Yes 2,454 (20.1%) 102 (19.5%) 1,594 (21.4%) 758 (17.6%)
Missing 6,545 (55.7%) 303 (51.4%) 3,899 (55.4%) 2,343 (56.4%)

Example D: Reliability Checks for Regression Models

Beyond descriptive tables, the package provides svydiag() to assess the reliability of coefficients from a survey-weighted regression model. It calculates key metrics like p-values, standard errors, and confidence interval widths.

# 1. Fit a logistic regression model using the complete-case design
fit_obesity <- svyglm(
  ObeseStatus ~ Age + Gender + Race1,
  design = adult_design_complete,
  family = quasibinomial()
)

# 2. Get the reliability diagnostics table for the model
diagnostics_table <- svydiag(fit_obesity)

# 3. Display the diagnostics table
knitr::kable(
  diagnostics_table,
  caption = "Table 3: Reliability Diagnostics for Obesity Model Coefficients",
  digits = 3
)

πŸ“Š Example Output 2

Example output table for Example D, which demonstrates the svydiag() function.

Term Estimate SE p.value is_significant CI_Lower CI_Upper CI_Width RSE_percent is_rse_high
(Intercept) -0.381 0.109 0.002 TRUE -0.604 -0.158 0.445 28.486 FALSE
Age 0.008 0.002 0.000 TRUE 0.005 0.012 0.007 20.782 FALSE
Gendermale -0.061 0.057 0.294 FALSE -0.179 0.056 0.236 93.470 TRUE
Race1Hispanic -0.493 0.103 0.000 TRUE -0.704 -0.282 0.422 20.870 FALSE
Race1Mexican -0.225 0.087 0.016 TRUE -0.403 -0.046 0.357 38.733 TRUE
Race1White -0.654 0.081 0.000 TRUE -0.821 -0.488 0.334 12.421 FALSE
Race1Other -1.351 0.131 0.000 TRUE -1.620 -1.082 0.538 9.707 FALSE

Example E: Goodness-of-Fit Test for Survey Models

# We will use the same survey design and model from the previous example.
# design = adult_design_complete
# fit = fit_obesity

# 1. Run the goodness-of-fit test
gof_results <- svygof(fit_obesity, adult_design_complete)

# 2. Display the results
knitr::kable(
  gof_results,
  caption = "Table 4: Archer-Lemeshow Goodness-of-Fit Test for Obesity Model"
)

Example F: Design-Correct AUC for Survey Models

# The svyAUC function requires a replicate-weights design object.
# 1. Create a replicate design object
rep_design <- as.svrepdesign(adult_design_complete)

# 2. Fit the model using the new replicate design
fit_obesity_rep <- svyglm(
  ObeseStatus ~ Age + Gender + Race1,
  design = rep_design,
  family = quasibinomial()
)

# 3. Calculate the design-correct AUC
auc_results <- svyAUC(fit_obesity_rep, rep_design)

# 4. Display the results
knitr::kable(
  auc_results,
  caption = "Table 5: Design-Correct AUC for Obesity Model"
)

Example G: Creating Tables from Multiply Imputed Data

The svypooled() function is designed to work with pooled model results from the mice package. It can produce a "fallacy-safe" table showing only the main exposure or a full table with all model variables.

First, we'll prepare some imputed data and run a pooled regression model.

Data Preparation & Modeling

# Load required packages
library(mice)

# Use the built-in `nhanes` dataset and create some missing values
nhanes_miss <- nhanes
nhanes_miss$hyp[sample(1:nrow(nhanes_miss), 5)] <- NA

# Perform multiple imputation
imputed_data <- mice(nhanes_miss, m = 5, seed = 123, printFlag = FALSE)

# Fit a logistic regression model on each imputed dataset and pool the results
fit <- with(imputed_data, glm(hyp ~ age + bmi + chl, family = "binomial"))
pooled_model <- pool(fit)

Generate a "Fallacy-Safe" Table (Default)

This table only shows the main exposure (age) and lists the adjustment variables in a footnote to prevent misinterpretation of covariate statistics.

# Generate the fallacy-safe table
svypooled(
  pooled_model = pooled_model,
  main_exposure = "age",
  adj_var_names = c("bmi", "chl"),
  measure = "OR",
  title = "Adjusted Odds of Hypertension by Age Group (Fallacy-Safe)"
)

Generate a Full Model Table

This table shows the formatted results for all variables included in the model.

# Generate the full table
svypooled(
  pooled_model = pooled_model,
  main_exposure = "age",
  adj_var_names = c("bmi", "chl"),
  measure = "OR",
  title = "Adjusted Odds of Hypertension (Full Model)",
  fallacy_safe = FALSE
)

Example H: Analyzing Interaction Effects

Demonstrating the calculation of additive interaction measures (RERI, AP, S) for the interaction between Race and Obesity status on Hypertension risk (defined as Systolic BP >= 130 mmHg).

library(survey)
library(NHANES)
library(dplyr)
library(tidyr)
library(msm)

data(NHANESraw)

vars_needed <- c("Age", "Race1", "BPSysAve", "BMI", "ObeseStatus", "Hypertension_130",
                 "SDMVPSU", "SDMVSTRA", "WTMEC2YR")

nhanes_adults_processed <- NHANESraw %>%
  filter(Age >= 20) %>%
  mutate(
    ObeseStatus = factor(ifelse(BMI >= 30, "Obese", "Not Obese"),
                         levels = c("Not Obese", "Obese")),
    Hypertension_130 = factor(ifelse(BPSysAve >= 130, "Yes", "No"),
                              levels = c("No", "Yes")),
    Race1 = relevel(as.factor(Race1), ref = "White")
  ) %>%
  select(all_of(vars_needed)) %>%
  drop_na()

adult_design_binary <- svydesign(
  id = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTMEC2YR,
  nest = TRUE, data = nhanes_adults_processed
)

# --- Fit Interaction Term Model ---
interaction_model_logit <- svyglm(
  Hypertension_130 ~ Race1 * ObeseStatus + Age,
  design = adult_design_binary, family = quasibinomial()
)

# --- Fit Joint Variable Model ---
adult_design_binary <- update(adult_design_binary,
                              Race1_ObeseStatus = interaction(Race1, ObeseStatus, sep = "_", drop = TRUE)
)
adult_design_binary <- update(adult_design_binary,
                              Race1_ObeseStatus = relevel(Race1_ObeseStatus, ref = "White_Not Obese")
)
joint_model_logit <- svyglm(
  Hypertension_130 ~ Race1_ObeseStatus + Age,
  design = adult_design_binary, family = quasibinomial()
)

# --- 2. Calculate Additive Interactions using addintlist ---
all_interactions_table <- addintlist(
  model = interaction_model_logit,
  factor1_name = "Race1",
  factor2_name = "ObeseStatus",
  measures = "all", # Calculate RERI, AP, and S
  digits = 3 # Round to 3 decimal places
)

print("--- Additive Interaction Results (RERI, AP, S) ---")
print(all_interactions_table, n = 50)

# --- 3. Calculate Joint Effects using get_joint_effects ---
# Calculates OR for each combo (e.g., Black+Obese) vs. reference (White+Not Obese)
# using the interaction model output
joint_effects_ratio <- get_joint_effects(
  interaction_model = interaction_model_logit,
  factor1_name = "Race1",
  factor2_name = "ObeseStatus",
  scale = "ratio", # Get ORs
  digits = 2
)
print("--- Joint Effects (OR Scale) from Interaction Model ---")
print(joint_effects_ratio, n=20)

# --- 4. Calculate Simple Effects using inteffects ---
# Calculates stratum-specific ORs (e.g., OR for Obese vs Not Obese within Black race)
# using the joint variable model output
f1_levels <- levels(adult_design_binary$variables$Race1)
f2_levels <- levels(adult_design_binary$variables$ObeseStatus)

simple_effects_ratio <- inteffects(
  joint_model = joint_model_logit,
  joint_var_name = "Race1_ObeseStatus",
  factor1_name = "Race1",
  factor2_name = "ObeseStatus",
  factor1_levels = f1_levels,
  factor2_levels = f2_levels,
  level_separator = "_",
  scale = "ratio", # Get ORs
  digits = 2
)
print("--- Simple Effects (OR Scale) from Joint Model ---")
print(simple_effects_ratio, n=20)

Example I: Survey-Weighted Survival Plots

The package also includes svykmplot() to generate publication-ready, survey-weighted Kaplan-Meier plots with attached number-at-risk tables. This example uses the nhanes_mortality dataset included with the package.

# Load survival package for Surv() and ggplot2 for labs()
library(survival) 
library(ggplot2)

# Load the package's mortality dataset
data(nhanes_mortality)

# 1. Create the main survey design object
analytic_design <- svydesign(
  strata = ~strata,
  id = ~psu,
  weights = ~survey_weight,
  data = nhanes_mortality,
  nest = TRUE
)

# 2. Create a subsetted design for females
design_female <- subset(analytic_design, sex == "Female")

# 3. Define the formula
km_formula <- Surv(stime, status) ~ caff

# 4. Define a 4-color palette
distinct_palette <- c("#377EB8", "#FF7F00", "#4DAF4A", "#E41A1C")

# 5. Run the function
km_results_female <- svykmplot(
  formula = km_formula,
  design = design_female,
  legend_title = "Caffeine Consumption",
  time_unit = "days", 
  time_breaks = seq(0, 240, by = 60),
  palette = distinct_palette,
  show_pval = TRUE
)

# 6. Display the plot
km_results_female$plot

🀝 Contributing

Contributions are welcome! If you find a bug or have suggestions for improvements, please email to the maintainer.


πŸ“œ License

This package is released under the MIT License.

About

Table 1 and diagnostics from complex survey designs

Topics

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published