svyTable1: Create Publication-Ready Survey-Weighted Summary Tables

svyTable1 is an R package for creating publication-ready tables from complex analytical results. It streamlines the creation of "Table 1" descriptive summaries from survey package objects and provides tools for formatting regression model outputs from multiply imputed datasets managed by the mice package.

Developed for common tasks in epidemiology and public health, the package integrates seamlessly with the widely-used survey package to correctly account for design features such as weights, strata, and clusters, while following best practices for transparency and readability.

✨ Key Features

Survey Design Integration: Works directly with svydesign objects from the R survey package.
Automatic Missing Data Handling: Detects and reports missing values for each variable, either as a “Missing” category for factors or a separate “Missing, n (%)” row for numeric variables.
Best Practice Reporting: Defaults to the “mixed mode” display, showing unweighted sample sizes (N) alongside weighted percentages (%) or means.
Built-in Reliability Checks: Automatically apply NCHS Data Presentation Standards for Proportions to flag or suppress unreliable estimates.
Flexible Output Modes: Easily switch between "mixed", "weighted", and "unweighted" summaries.
Survival Plotting: Provides svykmplot() to create publication-ready, survey-weighted Kaplan-Meier plots with attached number-at-risk tables.
Readability: Option to format large numbers with commas for improved readability.
Regression Diagnostics: Includes the svydiag() helper function to assess the reliability of model coefficients.
Goodness-of-Fit Testing: Provides svygof() to perform the Archer-Lemeshow goodness-of-fit test for survey logistic regression models.
Design-Correct AUC: Offers svyAUC() to calculate the Area Under the Curve (AUC) with proper variance estimation for complex survey designs.
Multiple Imputation Support: Includes the svypooled() helper function to create clean, "fallacy-safe" tables from mipo objects (pooled model results) from the mice package.
Interaction Analysis: Provides tools to calculate and present interaction effects:
- Joint Effects: Calculate effects for each combination of two factors relative to a reference group from an interaction model (jointeffects).
- Simple Effects: Calculate stratum-specific effects from a joint-variable model (inteffects).
- Additive Interaction: Calculate RERI, AP, and Synergy Index with survey-adjusted confidence intervals from either joint or interaction models (addint, addintlist).

🧩 Installation

You can install the development version of svyTable1 from GitHub with:

# install.packages("devtools")
# In README.md
devtools::install_github("ehsanx/svyTable1", build_vignettes = TRUE, dependencies = TRUE)

🧠 Usage Examples

Below are examples using data from the National Health and Nutrition Examination Survey (NHANES).

1. Data Preparation

library(svyTable1)
library(survey)
library(dplyr)
library(NHANES)

# Load the raw NHANES data (2009-2012)
data(NHANESraw)

# Prepare data for adults, keeping NAs to demonstrate missing data handling
nhanes_adults_with_na <- NHANESraw %>%
  filter(Age >= 20) %>%
  mutate(
    ObeseStatus = factor(ifelse(BMI >= 30, "Obese", "Not Obese"),
                         levels = c("Not Obese", "Obese"))
  )

# Create the survey design object
adult_design_with_na <- svydesign(
  id = ~SDMVPSU,
  strata = ~SDMVSTRA,
  weights = ~WTMEC2YR,
  nest = TRUE,
  data = nhanes_adults_with_na
)

2. Generate Tables

Example A: Handling Variables with Missing Data (Primary Use Case)

# Define variables, some with expected missing values
vars_with_missing <- c("Age", "Race1", "Education", "TotChol", "SmokeNow")

# Generate the summary table
table_with_missing <- svytable1(
  design = adult_design_with_na, 
  strata_var = "ObeseStatus", 
  table_vars = vars_with_missing
)

# Display table
knitr::kable(table_with_missing, caption = "Table 1: Participant Characteristics (with Missing Data)")

Example B: Summarizing Complete Data (No Missing Values)

library(tidyr)

# Define variables for a complete-case analysis
vars_for_complete_table <- c("Age", "Race1", "BPSysAve", "Pulse", "BMI")

# Create complete-case data
nhanes_adults_complete <- nhanes_adults_with_na %>%
  drop_na(all_of(vars_for_complete_table))

# Create a new design object
adult_design_complete <- svydesign(
  id = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTMEC2YR,
  nest = TRUE, data = nhanes_adults_complete
)

# Generate the table
table_without_missing <- svytable1(
  design = adult_design_complete, 
  strata_var = "ObeseStatus", 
  table_vars = c("Age", "Race1", "BPSysAve", "Pulse")
)

# Display table
knitr::kable(table_without_missing, caption = "Table 2: Participant Characteristics (No Missing Data)")

Example C: Checking Estimate Reliability (Advanced)

# Generate the table with reliability checks enabled
results_list <- svytable1(
  design = adult_design_with_na, 
  strata_var = "ObeseStatus", 
  table_vars = vars_with_missing,
  reliability_checks = TRUE,
  return_metrics = TRUE
)

# View the formatted table
knitr::kable(results_list$formatted_table)

# View detailed reliability metrics
knitr::kable(results_list$reliability_metrics)

📊 Example Output 1

Example svytable1 output table from Example C with the reliability checks applied.

Variable	Level	Overall	Missing	Not Obese	Obese
n		11,778	547	7,073	4,158
Age	Mean (SD)	47.18 (16.89)	56.29 (19.15)	46.45 (17.32)	48.25 (15.87)
Race1	Black	2,577 (11.4%)	108 (12.1%)	1,296 (9.1%)	1,173 (15.8%)
	Hispanic	1,210 (5.8%)	*	714 (5.7%)	434 (6.0%)
	Mexican	1,680 (8.2%)	*	920 (7.3%)	685 (9.8%)
	White	5,017 (67.2%)	235 (69.6%)	3,114 (68.6%)	1,668 (64.4%)
	Other	1,294 (7.4%)	67 (6.0%)	1,029 (9.3%)	198 (4.0%)
Education	8th Grade	1,321 (6.1%)	*	770 (5.9%)	472 (6.3%)
	9 - 11th Grade	1,787 (11.8%)	84 (16.8%)	1,021 (11.3%)	682 (12.5%)
	High School	2,595 (21.5%)	121 (21.1%)	1,496 (20.2%)	978 (23.8%)
	Some College	3,399 (31.3%)	144 (33.4%)	1,968 (29.4%)	1,287 (34.5%)
	College Grad	2,656 (29.3%)	*	1,805 (33.0%)	735 (22.8%)
	Missing	20 (0.1%)	*	*	*
TotChol	Mean (SD)	5.07 (1.07)	5.00 (1.42)	5.07 (1.08)	5.06 (1.04)
	Missing, n (%)	1,169 (5.6%)	426 (15.5%)	480 (5.5%)	263 (5.4%)
SmokeNow	No	2,779 (24.2%)	142 (29.1%)	1,580 (23.2%)	1,057 (26.0%)
	Yes	2,454 (20.1%)	102 (19.5%)	1,594 (21.4%)	758 (17.6%)
	Missing	6,545 (55.7%)	303 (51.4%)	3,899 (55.4%)	2,343 (56.4%)

Example D: Reliability Checks for Regression Models

Beyond descriptive tables, the package provides svydiag() to assess the reliability of coefficients from a survey-weighted regression model. It calculates key metrics like p-values, standard errors, and confidence interval widths.

# 1. Fit a logistic regression model using the complete-case design
fit_obesity <- svyglm(
  ObeseStatus ~ Age + Gender + Race1,
  design = adult_design_complete,
  family = quasibinomial()
)

# 2. Get the reliability diagnostics table for the model
diagnostics_table <- svydiag(fit_obesity)

# 3. Display the diagnostics table
knitr::kable(
  diagnostics_table,
  caption = "Table 3: Reliability Diagnostics for Obesity Model Coefficients",
  digits = 3
)

📊 Example Output 2

Example output table for Example D, which demonstrates the svydiag() function.

Term	Estimate	SE	p.value	is_significant	CI_Lower	CI_Upper	CI_Width	RSE_percent	is_rse_high
(Intercept)	-0.381	0.109	0.002	TRUE	-0.604	-0.158	0.445	28.486	FALSE
Age	0.008	0.002	0.000	TRUE	0.005	0.012	0.007	20.782	FALSE
Gendermale	-0.061	0.057	0.294	FALSE	-0.179	0.056	0.236	93.470	TRUE
Race1Hispanic	-0.493	0.103	0.000	TRUE	-0.704	-0.282	0.422	20.870	FALSE
Race1Mexican	-0.225	0.087	0.016	TRUE	-0.403	-0.046	0.357	38.733	TRUE
Race1White	-0.654	0.081	0.000	TRUE	-0.821	-0.488	0.334	12.421	FALSE
Race1Other	-1.351	0.131	0.000	TRUE	-1.620	-1.082	0.538	9.707	FALSE

Example E: Goodness-of-Fit Test for Survey Models

# We will use the same survey design and model from the previous example.
# design = adult_design_complete
# fit = fit_obesity

# 1. Run the goodness-of-fit test
gof_results <- svygof(fit_obesity, adult_design_complete)

# 2. Display the results
knitr::kable(
  gof_results,
  caption = "Table 4: Archer-Lemeshow Goodness-of-Fit Test for Obesity Model"
)

Example F: Design-Correct AUC for Survey Models

# The svyAUC function requires a replicate-weights design object.
# 1. Create a replicate design object
rep_design <- as.svrepdesign(adult_design_complete)

# 2. Fit the model using the new replicate design
fit_obesity_rep <- svyglm(
  ObeseStatus ~ Age + Gender + Race1,
  design = rep_design,
  family = quasibinomial()
)

# 3. Calculate the design-correct AUC
auc_results <- svyAUC(fit_obesity_rep, rep_design)

# 4. Display the results
knitr::kable(
  auc_results,
  caption = "Table 5: Design-Correct AUC for Obesity Model"
)

Example G: Creating Tables from Multiply Imputed Data

The svypooled() function is designed to work with pooled model results from the mice package. It can produce a "fallacy-safe" table showing only the main exposure or a full table with all model variables.

First, we'll prepare some imputed data and run a pooled regression model.

Data Preparation & Modeling

# Load required packages
library(mice)

# Use the built-in `nhanes` dataset and create some missing values
nhanes_miss <- nhanes
nhanes_miss$hyp[sample(1:nrow(nhanes_miss), 5)] <- NA

# Perform multiple imputation
imputed_data <- mice(nhanes_miss, m = 5, seed = 123, printFlag = FALSE)

# Fit a logistic regression model on each imputed dataset and pool the results
fit <- with(imputed_data, glm(hyp ~ age + bmi + chl, family = "binomial"))
pooled_model <- pool(fit)

Generate a "Fallacy-Safe" Table (Default)

This table only shows the main exposure (age) and lists the adjustment variables in a footnote to prevent misinterpretation of covariate statistics.

# Generate the fallacy-safe table
svypooled(
  pooled_model = pooled_model,
  main_exposure = "age",
  adj_var_names = c("bmi", "chl"),
  measure = "OR",
  title = "Adjusted Odds of Hypertension by Age Group (Fallacy-Safe)"
)

Generate a Full Model Table

This table shows the formatted results for all variables included in the model.

# Generate the full table
svypooled(
  pooled_model = pooled_model,
  main_exposure = "age",
  adj_var_names = c("bmi", "chl"),
  measure = "OR",
  title = "Adjusted Odds of Hypertension (Full Model)",
  fallacy_safe = FALSE
)

Example H: Analyzing Interaction Effects

Demonstrating the calculation of additive interaction measures (RERI, AP, S) for the interaction between Race and Obesity status on Hypertension risk (defined as Systolic BP >= 130 mmHg).

library(survey)
library(NHANES)
library(dplyr)
library(tidyr)
library(msm)

data(NHANESraw)

vars_needed <- c("Age", "Race1", "BPSysAve", "BMI", "ObeseStatus", "Hypertension_130",
                 "SDMVPSU", "SDMVSTRA", "WTMEC2YR")

nhanes_adults_processed <- NHANESraw %>%
  filter(Age >= 20) %>%
  mutate(
    ObeseStatus = factor(ifelse(BMI >= 30, "Obese", "Not Obese"),
                         levels = c("Not Obese", "Obese")),
    Hypertension_130 = factor(ifelse(BPSysAve >= 130, "Yes", "No"),
                              levels = c("No", "Yes")),
    Race1 = relevel(as.factor(Race1), ref = "White")
  ) %>%
  select(all_of(vars_needed)) %>%
  drop_na()

adult_design_binary <- svydesign(
  id = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTMEC2YR,
  nest = TRUE, data = nhanes_adults_processed
)

# --- Fit Interaction Term Model ---
interaction_model_logit <- svyglm(
  Hypertension_130 ~ Race1 * ObeseStatus + Age,
  design = adult_design_binary, family = quasibinomial()
)

# --- Fit Joint Variable Model ---
adult_design_binary <- update(adult_design_binary,
                              Race1_ObeseStatus = interaction(Race1, ObeseStatus, sep = "_", drop = TRUE)
)
adult_design_binary <- update(adult_design_binary,
                              Race1_ObeseStatus = relevel(Race1_ObeseStatus, ref = "White_Not Obese")
)
joint_model_logit <- svyglm(
  Hypertension_130 ~ Race1_ObeseStatus + Age,
  design = adult_design_binary, family = quasibinomial()
)

# --- 2. Calculate Additive Interactions using addintlist ---
all_interactions_table <- addintlist(
  model = interaction_model_logit,
  factor1_name = "Race1",
  factor2_name = "ObeseStatus",
  measures = "all", # Calculate RERI, AP, and S
  digits = 3 # Round to 3 decimal places
)

print("--- Additive Interaction Results (RERI, AP, S) ---")
print(all_interactions_table, n = 50)

# --- 3. Calculate Joint Effects using get_joint_effects ---
# Calculates OR for each combo (e.g., Black+Obese) vs. reference (White+Not Obese)
# using the interaction model output
joint_effects_ratio <- get_joint_effects(
  interaction_model = interaction_model_logit,
  factor1_name = "Race1",
  factor2_name = "ObeseStatus",
  scale = "ratio", # Get ORs
  digits = 2
)
print("--- Joint Effects (OR Scale) from Interaction Model ---")
print(joint_effects_ratio, n=20)

# --- 4. Calculate Simple Effects using inteffects ---
# Calculates stratum-specific ORs (e.g., OR for Obese vs Not Obese within Black race)
# using the joint variable model output
f1_levels <- levels(adult_design_binary$variables$Race1)
f2_levels <- levels(adult_design_binary$variables$ObeseStatus)

simple_effects_ratio <- inteffects(
  joint_model = joint_model_logit,
  joint_var_name = "Race1_ObeseStatus",
  factor1_name = "Race1",
  factor2_name = "ObeseStatus",
  factor1_levels = f1_levels,
  factor2_levels = f2_levels,
  level_separator = "_",
  scale = "ratio", # Get ORs
  digits = 2
)
print("--- Simple Effects (OR Scale) from Joint Model ---")
print(simple_effects_ratio, n=20)

Example I: Survey-Weighted Survival Plots

The package also includes svykmplot() to generate publication-ready, survey-weighted Kaplan-Meier plots with attached number-at-risk tables. This example uses the nhanes_mortality dataset included with the package.

# Load survival package for Surv() and ggplot2 for labs()
library(survival) 
library(ggplot2)

# Load the package's mortality dataset
data(nhanes_mortality)

# 1. Create the main survey design object
analytic_design <- svydesign(
  strata = ~strata,
  id = ~psu,
  weights = ~survey_weight,
  data = nhanes_mortality,
  nest = TRUE
)

# 2. Create a subsetted design for females
design_female <- subset(analytic_design, sex == "Female")

# 3. Define the formula
km_formula <- Surv(stime, status) ~ caff

# 4. Define a 4-color palette
distinct_palette <- c("#377EB8", "#FF7F00", "#4DAF4A", "#E41A1C")

# 5. Run the function
km_results_female <- svykmplot(
  formula = km_formula,
  design = design_female,
  legend_title = "Caffeine Consumption",
  time_unit = "days", 
  time_breaks = seq(0, 240, by = 60),
  palette = distinct_palette,
  show_pval = TRUE
)

# 6. Display the plot
km_results_female$plot

🤝 Contributing

Contributions are welcome! If you find a bug or have suggestions for improvements, please email to the maintainer.

📜 License

This package is released under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
.github		.github
R		R
data		data
inst		inst
man		man
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.Rhistory		.Rhistory
.gitattributes		.gitattributes
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.md		README.md
cran-comments.md		cran-comments.md
readme.html		readme.html
svyTable1.Rproj		svyTable1.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Uh oh!

Repository files navigation

svyTable1: Create Publication-Ready Survey-Weighted Summary Tables

✨ Key Features

🧩 Installation

🧠 Usage Examples

1. Data Preparation

2. Generate Tables

Example A: Handling Variables with Missing Data (Primary Use Case)

Example B: Summarizing Complete Data (No Missing Values)

Example C: Checking Estimate Reliability (Advanced)

📊 Example Output 1

Example D: Reliability Checks for Regression Models

📊 Example Output 2

Example E: Goodness-of-Fit Test for Survey Models

Example F: Design-Correct AUC for Survey Models

Example G: Creating Tables from Multiply Imputed Data

Data Preparation & Modeling

Generate a "Fallacy-Safe" Table (Default)

Generate a Full Model Table

Example H: Analyzing Interaction Effects

Example I: Survey-Weighted Survival Plots

🤝 Contributing

📜 License

About

Licenses found

Uh oh!

Releases

Packages

Languages

License

Licenses found

ehsanx/svyTable1

Folders and files

Latest commit

History

Repository files navigation

svyTable1: Create Publication-Ready Survey-Weighted Summary Tables

✨ Key Features

🧩 Installation

🧠 Usage Examples

1. Data Preparation

2. Generate Tables

Example A: Handling Variables with Missing Data (Primary Use Case)

Example B: Summarizing Complete Data (No Missing Values)

Example C: Checking Estimate Reliability (Advanced)

📊 Example Output 1

Example D: Reliability Checks for Regression Models

📊 Example Output 2

Example E: Goodness-of-Fit Test for Survey Models

Example F: Design-Correct AUC for Survey Models

Example G: Creating Tables from Multiply Imputed Data

Data Preparation & Modeling

Generate a "Fallacy-Safe" Table (Default)

Generate a Full Model Table

Example H: Analyzing Interaction Effects

Example I: Survey-Weighted Survival Plots

🤝 Contributing

📜 License

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages