Skip to content

Commit

Permalink
deffK
Browse files Browse the repository at this point in the history
  • Loading branch information
astra-cdc committed Feb 29, 2024
1 parent 1d4bc20 commit d9fa92c
Show file tree
Hide file tree
Showing 7 changed files with 123 additions and 119 deletions.
34 changes: 13 additions & 21 deletions R/tab.R
Original file line number Diff line number Diff line change
Expand Up @@ -149,11 +149,21 @@ tab = function(...
}

##
sto = svytotal(frm, design) # , deff = TRUE)
sto = svytotal(frm, design) # , deff = "replace")
mmcr = data.frame(x = as.numeric(sto)
, s = sqrt(diag(attr(sto, "var"))) )
mmcr$samp.size = .calc_samp_size(design = design, vr = vr, counts = counts)
mmcr$counts = counts
mmcr$counts = counts

# deff = attr(sto, "deff") %>% diag
# I am having trouble interpreting this deff.
# In some situations, results are unusual.
# total(), tab("AGER"), tab("PAYNOCHG")
# Using Kish design effect instead.

mmcr$deff = by(design$prob, design$variables[,vr], deffK) %>% as.numeric
mmcr$samp.size = mmcr$counts / mmcr$deff
idx.bad = which(mmcr$samp.size > mmcr$counts)
mmcr$samp.size[idx.bad] = mmcr$counts[idx.bad]

df1 = degf(design)
mmcr$degf = df1
Expand Down Expand Up @@ -261,21 +271,3 @@ tab = function(...
}
df1
}


.calc_samp_size = function(design, vr, counts) {

# In svytotal(frm, design, deff = TRUE), DEff sometimes
# appears incorrect. If no variability, DEff = Inf.
# Calculating "Kish's Effective Sample Size" directly, bypassing DEff
# deff = attr(sto, "deff") %>% diag

design$wi = 1 / design$prob
design$wi[design$prob <= 0] = 0
design$wi2 = design$wi^2
sum_wi = by(design$wi, design$variables[,vr], sum) %>% as.numeric
sum_wi2 = by(design$wi2, design$variables[,vr], sum) %>% as.numeric
neff = sum_wi^2 / sum_wi2
assert_that(length(neff) == length(counts))
pmin(counts, neff)
}
8 changes: 6 additions & 2 deletions R/total.R
Original file line number Diff line number Diff line change
Expand Up @@ -31,12 +31,16 @@ total = function(csv = getOption("surveytable.csv") ) {
}

##
sto = svytotal(~Total, design) # , deff = TRUE)
sto = svytotal(~Total, design) # , deff = "replace")
mmcr = data.frame(x = as.numeric(sto)
, s = sqrt(diag(attr(sto, "var"))) )
mmcr$samp.size = .calc_samp_size(design = design, vr = "Total", counts = counts)
mmcr$counts = counts

mmcr$deff = deffK(design$prob)
mmcr$samp.size = mmcr$counts / mmcr$deff
idx.bad = which(mmcr$samp.size > mmcr$counts)
mmcr$samp.size[idx.bad] = mmcr$counts[idx.bad]

df1 = degf(design)
mmcr$degf = df1

Expand Down
22 changes: 22 additions & 0 deletions R/z_deffK.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# From:
# Richard Valliant and George Zipf (2023). PracTools: Designing and
# Weighting Survey Samples. R package version 1.4.2.
# https://CRAN.R-project.org/package=PracTools

# Kish design effect
# Kish design effect due to unequal weights
# w: vector of inverses of selection probabilities for a sample

# deffK = function (w)
deffK = function(prob)
{
w = 1 / prob
assert_that(all(w > 0), all(w < Inf))

# if (any(w <= 0))
# warning("Some weights are less than or equal to 0.\n")
n <- length(w)
ret = 1 + sum((w - mean(w))^2)/n/mean(w)^2
assert_that(all(ret > 0))
ret
}

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/pkgdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ pkgdown_sha: ~
articles:
Example-National-Ambulatory-Medical-Care-Survey-NAMCS-tables: Example-National-Ambulatory-Medical-Care-Survey-NAMCS-tables.html
surveytable: surveytable.html
last_built: 2024-02-23T16:05Z
last_built: 2024-02-29T02:56Z
urls:
reference: https://cdcgov.github.io/surveytable/reference
article: https://cdcgov.github.io/surveytable/articles
Expand Down
2 changes: 1 addition & 1 deletion docs/search.json

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,9 @@ knitr::opts_chunk$set(
)
```

This example uses the National Ambulatory Medical Care Survey (NAMCS) 2019 Public Use File (PUF) to replicate certain tables from the [National Ambulatory Medical Care Survey: 2019 National Summary Tables](https://www.cdc.gov/nchs/data/ahcd/namcs_summary/2019-namcs-web-tables-508.pdf). NAMCS is "an annual nationally representative sample survey of visits to non-federal office-based patient care physicians, excluding anesthesiologists, radiologists, and pathologists." Note that the unit of observation is visits, not patients this distinction is important since a single patient can make multiple visits.
This example uses the National Ambulatory Medical Care Survey (NAMCS) 2019 Public Use File (PUF) to replicate certain tables from the [National Ambulatory Medical Care Survey: 2019 National Summary Tables](https://www.cdc.gov/nchs/data/ahcd/namcs_summary/2019-namcs-web-tables-508.pdf). NAMCS is "an annual nationally representative sample survey of visits to non-federal office-based patient care physicians, excluding anesthesiologists, radiologists, and pathologists." Note that the unit of observation is visits, not patients -- this distinction is important since a single patient can make multiple visits.

Selected variables from NAMCS 2019 come with the `surveytable` package, for use in examples, in an object called `namcs2019sv`. To make the below code more reusable, we'll call the survey by a generic name, namely, `mysurvey`.
Selected variables from NAMCS 2019 come with the `surveytable` package, for use in examples, in an object called `namcs2019sv`.

# Begin

Expand All @@ -29,8 +29,7 @@ library(surveytable)
Now, specify the survey that you'd like to analyze.

```{r, results='asis'}
mysurvey = namcs2019sv
set_survey(mysurvey)
set_survey(namcs2019sv)
```

Check the survey name, survey design variables, and the number of observations to verify that it all looks correct.
Expand Down Expand Up @@ -67,15 +66,9 @@ Once we have the overall population estimate, the overall rate is:
total_rate(uspop2019$total)
```

Recall that survey objects have an element called `variables`, which is a data frame that contains the survey variables. Let's examine the levels of `MSA`.
To calculate the rates for a particular variable, we need to provide a data frame with a variable called `Level` that matches the levels of the variable in the survey, and a variable called `Population` that gives the population size (which is assumed to be a constant rather than a random variable).

```{r}
levels(mysurvey$variables$MSA)
```

To calculate the rates for a particular variable, we need to provide a data frame with a variable called `Level` that matches the levels of the variable in the survey. `Population` gives the population estimate.

For example, for `MSA`, we need a data frame as follows:
For `MSA`, we can see the levels of the variables just by using the `tab()` command, just as we did above. Thus, to calculate rates, we need a data frame as follows:

```{r}
uspop2019$MSA
Expand Down

0 comments on commit d9fa92c

Please sign in to comment.