Commit a55625d

Add Kish-method to rescale_weights
1 parent 60c2fa5 commit a55625d

2 files changed, +111 -63 lines changed

R/rescale_weights.R

Lines changed: 78 additions & 42 deletions
@@ -2,63 +2,73 @@
 #' @name rescale_weights
 #'
 #' @description Most functions to fit multilevel and mixed effects models only
-#' allow to specify frequency weights, but not design (i.e. sampling or
-#' probability) weights, which should be used when analyzing complex samples
-#' and survey data. `rescale_weights()` implements an algorithm proposed
-#' by \cite{Asparouhov (2006)} and \cite{Carle (2009)} to rescale design
-#' weights in survey data to account for the grouping structure of multilevel
-#' models, which then can be used for multilevel modelling.
+#' allow to specify frequency weights, but not design (i.e. sampling or
+#' probability) weights, which should be used when analyzing complex samples
+#' and survey data. `rescale_weights()` implements two algorithms, one proposed
+#' by \cite{Asparouhov (2006)} and \cite{Carle (2009)} and one proposed by
+#' \cite{Kish 1965}, to rescale design weights in survey data to account for the
+#' grouping structure of multilevel models, which then can be used for
+#' multilevel modelling.
 #'
 #' @param data A data frame.
 #' @param by Variable names (as character vector, or as formula), indicating
-#' the grouping structure (strata) of the survey data (level-2-cluster
-#' variable). It is also possible to create weights for multiple group
-#' variables; in such cases, each created weighting variable will be suffixed
-#' by the name of the group variable.
+#' the grouping structure (strata) of the survey data (level-2-cluster
+#' variable). It is also possible to create weights for multiple group
+#' variables; in such cases, each created weighting variable will be suffixed
+#' by the name of the group variable.
 #' @param probability_weights Variable indicating the probability (design or
-#' sampling) weights of the survey data (level-1-weight).
+#' sampling) weights of the survey data (level-1-weight).
 #' @param nest Logical, if `TRUE` and `by` indicates at least two
-#' group variables, then groups are "nested", i.e. groups are now a
-#' combination from each group level of the variables in `by`.
+#' group variables, then groups are "nested", i.e. groups are now a
+#' combination from each group level of the variables in `by`.
+#' @param method `"carle"` or `"kish"`.
 #'
 #' @return `data`, including the new weighting variables: `pweights_a`
-#' and `pweights_b`, which represent the rescaled design weights to use
-#' in multilevel models (use these variables for the `weights` argument).
+#' and `pweights_b`, which represent the rescaled design weights to use
+#' in multilevel models (use these variables for the `weights` argument).
 #'
 #' @details
+#' - `method = "carle"`
 #'
-#' Rescaling is based on two methods: For `pweights_a`, the sample weights
-#' `probability_weights` are adjusted by a factor that represents the proportion
-#' of group size divided by the sum of sampling weights within each group. The
-#' adjustment factor for `pweights_b` is the sum of sample weights within each
-#' group divided by the sum of squared sample weights within each group (see
-#' Carle (2009), Appendix B). In other words, `pweights_a` "scales the weights
-#' so that the new weights sum to the cluster sample size" while `pweights_b`
-#' "scales the weights so that the new weights sum to the effective cluster
-#' size".
-#'
-#' Regarding the choice between scaling methods A and B, Carle suggests that
-#' "analysts who wish to discuss point estimates should report results based on
-#' weighting method A. For analysts more interested in residual between-group
-#' variance, method B may generally provide the least biased estimates". In
-#' general, it is recommended to fit a non-weighted model and weighted models
-#' with both scaling methods and when comparing the models, see whether the
-#' "inferential decisions converge", to gain confidence in the results.
-#'
-#' Though the bias of scaled weights decreases with increasing group size,
-#' method A is preferred when insufficient or low group size is a concern.
-#'
-#' The group ID and probably PSU may be used as random effects (e.g. nested
-#' design, or group and PSU as varying intercepts), depending on the survey
-#' design that should be mimicked.
+#' Rescaling is based on two methods: For `pweights_a`, the sample weights
+#' `probability_weights` are adjusted by a factor that represents the
+#' proportion of group size divided by the sum of sampling weights within each
+#' group. The adjustment factor for `pweights_b` is the sum of sample weights
+#' within each group divided by the sum of squared sample weights within each
+#' group (see Carle (2009), Appendix B). In other words, `pweights_a` "scales
+#' the weights so that the new weights sum to the cluster sample size" while
+#' `pweights_b` "scales the weights so that the new weights sum to the
+#' effective cluster size".
+#'
+#' Regarding the choice between scaling methods A and B, Carle suggests that
+#' "analysts who wish to discuss point estimates should report results based
+#' on weighting method A. For analysts more interested in residual
+#' between-group variance, method B may generally provide the least biased
+#' estimates". In general, it is recommended to fit a non-weighted model and
+#' weighted models with both scaling methods and when comparing the models,
+#' see whether the "inferential decisions converge", to gain confidence in the
+#' results.
+#'
+#' Though the bias of scaled weights decreases with increasing group size,
+#' method A is preferred when insufficient or low group size is a concern.
+#'
+#' The group ID and probably PSU may be used as random effects (e.g. nested
+#' design, or group and PSU as varying intercepts), depending on the survey
+#' design that should be mimicked.
+#'
+#' - `method = "kish"`
+#'
+#' to do...
 #'
 #' @references
+#' - Asparouhov T. (2006). General Multi-Level Modeling with Sampling
+#' Weights. Communications in Statistics - Theory and Methods 35: 439-460
+#'
 #' - Carle A.C. (2009). Fitting multilevel models in complex survey data
 #' with design weights: Recommendations. BMC Medical Research Methodology
 #' 9(49): 1-13
 #'
-#' - Asparouhov T. (2006). General Multi-Level Modeling with Sampling
-#' Weights. Communications in Statistics - Theory and Methods 35: 439-460
+#' - Kish ...
 #'
 #' @examples
 #' if (require("lme4")) {
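
Review note: the `method = "carle"` details in this hunk describe two adjustment factors in words. A minimal base-R sketch of that description follows, assuming a hypothetical toy data frame `d` with columns `cluster` and `w` (neither name comes from the commit). It illustrates the documented factors only and is not the package's implementation.

```r
# Hypothetical toy data: two clusters with unequal design weights.
d <- data.frame(
  cluster = c("a", "a", "a", "b", "b"),
  w = c(1, 2, 3, 4, 4)
)

# Per-cluster quantities, expanded back to one value per row with ave().
n_j <- ave(d$w, d$cluster, FUN = length)    # cluster sample size
sum_w <- ave(d$w, d$cluster, FUN = sum)     # sum of weights per cluster
sum_w2 <- ave(d$w^2, d$cluster, FUN = sum)  # sum of squared weights per cluster

# Method A: factor = group size / sum of weights,
# so the new weights sum to the cluster sample size.
d$pweights_a <- d$w * n_j / sum_w

# Method B: factor = sum of weights / sum of squared weights,
# so the new weights sum to the effective cluster size.
d$pweights_b <- d$w * sum_w / sum_w2

# Per-cluster sums of the rescaled weights.
sum_a <- tapply(d$pweights_a, d$cluster, sum)
sum_b <- tapply(d$pweights_b, d$cluster, sum)
```

In this toy case `sum_a` equals the cluster sizes (3 and 2), and `sum_b` equals `(sum of w)^2 / sum of w^2` within each cluster, which is the effective cluster size the documentation refers to.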
@@ -87,7 +97,7 @@
 #' )
 #' }
 #' @export
-rescale_weights <- function(data, by, probability_weights, nest = FALSE) {
+rescale_weights <- function(data, by, probability_weights, nest = FALSE, method = "carle") {
   if (inherits(by, "formula")) {
     by <- all.vars(by)
   }
@@ -107,6 +117,32 @@ rescale_weights <- function(data, by, probability_weights, nest = FALSE) {
   # sort id
   data_tmp$.bamboozled <- seq_len(nrow(data_tmp))
 
+  switch(method,
+    carle = .rescale_weights_carle(nest, probability_weights, data_tmp, data, by, weight_non_na),
+    .rescale_weights_kish(probability_weights, data_tmp, data, weight_non_na)
+  )
+}
+
+
+# rescale weights, method Kish ----------------------------
+
+.rescale_weights_kish <- function(probability_weights, data_tmp, data, weight_non_na) {
+  weights <- data_tmp[[probability_weights]]
+  # design effect according to Kish
+  deff <- mean(weights^2) / (mean(weights)^2)
+  # rescale weights, so their mean is 1
+  z_weights <- weights / mean(weights)
+  # divide weights by design effect
+  data$pweight <- NA_real_
+  data$pweight[weight_non_na] <- z_weights / deff
+  # return result
+  data
+}
+
+
+# rescale weights, method Carle ----------------------------
+
+.rescale_weights_carle <- function(nest, probability_weights, data_tmp, data, by, weight_non_na) {
   if (nest && length(by) < 2) {
     insight::format_warning(
       sprintf(
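
Review note: the comments in `.rescale_weights_kish()` name three steps: compute Kish's design effect, rescale the weights so their mean is 1, then divide by the design effect. A standalone sketch of one reading of those comments is given below, assuming a hypothetical weight vector `w` and a hypothetical helper name `kish_rescale` (neither is part of the commit). Treat it as an illustration of the stated intent, not as the package's final method.

```r
# Hypothetical helper illustrating the three commented steps.
kish_rescale <- function(w) {
  # design effect according to Kish: mean of squared weights over squared mean
  deff <- mean(w^2) / mean(w)^2
  # rescale weights so that their mean is 1
  w_unit <- w / mean(w)
  # divide the unit-mean weights by the design effect
  w_unit / deff
}

w <- c(1, 2, 3, 4, 4)  # hypothetical design weights
w_kish <- kish_rescale(w)

# Under this reading the rescaled weights sum to n / deff,
# which equals Kish's effective sample size (sum(w))^2 / sum(w^2).
n_eff <- sum(w)^2 / sum(w^2)
```

One consequence of this reading is that equal weights are left at 1 (the design effect is then exactly 1), while more variable weights shrink the total toward the effective sample size.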

man/rescale_weights.Rd

Lines changed: 33 additions & 21 deletions
Some generated files are not rendered by default.
