#' @name rescale_weights
#'
#' @description Most functions to fit multilevel and mixed effects models only
- #' allow to specify frequency weights, but not design (i.e. sampling or
- #' probability) weights, which should be used when analyzing complex samples
- #' and survey data. `rescale_weights()` implements an algorithm proposed
- #' by \cite{Asparouhov (2006)} and \cite{Carle (2009)} to rescale design
- #' weights in survey data to account for the grouping structure of multilevel
- #' models, which then can be used for multilevel modelling.
+ #' allow to specify frequency weights, but not design (i.e. sampling or
+ #' probability) weights, which should be used when analyzing complex samples
+ #' and survey data. `rescale_weights()` implements two algorithms, one proposed
+ #' by \cite{Asparouhov (2006)} and \cite{Carle (2009)} and one proposed by
+ #' \cite{Kish (1965)}, to rescale design weights in survey data to account for the
+ #' grouping structure of multilevel models, which then can be used for
+ #' multilevel modelling.
#'
#' @param data A data frame.
#' @param by Variable names (as character vector, or as formula), indicating
- #' the grouping structure (strata) of the survey data (level-2-cluster
- #' variable). It is also possible to create weights for multiple group
- #' variables; in such cases, each created weighting variable will be suffixed
- #' by the name of the group variable.
+ #' the grouping structure (strata) of the survey data (level-2-cluster
+ #' variable). It is also possible to create weights for multiple group
+ #' variables; in such cases, each created weighting variable will be suffixed
+ #' by the name of the group variable.
#' @param probability_weights Variable indicating the probability (design or
- #' sampling) weights of the survey data (level-1-weight).
+ #' sampling) weights of the survey data (level-1-weight).
#' @param nest Logical, if `TRUE` and `by` indicates at least two
- #' group variables, then groups are "nested", i.e. groups are now a
- #' combination from each group level of the variables in `by`.
+ #' group variables, then groups are "nested", i.e. groups are now a
+ #' combination from each group level of the variables in `by`.
+ #' @param method Rescaling method, either `"carle"` (the default) or `"kish"`.
#'
#' @return `data`, including the new weighting variables: `pweights_a`
- #' and `pweights_b`, which represent the rescaled design weights to use
- #' in multilevel models (use these variables for the `weights` argument).
+ #' and `pweights_b`, which represent the rescaled design weights to use
+ #' in multilevel models (use these variables for the `weights` argument).
#'
#' @details
+ #' - `method = "carle"`
#'
- #' Rescaling is based on two methods: For `pweights_a`, the sample weights
- #' `probability_weights` are adjusted by a factor that represents the proportion
- #' of group size divided by the sum of sampling weights within each group. The
- #' adjustment factor for `pweights_b` is the sum of sample weights within each
- #' group divided by the sum of squared sample weights within each group (see
- #' Carle (2009), Appendix B). In other words, `pweights_a` "scales the weights
- #' so that the new weights sum to the cluster sample size" while `pweights_b`
- #' "scales the weights so that the new weights sum to the effective cluster
- #' size".
- #'
- #' Regarding the choice between scaling methods A and B, Carle suggests that
- #' "analysts who wish to discuss point estimates should report results based on
- #' weighting method A. For analysts more interested in residual between-group
- #' variance, method B may generally provide the least biased estimates". In
- #' general, it is recommended to fit a non-weighted model and weighted models
- #' with both scaling methods and when comparing the models, see whether the
- #' "inferential decisions converge", to gain confidence in the results.
- #'
- #' Though the bias of scaled weights decreases with increasing group size,
- #' method A is preferred when insufficient or low group size is a concern.
- #'
- #' The group ID and probably PSU may be used as random effects (e.g. nested
- #' design, or group and PSU as varying intercepts), depending on the survey
- #' design that should be mimicked.
+ #' Rescaling is based on two methods: For `pweights_a`, the sample weights
+ #' `probability_weights` are adjusted by a factor that represents the
+ #' proportion of group size divided by the sum of sampling weights within each
+ #' group. The adjustment factor for `pweights_b` is the sum of sample weights
+ #' within each group divided by the sum of squared sample weights within each
+ #' group (see Carle (2009), Appendix B). In other words, `pweights_a` "scales
+ #' the weights so that the new weights sum to the cluster sample size" while
+ #' `pweights_b` "scales the weights so that the new weights sum to the
+ #' effective cluster size".
+ #'
+ #' Regarding the choice between scaling methods A and B, Carle suggests that
+ #' "analysts who wish to discuss point estimates should report results based
+ #' on weighting method A. For analysts more interested in residual
+ #' between-group variance, method B may generally provide the least biased
+ #' estimates". In general, it is recommended to fit a non-weighted model and
+ #' weighted models with both scaling methods and when comparing the models,
+ #' see whether the "inferential decisions converge", to gain confidence in the
+ #' results.
+ #'
+ #' Though the bias of scaled weights decreases with increasing group size,
+ #' method A is preferred when insufficient or low group size is a concern.
+ #'
+ #' The group ID and probably PSU may be used as random effects (e.g. nested
+ #' design, or group and PSU as varying intercepts), depending on the survey
+ #' design that should be mimicked.
+ #'
+ #' - `method = "kish"`
+ #'
+ #' to do...
#'
#' @references
+ #' - Asparouhov T. (2006). General Multi-Level Modeling with Sampling
+ #' Weights. Communications in Statistics - Theory and Methods 35: 439-460
+ #'
#' - Carle A.C. (2009). Fitting multilevel models in complex survey data
#' with design weights: Recommendations. BMC Medical Research Methodology
#' 9(49): 1-13
#'
- #' - Asparouhov T. (2006). General Multi-Level Modeling with Sampling
- #' Weights. Communications in Statistics - Theory and Methods 35: 439-460
+ #' - Kish ...
#'
#' @examples
#' if (require("lme4")) {
@@ -87,7 +97,7 @@
#' )
#' }
#' @export
- rescale_weights <- function(data, by, probability_weights, nest = FALSE) {
+ rescale_weights <- function(data, by, probability_weights, nest = FALSE, method = "carle") {
  if (inherits(by, "formula")) {
    by <- all.vars(by)
  }
@@ -107,6 +117,32 @@ rescale_weights <- function(data, by, probability_weights, nest = FALSE) {
  # sort id
  data_tmp$.bamboozled <- seq_len(nrow(data_tmp))

+   switch(method,
+     carle = .rescale_weights_carle(nest, probability_weights, data_tmp, data, by, weight_non_na),
+     .rescale_weights_kish(probability_weights, data_tmp, data, weight_non_na)
+   )
+ }
+
+
+ # rescale weights, method Kish -----------------------------
+
+ .rescale_weights_kish <- function(probability_weights, data_tmp, data, weight_non_na) {
+   weights <- data_tmp[[probability_weights]]
+   # design effect according to Kish
+   deff <- mean(weights^2) / (mean(weights)^2)
+   # rescale weights, so their mean is 1
+   z_weights <- weights / mean(weights)
+   # divide weights by design effect
+   data$pweight <- NA_real_
+   data$pweight[weight_non_na] <- z_weights / deff
+   # return result
+   data
+ }
+
+
+ # rescale weights, method Carle ----------------------------
+
+ .rescale_weights_carle <- function(nest, probability_weights, data_tmp, data, by, weight_non_na) {
  if (nest && length(by) < 2) {
    insight::format_warning(
      sprintf(
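
The two scalings that the `@details` text describes for `method = "carle"` can be sketched in base R. This is a minimal illustration under assumptions, not the package's implementation: the helper `rescale_carle_sketch()`, the toy data frame `d`, and its columns `g` (cluster) and `w` (design weight) are all hypothetical.

```r
# Hypothetical helper (not part of the package): both Carle (2009) scalings
# for a single grouping variable, using base R only.
rescale_carle_sketch <- function(w, g) {
  n_g <- ave(w, g, FUN = length)                   # cluster sample size
  sum_w <- ave(w, g, FUN = sum)                    # sum of weights per cluster
  sum_w2 <- ave(w, g, FUN = function(x) sum(x^2))  # sum of squared weights per cluster
  data.frame(
    pweights_a = w * n_g / sum_w,    # method A: factor = group size / sum of weights
    pweights_b = w * sum_w / sum_w2  # method B: factor = sum of weights / sum of squared weights
  )
}

# Toy data, made up for illustration
d <- data.frame(
  g = c("a", "a", "a", "b", "b"),
  w = c(1, 2, 3, 2, 2)
)
out <- cbind(d, rescale_carle_sketch(d$w, d$g))

# Method A weights sum to the number of observations in each cluster
print(tapply(out$pweights_a, out$g, sum))
# Method B weights sum to (sum of w)^2 / sum of w^2, the effective cluster size
print(tapply(out$pweights_b, out$g, sum))
```

Comparing the two per-cluster sums shows why method B shrinks the weights in clusters with unequal weights (cluster `a`) and leaves them unchanged where all weights are equal (cluster `b`).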
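
The `method = "kish"` entry in `@details` is still a placeholder, so the following is only one reading of the code comments in `.rescale_weights_kish()`: compute Kish's design effect from the weights, scale the weights to a mean of 1, then divide by the design effect. The helper name `rescale_kish_sketch()`, the toy weights, and the choice of `w / mean(w)` as the "mean of 1" scaling are assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): Kish-style rescaling of a
# weight vector, keeping NA weights as NA in the result.
rescale_kish_sketch <- function(w) {
  ok <- !is.na(w)
  w_ok <- w[ok]
  # approximate design effect due to unequal weighting
  deff <- mean(w_ok^2) / mean(w_ok)^2
  out <- rep(NA_real_, length(w))
  # scale to a mean of 1 (assumed scaling), then divide by the design effect
  out[ok] <- (w_ok / mean(w_ok)) / deff
  out
}

# Toy weights, made up for illustration
w <- c(1, 2, 3, NA, 2)
rw <- rescale_kish_sketch(w)
print(rw)
# Under this reading, the rescaled weights sum to n / deff
print(sum(rw, na.rm = TRUE))
```

With this scaling the rescaled weights sum to the number of non-missing observations divided by the design effect, which is the usual notion of an effective sample size.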