diff --git a/DESCRIPTION b/DESCRIPTION
index ca8ecbd..160c946 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: bbw
 Type: Package
 Title: Blocked Weighted Bootstrap
-Version: 0.2.2.9000
+Version: 0.2.3.9000
 Authors@R: c(
 person("Mark", "Myatt", email = "mark@brixtonhealth.com", role = c("aut", "cph")),
@@ -19,7 +19,7 @@ Description: The blocked weighted bootstrap (BBW) is an estimation technique for
 application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys.
 License: GPL-3
-Depends: R (>= 3.0.1)
+Depends: R (>= 4.1.0)
 Imports:
 car,
 cli,
diff --git a/vignettes/bbw.Rmd b/vignettes/bbw.Rmd
index f463c2c..cbcef2c 100644
--- a/vignettes/bbw.Rmd
+++ b/vignettes/bbw.Rmd
@@ -1,7 +1,7 @@
 ---
 title: "Blocked Weighted Bootstrap"
 author: "Mark Myatt and Ernest Guevarra"
-date: "`r Sys.Date()`"
+date: "6 January 2025"
 output: rmarkdown::html_vignette
 vignette: >
 %\VignetteIndexEntry{Blocked Weighted Bootstrap}
@@ -18,19 +18,13 @@ knitr::opts_chunk$set(
 The **blocked weighted bootstrap** is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. *population-proportional sampling* or *PPS* as used in **Standardized Monitoring and Assessment of Relief and Transitions** or **SMART** surveys) or *posterior weighting* (e.g. as used in **Rapid Assessment Method** or **RAM** and **Simple Spatial Sampling Method** or **S3M** surveys).
-
+The bootstrap technique is described in this [article](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)). The blocked weighted bootstrap used in RAM and S3M is a modification to the *percentile bootstrap* to include *blocking* and *weighting* to account for a *complex sample design*.
-The bootstrap technique is described in this [article](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)). The **blocked weighted bootstrap** used in **RAM** and **S3M** is a modification to the *percentile bootstrap* to include *blocking* and *weighing* to account for a *complex sample design*.
+With RAM and S3M surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A blocked weighted bootstrap can be used:
-
+* **Blocked**: The block corresponds to the primary sampling unit (`PSU = cluster`). *PSU*s are resampled with replacement. Observations within the resampled PSUs are also sampled with replacement.
-With **RAM** and **S3M** surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A *blocked weighted bootstrap* can be used:
-
-
-
-**Blocked**: The block corresponds to the primary sampling unit ($PSU = cluster$). *PSU*s are resampled with replacement. Observations within the resampled *PSU*s are also sampled with replacement.
-
-**Weighted**: **RAM** and **S3M** samples do not use *population proportional sampling (PPS)* to weight the sample prior to data collection (e.g. as is done with **SMART** surveys). This means that a posterior weighting procedure is required. `bbw` uses a *"roulette wheel"* algorithm (see [Figure 1](#FIG1) below) to weight (i.e. by population) the selection probability of *PSU*s in bootstrap replicates.
+* **Weighted**: RAM and S3M samples do not use *population proportional sampling (PPS)* to weight the sample prior to data collection (e.g. as is done with **SMART** surveys). This means that a posterior weighting procedure is required. `{bbw}` uses a *"roulette wheel"* algorithm (see [Figure 1](#FIG1) below) to weight (i.e. by population) the selection probability of PSUs in bootstrap replicates.
@@ -42,19 +36,13 @@ With **RAM** and **S3M** surveys, the sample is complex in the sense that it is
-In the case of prior weighting by *PPS* all clusters are given the same weight. With posterior weighting (as in **RAM** or **S3M**) the weight is the population of each *PSU*. This procedure is very similar to the [fitness proportional selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection) technique used in *evolutionary computing*.
-
-
-
-A total of $m$) *PSU*s are sampled with replacement for each bootstrap replicate (where $m$) is the number of *PSU*s in the survey sample).
+In the case of prior weighting by *PPS* all clusters are given the same weight. With posterior weighting (as in RAM or S3M) the weight is the population of each PSU. This procedure is very similar to the [fitness proportional selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection) technique used in *evolutionary computing*.
-
+A total of `m` PSUs are sampled with replacement for each bootstrap replicate (where `m` is the number of PSUs in the survey sample).
 The required statistic is applied to each replicate. The reported estimate consists of the 0.025th (*95\% LCL*), 0.5th (*point estimate*), and 0.975th (*95\% UCL*) quantiles of the distribution of the statistic across all survey replicates.
-
-
-Early versions of the `bbw` did not resample observations within *PSU*s following:
+Early versions of the `{bbw}` did not resample observations within PSUs following:
@@ -62,8 +50,6 @@ Early versions of the `bbw` did not resample observations within *PSU*s followin
-and used a large number (e.g. $3999$) survey replicates. Current versions of the `bbw` resample observations within *PSU*s and use a smaller number of survey replicates (e.g. $n = 400$). This is a more computationally efficient approach.
-
-
+and used a large number (e.g. `3999`) survey replicates. Current versions of the `{bbw}` resample observations within PSUs and use a smaller number of survey replicates (e.g. `n = 400`). This is a more computationally efficient approach.
-The main reason to use `bbw` is that the bootstrap allows a wider range statistics to be calculated than model-based techniques without resort to grand assumptions about the sampling distribution of the required statistic. A good example for this is the confidence interval on the difference between two medians which might be used for many socio-economic variables. The `bbw` also allows for a wider range of hypothesis tests to be used with complex sample survey data.
+The main reason to use `{bbw}` is that the bootstrap allows a wider range statistics to be calculated than model-based techniques without resort to grand assumptions about the sampling distribution of the required statistic. A good example for this is the confidence interval on the difference between two medians which might be used for many socio-economic variables. The `{bbw}` also allows for a wider range of hypothesis tests to be used with complex sample survey data.
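
Note on the procedure the vignette text in this diff describes: the three steps (population-weighted "roulette wheel" selection of `m` PSUs, resampling of observations within each selected PSU, and percentile quantiles across replicates) can be sketched in base R roughly as below. This is an illustrative sketch under stated assumptions, not the `{bbw}` implementation: the function names `roulette_pick` and `bbw_sketch` and the column names `psu`, `pop` and `x` are hypothetical, and the real package may differ in every detail.

```r
# Illustrative sketch only; NOT the {bbw} package code.
# roulette_pick(), bbw_sketch() and the columns psu/pop/x are hypothetical names.

# "Roulette wheel" selection: draw m PSUs with replacement, each with
# selection probability proportional to its population.
roulette_pick <- function(psu_ids, pop, m = length(psu_ids)) {
  wheel <- cumsum(pop)                            # upper edge of each PSU's slice
  spins <- runif(m, min = 0, max = wheel[length(wheel)])
  psu_ids[findInterval(spins, wheel, left.open = TRUE) + 1]
}

# Blocked weighted bootstrap of a statistic `stat` (a function of a data frame).
bbw_sketch <- function(df, stat, replicates = 400) {
  by_psu <- split(df, df$psu)                     # blocks = PSUs
  pops <- vapply(by_psu, function(d) d$pop[1], numeric(1))
  boot <- replicate(replicates, {
    picked <- roulette_pick(names(by_psu), pops)  # m PSUs, weighted by population
    resampled <- lapply(by_psu[picked], function(d) {
      # resample observations within the PSU, with replacement
      d[sample.int(nrow(d), replace = TRUE), , drop = FALSE]
    })
    stat(do.call(rbind, unname(resampled)))
  })
  # 95% LCL, point estimate, 95% UCL
  quantile(boot, probs = c(0.025, 0.5, 0.975))
}

# Usage on made-up data: 5 PSUs of 10 observations each.
set.seed(1)
toy <- data.frame(
  psu = rep(1:5, each = 10),
  pop = rep(c(500, 1200, 300, 800, 2000), each = 10),
  x   = rnorm(50)
)
est <- bbw_sketch(toy, function(d) median(d$x))
```

Base R's `sample(psu_ids, m, replace = TRUE, prob = pop)` would give the same weighted draw; the cumulative-sum version is written out only to mirror the "roulette wheel" description.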