Commit

Merge pull request #38 from rapidsurveys:dev
update vignettes; fix #3
ernestguevarra authored Jan 6, 2025
2 parents b8378bd + 6ff4c9b commit 1ed6b13
Showing 2 changed files with 12 additions and 26 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: bbw
Type: Package
Title: Blocked Weighted Bootstrap
Version: 0.2.2.9000
Version: 0.2.3.9000
Authors@R: c(
person("Mark", "Myatt",
email = "mark@brixtonhealth.com", role = c("aut", "cph")),
@@ -19,7 +19,7 @@ Description: The blocked weighted bootstrap (BBW) is an estimation technique
<doi:10.1371/journal.pone.0162462> for application of the blocked weighted
bootstrap to estimate indicators from two-stage cluster sampled surveys.
License: GPL-3
Depends: R (>= 3.0.1)
Depends: R (>= 4.1.0)
Imports:
car,
cli,
34 changes: 10 additions & 24 deletions vignettes/bbw.Rmd
@@ -1,7 +1,7 @@
---
title: "Blocked Weighted Bootstrap"
author: "Mark Myatt and Ernest Guevarra"
date: "`r Sys.Date()`"
date: "6 January 2025"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Blocked Weighted Bootstrap}
@@ -18,19 +18,13 @@ knitr::opts_chunk$set(

The **blocked weighted bootstrap** is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. *population-proportional sampling* or *PPS* as used in **Standardized Monitoring and Assessment of Relief and Transitions** or **SMART** surveys) or *posterior weighting* (e.g. as used in **Rapid Assessment Method** or **RAM** and **Simple Spatial Sampling Method** or **S3M** surveys) is used.

<br/>
The bootstrap technique is described in this [article](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)). The blocked weighted bootstrap used in RAM and S3M is a modification to the *percentile bootstrap* to include *blocking* and *weighting* to account for a *complex sample design*.

The bootstrap technique is described in this [article](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)). The **blocked weighted bootstrap** used in **RAM** and **S3M** is a modification to the *percentile bootstrap* to include *blocking* and *weighing* to account for a *complex sample design*.
With RAM and S3M surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A blocked weighted bootstrap can be used:

<br/>
* **Blocked**: The block corresponds to the primary sampling unit (`PSU = cluster`). *PSU*s are resampled with replacement. Observations within the resampled PSUs are also sampled with replacement.

With **RAM** and **S3M** surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A *blocked weighted bootstrap* can be used:

<br/>

**Blocked**: The block corresponds to the primary sampling unit ($PSU = cluster$). *PSU*s are resampled with replacement. Observations within the resampled *PSU*s are also sampled with replacement.

**Weighted**: **RAM** and **S3M** samples do not use *population proportional sampling (PPS)* to weight the sample prior to data collection (e.g. as is done with **SMART** surveys). This means that a posterior weighting procedure is required. `bbw` uses a *"roulette wheel"* algorithm (see [Figure 1](#FIG1) below) to weight (i.e. by population) the selection probability of *PSU*s in bootstrap replicates.
* **Weighted**: RAM and S3M samples do not use *population proportional sampling (PPS)* to weight the sample prior to data collection (e.g. as is done with **SMART** surveys). This means that a posterior weighting procedure is required. `{bbw}` uses a *"roulette wheel"* algorithm (see [Figure 1](#FIG1) below) to weight (i.e. by population) the selection probability of PSUs in bootstrap replicates.

<br/>

@@ -42,28 +36,20 @@ With **RAM** and **S3M** surveys, the sample is complex in the sense that it is

<br/>

In the case of prior weighting by *PPS* all clusters are given the same weight. With posterior weighting (as in **RAM** or **S3M**) the weight is the population of each *PSU*. This procedure is very similar to the [fitness proportional selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection) technique used in *evolutionary computing*.

<br/>

A total of $m$) *PSU*s are sampled with replacement for each bootstrap replicate (where $m$) is the number of *PSU*s in the survey sample).
In the case of prior weighting by *PPS* all clusters are given the same weight. With posterior weighting (as in RAM or S3M) the weight is the population of each PSU. This procedure is very similar to the [fitness proportional selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection) technique used in *evolutionary computing*.

<br/>
A total of `m` PSUs are sampled with replacement for each bootstrap replicate (where `m` is the number of PSUs in the survey sample).
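
The roulette wheel selection described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's own code: the function name `roulette_select()`, the PSU identifiers, and the population figures are all hypothetical.

```r
# Hypothetical PSU identifiers and populations (illustrative values only)
psu <- data.frame(
  psu = 1:5,
  pop = c(120, 300, 80, 450, 50)
)

# Roulette wheel selection: each PSU owns a slice of the unit interval
# proportional to its population; each uniform random "spin" selects the
# PSU whose slice it lands in. Draws are made with replacement.
roulette_select <- function(ids, weights, m = length(ids)) {
  upper <- cumsum(weights) / sum(weights)   # upper edge of each slice
  spins <- runif(m)                         # one spin per PSU to draw
  # Drop the final edge so every spin maps to an index between 1 and n
  ids[findInterval(spins, head(upper, -1)) + 1]
}

set.seed(1)
selected <- roulette_select(psu$psu, psu$pop)
```

For a quick cross-check, base R's `sample(ids, m, replace = TRUE, prob = weights)` draws from the same population-proportional distribution.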

The required statistic is applied to each replicate. The reported estimate consists of the 0.025th (*95\% LCL*), 0.5th (*point estimate*), and 0.975th (*95\% UCL*) quantiles of the distribution of the statistic across all survey replicates.
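
As a rough sketch of how the replicates and the reported quantiles fit together, the base R example below resamples PSUs with population-proportional probability, resamples observations within each drawn PSU, applies a statistic, and then takes the percentile interval. The data, the populations, and the helper `one_replicate()` are hypothetical and are not part of the `{bbw}` API.

```r
# Hypothetical survey data: 5 PSUs with 10 observations each
set.seed(42)
survey <- data.frame(
  psu = rep(1:5, each = 10),
  x   = rnorm(50, mean = 10, sd = 2)
)
pop <- c(120, 300, 80, 450, 50)  # hypothetical PSU populations, in PSU order

# One bootstrap replicate (illustrative helper, not a {bbw} function)
one_replicate <- function(data, pop, statistic) {
  ids <- sort(unique(data$psu))
  # Blocked + weighted: draw m PSUs with replacement, with selection
  # probability proportional to PSU population
  drawn <- sample(ids, size = length(ids), replace = TRUE, prob = pop)
  # Resample observations with replacement within each drawn PSU
  values <- unlist(lapply(drawn, function(id) {
    obs <- data$x[data$psu == id]
    obs[sample.int(length(obs), replace = TRUE)]
  }))
  statistic(values)
}

# Apply the statistic (here the median) to 400 replicates
boot <- replicate(400, one_replicate(survey, pop, median))

# 95% LCL, point estimate, and 95% UCL from the replicate distribution
estimate <- quantile(boot, probs = c(0.025, 0.5, 0.975))
```

Any function of a numeric vector can stand in for `median`, which is what makes the percentile approach convenient for statistics without a simple model-based interval.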

<br/>

Early versions of the `bbw` did not resample observations within *PSU*s following:
Early versions of the `{bbw}` did not resample observations within PSUs following:

<br/>

> Cameron AC, Gelbach JB, Miller DL, Bootstrap-based improvements for inference with clustered errors, Review of Economics and Statistics, 2008:90;414–427 <https://doi.org/10.1162/rest.90.3.414>
<br/>

and used a large number (e.g. $3999$) survey replicates. Current versions of the `bbw` resample observations within *PSU*s and use a smaller number of survey replicates (e.g. $n = 400$). This is a more computationally efficient approach.

<br/>
and used a large number (e.g. `3999`) survey replicates. Current versions of the `{bbw}` resample observations within PSUs and use a smaller number of survey replicates (e.g. `n = 400`). This is a more computationally efficient approach.

The main reason to use `bbw` is that the bootstrap allows a wider range statistics to be calculated than model-based techniques without resort to grand assumptions about the sampling distribution of the required statistic. A good example for this is the confidence interval on the difference between two medians which might be used for many socio-economic variables. The `bbw` also allows for a wider range of hypothesis tests to be used with complex sample survey data.
The main reason to use `{bbw}` is that the bootstrap allows a wider range statistics to be calculated than model-based techniques without resort to grand assumptions about the sampling distribution of the required statistic. A good example for this is the confidence interval on the difference between two medians which might be used for many socio-economic variables. The `{bbw}` also allows for a wider range of hypothesis tests to be used with complex sample survey data.
