update vignettes; fix #3 #38

Merged
merged 1 commit into from
Jan 6, 2025
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: bbw
Type: Package
Title: Blocked Weighted Bootstrap
Version: 0.2.2.9000
Version: 0.2.3.9000
Authors@R: c(
person("Mark", "Myatt",
email = "mark@brixtonhealth.com", role = c("aut", "cph")),
@@ -19,7 +19,7 @@ Description: The blocked weighted bootstrap (BBW) is an estimation technique
<doi:10.1371/journal.pone.0162462> for application of the blocked weighted
bootstrap to estimate indicators from two-stage cluster sampled surveys.
License: GPL-3
Depends: R (>= 3.0.1)
Depends: R (>= 4.1.0)
Imports:
car,
cli,
34 changes: 10 additions & 24 deletions vignettes/bbw.Rmd
@@ -1,7 +1,7 @@
---
title: "Blocked Weighted Bootstrap"
author: "Mark Myatt and Ernest Guevarra"
date: "`r Sys.Date()`"
date: "6 January 2025"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Blocked Weighted Bootstrap}
@@ -18,19 +18,13 @@ knitr::opts_chunk$set(

The **blocked weighted bootstrap** is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. *population-proportional sampling* or *PPS* as used in **Standardized Monitoring and Assessment of Relief and Transitions** or **SMART** surveys) or *posterior weighting* (e.g. as used in **Rapid Assessment Method** or **RAM** and **Simple Spatial Sampling Method** or **S3M** surveys) is used.

<br/>
The bootstrap technique is described in this [article](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)). The blocked weighted bootstrap used in RAM and S3M is a modification to the *percentile bootstrap* to include *blocking* and *weighting* to account for a *complex sample design*.

The bootstrap technique is described in this [article](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)). The **blocked weighted bootstrap** used in **RAM** and **S3M** is a modification to the *percentile bootstrap* to include *blocking* and *weighing* to account for a *complex sample design*.
With RAM and S3M surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A blocked weighted bootstrap can be used:

<br/>
* **Blocked**: The block corresponds to the primary sampling unit (`PSU = cluster`). *PSU*s are resampled with replacement. Observations within the resampled PSUs are also sampled with replacement.

With **RAM** and **S3M** surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A *blocked weighted bootstrap* can be used:

<br/>

**Blocked**: The block corresponds to the primary sampling unit ($PSU = cluster$). *PSU*s are resampled with replacement. Observations within the resampled *PSU*s are also sampled with replacement.

**Weighted**: **RAM** and **S3M** samples do not use *population proportional sampling (PPS)* to weight the sample prior to data collection (e.g. as is done with **SMART** surveys). This means that a posterior weighting procedure is required. `bbw` uses a *"roulette wheel"* algorithm (see [Figure 1](#FIG1) below) to weight (i.e. by population) the selection probability of *PSU*s in bootstrap replicates.
* **Weighted**: RAM and S3M samples do not use *population proportional sampling (PPS)* to weight the sample prior to data collection (e.g. as is done with **SMART** surveys). This means that a posterior weighting procedure is required. `{bbw}` uses a *"roulette wheel"* algorithm (see [Figure 1](#FIG1) below) to weight (i.e. by population) the selection probability of PSUs in bootstrap replicates.

<br/>

@@ -42,28 +36,20 @@ With **RAM** and **S3M** surveys, the sample is complex in the sense that it is

<br/>

In the case of prior weighting by *PPS* all clusters are given the same weight. With posterior weighting (as in **RAM** or **S3M**) the weight is the population of each *PSU*. This procedure is very similar to the [fitness proportional selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection) technique used in *evolutionary computing*.

<br/>

A total of $m$) *PSU*s are sampled with replacement for each bootstrap replicate (where $m$) is the number of *PSU*s in the survey sample).
In the case of prior weighting by *PPS* all clusters are given the same weight. With posterior weighting (as in RAM or S3M) the weight is the population of each PSU. This procedure is very similar to the [fitness proportional selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection) technique used in *evolutionary computing*.

<br/>
A total of `m` PSUs are sampled with replacement for each bootstrap replicate (where `m` is the number of PSUs in the survey sample).
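
The weighted selection of PSUs described above can be sketched as a roulette wheel. This is only an illustration in Python, not the package's R code: the function name `roulette_select` and the `populations` values are hypothetical, and the actual implementation in `bbw` may differ.

```python
import bisect
import itertools
import random


def roulette_select(populations, m, rng):
    """Draw m PSU indices with replacement, each with probability
    proportional to its population (roulette wheel selection)."""
    # Cumulative totals mark the upper edge of each PSU's wheel segment.
    cumulative = list(itertools.accumulate(populations))
    total = cumulative[-1]
    selected = []
    for _ in range(m):
        # Spin the wheel: a uniform point in [0, total).
        spin = rng.random() * total
        # The first segment whose upper edge exceeds the spin wins.
        idx = bisect.bisect_right(cumulative, spin)
        # Guard against floating-point rounding at the very top of the wheel.
        selected.append(min(idx, len(populations) - 1))
    return selected


# Hypothetical PSU populations, for illustration only.
populations = [1200, 300, 4500, 800]
rng = random.Random(42)
draws = roulette_select(populations, m=len(populations), rng=rng)
print(draws)
```

Larger PSUs occupy a wider segment of the wheel, so they are drawn more often; a PSU with zero population is never drawn.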

The required statistic is applied to each replicate. The reported estimate consists of the 0.025th (*95\% LCL*), 0.5th (*point estimate*), and 0.975th (*95\% UCL*) quantiles of the distribution of the statistic across all survey replicates.
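
As a minimal sketch of how the reported estimate could be read off the replicate distribution, the Python snippet below takes the 0.025, 0.5, and 0.975 quantiles of a list of replicate statistics. The function name `percentile_estimate` is hypothetical, the input values are made up, and the interpolation rule used by `bbw` itself is not stated here and may differ.

```python
import statistics


def percentile_estimate(replicate_stats):
    """Summarise bootstrap replicate statistics as (LCL, estimate, UCL)."""
    # n=40 yields 39 cut points at 0.025, 0.050, ..., 0.975,
    # using linear interpolation between order statistics.
    cuts = statistics.quantiles(replicate_stats, n=40, method="inclusive")
    # Index 0 is the 0.025 quantile, 19 the median, 38 the 0.975 quantile.
    return cuts[0], cuts[19], cuts[38]


# Made-up replicate statistics, for illustration only.
replicate_stats = list(range(401))
lcl, estimate, ucl = percentile_estimate(replicate_stats)
print(lcl, estimate, ucl)
```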

<br/>

Early versions of `bbw` did not resample observations within *PSU*s, following:
Early versions of `{bbw}` did not resample observations within PSUs, following:

<br/>

> Cameron AC, Gelbach JB, Miller DL, Bootstrap-based improvements for inference with clustered errors, Review of Economics and Statistics, 2008:90;414–427 <https://doi.org/10.1162/rest.90.3.414>

<br/>

and used a large number (e.g. $3999$) of survey replicates. Current versions of `bbw` resample observations within *PSU*s and use a smaller number of survey replicates (e.g. $n = 400$). This is a more computationally efficient approach.

<br/>
and used a large number (e.g. `3999`) of survey replicates. Current versions of `{bbw}` resample observations within PSUs and use a smaller number of survey replicates (e.g. `n = 400`). This is a more computationally efficient approach.
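
The current approach described above (weighted PSU draws followed by resampling within each drawn PSU) can be sketched in Python as follows. This is an assumption-laden illustration, not the package's R implementation: `bbw_replicates`, the toy `clusters`, and the `populations` weights are all hypothetical.

```python
import random
import statistics


def bbw_replicates(clusters, populations, statistic, replicates, rng):
    """Return one value of `statistic` per bootstrap replicate.

    clusters:    list of lists, the observations in each PSU
    populations: one weight per PSU (its population)
    """
    m = len(clusters)
    results = []
    for _ in range(replicates):
        # Blocked and weighted: draw m PSUs with replacement, with
        # probability proportional to population.
        chosen = rng.choices(range(m), weights=populations, k=m)
        sample = []
        for idx in chosen:
            psu = clusters[idx]
            # Resample observations within the drawn PSU, with replacement.
            sample.extend(rng.choices(psu, k=len(psu)))
        results.append(statistic(sample))
    return results


# Hypothetical survey: three PSUs with made-up values and populations.
clusters = [[1, 2, 3, 4], [10, 11, 12], [20, 22, 24, 26, 28]]
populations = [500, 1500, 1000]
reps = bbw_replicates(clusters, populations, statistics.median,
                      replicates=400, rng=random.Random(2025))
print(len(reps))
```

Any function of the pooled sample can be passed as `statistic`, which is what lets the bootstrap cover statistics such as medians without a model-based variance formula.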

The main reason to use `bbw` is that the bootstrap allows a wider range of statistics to be calculated than model-based techniques, without resorting to grand assumptions about the sampling distribution of the required statistic. A good example of this is the confidence interval on the difference between two medians, which might be used for many socio-economic variables. `bbw` also allows for a wider range of hypothesis tests to be used with complex sample survey data.
The main reason to use `{bbw}` is that the bootstrap allows a wider range of statistics to be calculated than model-based techniques, without resorting to grand assumptions about the sampling distribution of the required statistic. A good example of this is the confidence interval on the difference between two medians, which might be used for many socio-economic variables. `{bbw}` also allows for a wider range of hypothesis tests to be used with complex sample survey data.