Merge pull request #38 from rapidsurveys:dev

update vignettes; fix #3
rapidsurveys · Jan 6, 2025 · 1ed6b13
2 parents b8378bd + 6ff4c9b
commit 1ed6b13

Showing 2 changed files with 12 additions and 26 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: bbw
 Type: Package
 Title: Blocked Weighted Bootstrap
-Version: 0.2.2.9000
+Version: 0.2.3.9000
 Authors@R: c(
     person("Mark", "Myatt", 
            email = "mark@brixtonhealth.com", role = c("aut", "cph")),
@@ -19,7 +19,7 @@ Description: The blocked weighted bootstrap (BBW) is an estimation technique
     <doi:10.1371/journal.pone.0162462> for application of the blocked weighted 
     bootstrap to estimate indicators from two-stage cluster sampled surveys.
 License: GPL-3
-Depends: R (>= 3.0.1)
+Depends: R (>= 4.1.0)
 Imports: 
     car,
     cli,

diff --git a/vignettes/bbw.Rmd b/vignettes/bbw.Rmd
@@ -1,7 +1,7 @@
 ---
 title: "Blocked Weighted Bootstrap"
 author: "Mark Myatt and Ernest Guevarra"
-date: "`r Sys.Date()`"
+date: "6 January 2025"
 output: rmarkdown::html_vignette
 vignette: >
   %\VignetteIndexEntry{Blocked Weighted Bootstrap}
@@ -18,19 +18,13 @@ knitr::opts_chunk$set(
 
 The **blocked weighted bootstrap** is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. *population-proportional sampling* or *PPS* as used in **Standardized Monitoring and Assessment of Relief and Transitions** or **SMART** surveys) or *posterior weighting* (e.g. as used in **Rapid Assessment Method** or **RAM** and **Simple Spatial Sampling Method** or **S3M** surveys).
 
-<br/>
+The bootstrap technique is described in this [article](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)). The blocked weighted bootstrap used in RAM and S3M is a modification to the *percentile bootstrap* to include *blocking* and *weighting* to account for a *complex sample design*.
 
-The bootstrap technique is described in this [article](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)). The **blocked weighted bootstrap** used in **RAM** and **S3M** is a modification to the *percentile bootstrap* to include *blocking* and *weighing* to account for a *complex sample design*.
+With RAM and S3M surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A blocked weighted bootstrap can be used:
 
-<br/>
+* **Blocked**: The block corresponds to the primary sampling unit (`PSU = cluster`). *PSU*s are resampled with replacement. Observations within the resampled PSUs are also sampled with replacement.
 
-With **RAM** and **S3M** surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A *blocked weighted bootstrap* can be used:
-
-<br/>
-
-**Blocked**: The block corresponds to the primary sampling unit ($PSU = cluster$). *PSU*s are resampled with replacement. Observations within the resampled *PSU*s are also sampled with replacement.
-
-**Weighted**: **RAM** and **S3M** samples do not use *population proportional sampling (PPS)* to weight the sample prior to data collection (e.g. as is done with **SMART** surveys). This means that a posterior weighting procedure is required. `bbw` uses a *"roulette wheel"* algorithm (see [Figure 1](#FIG1) below) to weight (i.e. by population) the selection probability of *PSU*s in bootstrap replicates.
+* **Weighted**: RAM and S3M samples do not use *population proportional sampling (PPS)* to weight the sample prior to data collection (e.g. as is done with **SMART** surveys). This means that a posterior weighting procedure is required. `{bbw}` uses a *"roulette wheel"* algorithm (see [Figure 1](#FIG1) below) to weight (i.e. by population) the selection probability of PSUs in bootstrap replicates.
 
 <br/>
 
@@ -42,28 +36,20 @@ With **RAM** and **S3M** surveys, the sample is complex in the sense that it is
 
 <br/>
 
-In the case of prior weighting by *PPS* all clusters are given the same weight. With posterior weighting (as in **RAM** or **S3M**) the weight is the population of each *PSU*. This procedure is very similar to the [fitness proportional selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection) technique used in *evolutionary computing*.
-
-<br/>
-
-A total of $m$) *PSU*s are sampled with replacement for each bootstrap replicate (where $m$) is the number of *PSU*s in the survey sample).
+In the case of prior weighting by *PPS* all clusters are given the same weight. With posterior weighting (as in RAM or S3M) the weight is the population of each PSU. This procedure is very similar to the [fitness proportional selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection) technique used in *evolutionary computing*.
 
-<br/>
+A total of `m` PSUs are sampled with replacement for each bootstrap replicate (where `m` is the number of PSUs in the survey sample).
 
 The required statistic is applied to each replicate. The reported estimate consists of the 0.025th (*95\% LCL*), 0.5th (*point estimate*), and 0.975th (*95\% UCL*) quantiles of the distribution of the statistic across all survey replicates.
 
-<br/>
-
-Early versions of the `bbw` did not resample observations within *PSU*s following:
+Early versions of the `{bbw}` did not resample observations within PSUs following:
 
 <br/>
 
 > Cameron AC, Gelbach JB, Miller DL, Bootstrap-based improvements for inference with clustered errors, Review of Economics and Statistics, 2008:90;414–427 <https://doi.org/10.1162/rest.90.3.414>
 
 <br/>
 
-and used a large number (e.g. $3999$) survey replicates. Current versions of the `bbw` resample observations within *PSU*s and use a smaller number of survey replicates (e.g. $n = 400$). This is a more computationally efficient approach.
-
-<br/>
+and used a large number (e.g. `3999`) survey replicates. Current versions of the `{bbw}` resample observations within PSUs and use a smaller number of survey replicates (e.g. `n = 400`). This is a more computationally efficient approach.
 
-The main reason to use `bbw` is that the bootstrap allows a wider range statistics to be calculated than model-based techniques without resort to grand assumptions about the sampling distribution of the required statistic. A good example for this is the confidence interval on the difference between two medians which might be used for many socio-economic variables. The `bbw` also allows for a wider range of hypothesis tests to be used with complex sample survey data.
+The main reason to use `{bbw}` is that the bootstrap allows a wider range statistics to be calculated than model-based techniques without resort to grand assumptions about the sampling distribution of the required statistic. A good example for this is the confidence interval on the difference between two medians which might be used for many socio-economic variables. The `{bbw}` also allows for a wider range of hypothesis tests to be used with complex sample survey data.
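
Note on the procedure described in the updated vignette: the steps (weighted selection of `m` PSUs with replacement, resampling of observations within each selected PSU, and percentile limits across replicates) can be illustrated with a minimal base R sketch. This is not the `{bbw}` API. The data frames `survey` and `pop`, their column names, and the helper `one_replicate()` are hypothetical and exist only for illustration; base R's `sample()` with a `prob` argument stands in for the "roulette wheel" selection.

```r
# Hypothetical toy data: 5 PSUs of 8 observations each, one binary indicator.
set.seed(1)
survey <- data.frame(
  psu = rep(1:5, each = 8),
  x   = rbinom(40, size = 1, prob = 0.3)
)

# Hypothetical PSU populations, used as posterior weights.
pop <- data.frame(psu = 1:5, pop = c(500, 1200, 300, 800, 2000))

# One bootstrap replicate (illustrative helper, not a {bbw} function):
# draw m PSUs with probability proportional to population, then
# resample observations with replacement within each drawn PSU.
one_replicate <- function(survey, pop, statistic) {
  m <- nrow(pop)
  drawn <- sample(pop$psu, size = m, replace = TRUE, prob = pop$pop)
  resampled <- lapply(drawn, function(id) {
    block <- survey[survey$psu == id, , drop = FALSE]
    block[sample.int(nrow(block), replace = TRUE), , drop = FALSE]
  })
  statistic(do.call(rbind, resampled))
}

# Apply the statistic (here a proportion) to each of 400 replicates,
# then report the 0.025, 0.5, and 0.975 quantiles.
estimates <- replicate(400, one_replicate(survey, pop, function(d) mean(d$x)))
result <- quantile(estimates, probs = c(0.025, 0.5, 0.975))
names(result) <- c("lcl", "estimate", "ucl")
print(result)
```

Passing the statistic as a function keeps the sketch general: replacing `mean(d$x)` with, say, a median changes the estimate without touching the resampling logic, which is the flexibility the vignette attributes to the bootstrap.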