Reading C10 Data from private OSF (#73)
* Move LAPOP data to private OSF directory

* Read ambarom from private OSF
szimmer authored Aug 25, 2023
1 parent 8353102 commit 0071544
Showing 4 changed files with 115 additions and 65 deletions.
1 change: 1 addition & 0 deletions .github/workflows/deploy_bookdown.yml
@@ -41,6 +41,7 @@ jobs:
- name: Render Book
env:
CENSUS_KEY: ${{ secrets.CENSUS_KEY }}
OSF_PAT: ${{ secrets.OSF_PAT }}
run: Rscript -e 'bookdown::render_book("index.Rmd")'
- uses: actions/upload-artifact@v1
with:
55 changes: 47 additions & 8 deletions 10-ambarom-vignette.Rmd
@@ -18,17 +18,36 @@ library(rnaturalearth) # Getting world maps
library(rnaturalearthdata)
library(gt)
library(ggpattern)
library(osfr)
source("helper-fun/helper-functions.R")
```

We will be using data from the AmericasBarometer surveys. Here is the code to read in the dataset that we will be working with:
```{r}
#| label: ambarom-read
#| message: false
#| cache: TRUE
ambarom_in <- read_osf("lapop_2021.rds")
We are unable to host this data like other data we have hosted. Each country and each year has its own files. The data used in this vignette can be downloaded from the LAPOP website. In this vignette, we will be using data from 2021, namely version v1.2. These are not available on the book's repository, but you may download the raw files yourself^[http://datasets.americasbarometer.org/database/index.php] (@lapopdat). To read all files into R and ignore the Stata labels, we recommend running code like this:

```r
stata_files <- list.files(here("RawData", "LAPOP_2021"), "*.dta")

read_stata_unlabeled <- function(file) {
read_stata(file) %>%
zap_labels() %>%
zap_label()
}

ambarom_full_in <- here("RawData", "LAPOP_2021", stata_files) %>%
map_df(read_stata_unlabeled)
```

The code above will read all files of type `dta` in and stack them into one tibble. We did this and then selected a subset of variables for this vignette (see code below to recreate). To understand variables that are used across the several countries, the core questionnaire is useful.^[https://www.vanderbilt.edu/lapop/ab2021/AB2021-Core-Questionnaire-v17.5-Eng-210514-W-v2.pdf]

```r
ambarom_in <- ambarom_full_in %>%
select(pais, strata, upm, weight1500, strata, core_a_core_b,
q2, q1tb, covid2at, a4, idio2, idio2cov, it1, jc13,
m1, mil10a, mil10e, ccch1, ccch3, ccus1, ccus3,
edr, ocup4a, q14, q11n, q12c, q12bn,
starts_with("covidedu1"), gi0n,
r15, r18n, r18
)
```

:::

## Introduction
@@ -63,6 +82,26 @@ The code above will read all files of type `dta` in and stack them into one tibb
Many of the variables are coded as numeric and do not have intuitive variable names, so the next step is to create derived variables and analysis-ready data. Using the core questionnaire as a codebook, derived variables are created below with relevant factors with informative names.


```{r}
#| label: ambarom-read-secret
#| include: FALSE
library(osfr)
lapop_rds_files <- osf_retrieve_node("https://osf.io/z5c3m/") %>%
osf_ls_files(path="LAPOP_2021", n_max=40, pattern=".rds")
filedet <- lapop_rds_files %>%
osf_download(conflicts="overwrite", path=here::here("osf_dl"))
ambarom_in <- filedet %>%
pull(local_path) %>%
read_rds()
unlink(pull(filedet, "local_path"))
```


```{r}
#| label: ambarom-derive
ambarom <- ambarom_in %>%
10 changes: 7 additions & 3 deletions DataCleaningScripts/LAPOP_2021_DataPrep.Rmd
@@ -77,13 +77,17 @@ lapop <- lapop_in %>%
summary(lapop)
lapop_temp_loc <- here("osf_dl", "lapop_2021.rds")
dir.create(here("osf_dl", "LAPOP_2021"))
lapop_temp_loc <- here("osf_dl", "LAPOP_2021", "lapop_2021.rds")
write_rds(lapop, lapop_temp_loc)
target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957")
# target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957")
target_dir <- osf_retrieve_node("https://osf.io/z5c3m/")
osf_upload(target_dir, path=lapop_temp_loc, conflicts="overwrite")
osf_upload(target_dir, path=here("osf_dl", "LAPOP_2021"), conflicts="overwrite")
unlink(lapop_temp_loc)
```
114 changes: 60 additions & 54 deletions DataCleaningScripts/LAPOP_2021_DataPrep.md
@@ -61,69 +61,75 @@ lapop <- lapop_in %>%
summary(lapop)
```

## pais strata upm weight1500 core_a_core_b
## Min. : 1.00 Min. :1.000e+08 Min. :1.001e+07 Min. :0.004136 Length:64352
## 1st Qu.: 6.00 1st Qu.:6.000e+08 1st Qu.:6.153e+07 1st Qu.:0.251556 Class :character
## Median :11.00 Median :1.100e+09 Median :1.202e+08 Median :0.417251 Mode :character
## Mean :13.03 Mean :1.303e+09 Mean :1.666e+08 Mean :0.512805
## 3rd Qu.:17.00 3rd Qu.:1.700e+09 3rd Qu.:2.105e+08 3rd Qu.:0.674477
## Max. :41.00 Max. :4.100e+09 Max. :1.135e+09 Max. :7.024495
##
## q2 q1tb covid2at a4 idio2 idio2cov
## Min. : 16.00 Min. :1.000 Min. :1.000 Min. : 1.00 Min. :1.000 Min. :1.000
## 1st Qu.: 27.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.: 3.00 1st Qu.:2.000 1st Qu.:1.000
## Median : 36.00 Median :2.000 Median :2.000 Median : 22.00 Median :3.000 Median :1.000
## Mean : 38.86 Mean :1.521 Mean :2.076 Mean : 36.73 Mean :2.439 Mean :1.242
## 3rd Qu.: 49.00 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.: 71.00 3rd Qu.:3.000 3rd Qu.:1.000
## Max. :121.00 Max. :3.000 Max. :4.000 Max. :865.00 Max. :3.000 Max. :2.000
## NA's :90 NA's :90 NA's :6686 NA's :4965 NA's :2766 NA's :31580
## it1 jc13 m1 mil10a mil10e ccch1
## Min. :1.000 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00
## 1st Qu.:2.000 1st Qu.:1.00 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:1.00
## Median :2.000 Median :2.00 Median :3.00 Median :3.00 Median :2.00 Median :1.00
## Mean :2.275 Mean :1.62 Mean :2.98 Mean :2.72 Mean :2.39 Mean :1.78
## 3rd Qu.:3.000 3rd Qu.:2.00 3rd Qu.:4.00 3rd Qu.:3.00 3rd Qu.:3.00 3rd Qu.:2.00
## Max. :4.000 Max. :2.00 Max. :5.00 Max. :4.00 Max. :4.00 Max. :4.00
## NA's :3631 NA's :50827 NA's :33238 NA's :49939 NA's :44021 NA's :50535
## ccch3 ccus1 ccus3 edr ocup4a q14
## Min. :1.00 Min. :1.00 Min. :1.00 Min. :0.000 Min. :1.000 Min. :1.0
## 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.0
## Median :2.00 Median :1.00 Median :2.00 Median :2.000 Median :1.000 Median :2.0
## Mean :1.82 Mean :1.58 Mean :1.76 Mean :2.192 Mean :2.627 Mean :1.6
## 3rd Qu.:2.00 3rd Qu.:2.00 3rd Qu.:2.00 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:2.0
## Max. :3.00 Max. :4.00 Max. :3.00 Max. :3.000 Max. :7.000 Max. :2.0
## NA's :51961 NA's :50028 NA's :51226 NA's :4114 NA's :29505 NA's :44130
## q11n q12c q12bn covidedu1_1 covidedu1_2 covidedu1_3
## Min. :1.000 Min. : 1.000 Min. : 0.000 Min. :0.00 Min. :0.00 Min. :0.00
## 1st Qu.:1.000 1st Qu.: 3.000 1st Qu.: 0.000 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00
## Median :2.000 Median : 4.000 Median : 1.000 Median :0.00 Median :0.00 Median :1.00
## Mean :2.214 Mean : 4.036 Mean : 1.001 Mean :0.17 Mean :0.07 Mean :0.62
## 3rd Qu.:3.000 3rd Qu.: 5.000 3rd Qu.: 2.000 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:1.00
## Max. :7.000 Max. :20.000 Max. :16.000 Max. :1.00 Max. :1.00 Max. :1.00
## NA's :31198 NA's :29144 NA's :29449 NA's :51297 NA's :51297 NA's :51297
## covidedu1_4 covidedu1_5 gi0n r15 r18n r18
## Min. :0.00 Min. :0.00 Min. :1.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:1.000
## Median :0.00 Median :0.00 Median :1.000 Median :1.000 Median :1.000 Median :1.000
## Mean :0.12 Mean :0.08 Mean :1.646 Mean :0.513 Mean :0.537 Mean :0.815
## 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.00 Max. :1.00 Max. :5.000 Max. :1.000 Max. :1.000 Max. :1.000
## NA's :51297 NA's :51297 NA's :1240 NA's :4118 NA's :4386 NA's :4249
## pais strata upm weight1500 core_a_core_b q2 q1tb covid2at
## Min. : 1.00 Min. :1.000e+08 Min. :1.001e+07 Min. :0.004136 Length:64352 Min. : 16.00 Min. :1.000 Min. :1.000
## 1st Qu.: 6.00 1st Qu.:6.000e+08 1st Qu.:6.153e+07 1st Qu.:0.251556 Class :character 1st Qu.: 27.00 1st Qu.:1.000 1st Qu.:1.000
## Median :11.00 Median :1.100e+09 Median :1.202e+08 Median :0.417251 Mode :character Median : 36.00 Median :2.000 Median :2.000
## Mean :13.03 Mean :1.303e+09 Mean :1.666e+08 Mean :0.512805 Mean : 38.86 Mean :1.521 Mean :2.076
## 3rd Qu.:17.00 3rd Qu.:1.700e+09 3rd Qu.:2.105e+08 3rd Qu.:0.674477 3rd Qu.: 49.00 3rd Qu.:2.000 3rd Qu.:3.000
## Max. :41.00 Max. :4.100e+09 Max. :1.135e+09 Max. :7.024495 Max. :121.00 Max. :3.000 Max. :4.000
## NA's :90 NA's :90 NA's :6686
## a4 idio2 idio2cov it1 jc13 m1 mil10a mil10e ccch1
## Min. : 1.00 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00
## 1st Qu.: 3.00 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:1.00 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:1.00
## Median : 22.00 Median :3.000 Median :1.000 Median :2.000 Median :2.00 Median :3.00 Median :3.00 Median :2.00 Median :1.00
## Mean : 36.73 Mean :2.439 Mean :1.242 Mean :2.275 Mean :1.62 Mean :2.98 Mean :2.72 Mean :2.39 Mean :1.78
## 3rd Qu.: 71.00 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:2.00 3rd Qu.:4.00 3rd Qu.:3.00 3rd Qu.:3.00 3rd Qu.:2.00
## Max. :865.00 Max. :3.000 Max. :2.000 Max. :4.000 Max. :2.00 Max. :5.00 Max. :4.00 Max. :4.00 Max. :4.00
## NA's :4965 NA's :2766 NA's :31580 NA's :3631 NA's :50827 NA's :33238 NA's :49939 NA's :44021 NA's :50535
## ccch3 ccus1 ccus3 edr ocup4a q14 q11n q12c q12bn
## Min. :1.00 Min. :1.00 Min. :1.00 Min. :0.000 Min. :1.000 Min. :1.0 Min. :1.000 Min. : 1.000 Min. : 0.000
## 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.0 1st Qu.:1.000 1st Qu.: 3.000 1st Qu.: 0.000
## Median :2.00 Median :1.00 Median :2.00 Median :2.000 Median :1.000 Median :2.0 Median :2.000 Median : 4.000 Median : 1.000
## Mean :1.82 Mean :1.58 Mean :1.76 Mean :2.192 Mean :2.627 Mean :1.6 Mean :2.214 Mean : 4.036 Mean : 1.001
## 3rd Qu.:2.00 3rd Qu.:2.00 3rd Qu.:2.00 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:2.0 3rd Qu.:3.000 3rd Qu.: 5.000 3rd Qu.: 2.000
## Max. :3.00 Max. :4.00 Max. :3.00 Max. :3.000 Max. :7.000 Max. :2.0 Max. :7.000 Max. :20.000 Max. :16.000
## NA's :51961 NA's :50028 NA's :51226 NA's :4114 NA's :29505 NA's :44130 NA's :31198 NA's :29144 NA's :29449
## covidedu1_1 covidedu1_2 covidedu1_3 covidedu1_4 covidedu1_5 gi0n r15 r18n r18
## Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :1.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:1.000
## Median :0.00 Median :0.00 Median :1.00 Median :0.00 Median :0.00 Median :1.000 Median :1.000 Median :1.000 Median :1.000
## Mean :0.17 Mean :0.07 Mean :0.62 Mean :0.12 Mean :0.08 Mean :1.646 Mean :0.513 Mean :0.537 Mean :0.815
## 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:1.00 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :5.000 Max. :1.000 Max. :1.000 Max. :1.000
## NA's :51297 NA's :51297 NA's :51297 NA's :51297 NA's :51297 NA's :1240 NA's :4118 NA's :4386 NA's :4249

``` r
lapop_temp_loc <- here("osf_dl", "lapop_2021.rds")
dir.create(here("osf_dl", "LAPOP_2021"))
```

## Warning in dir.create(here("osf_dl", "LAPOP_2021")): 'C:\Users\steph\Documents\GitHub\tidy-survey-book\osf_dl\LAPOP_2021' already exists

``` r
lapop_temp_loc <- here("osf_dl", "LAPOP_2021", "lapop_2021.rds")

write_rds(lapop, lapop_temp_loc)

target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957")
# target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957")

target_dir <- osf_retrieve_node("https://osf.io/z5c3m/")

osf_upload(target_dir, path=lapop_temp_loc, conflicts="overwrite")
osf_upload(target_dir, path=here("osf_dl", "LAPOP_2021"), conflicts="overwrite")
```

## Searching for conflicting files on OSF

## Retrieving 24 of 24 available items:

## ..retrieved 10 items

## ..retrieved 20 items

## ..retrieved 24 items

## ..done

## Updating 1 existing file(s) on OSF

## # A tibble: 1 × 3
## name id meta
## <chr> <chr> <list>
## 1 lapop_2021.rds 647cddbbbf3d0f09ccd873b8 <named list [3]>
## name id meta
## <chr> <chr> <list>
## 1 LAPOP_2021 647ce3443c3a380884a04379 <named list [3]>

``` r
unlink(lapop_temp_loc)
