Reading C10 Data from private OSF (#73)

* Move LAPOP data to private OSF directory
* Read ambarom from private OSF
tidy-survey-r · Aug 25, 2023 · 0071544 · 0071544
1 parent 8353102
commit 0071544

Showing 4 changed files with 115 additions and 65 deletions.
diff --git a/.github/workflows/deploy_bookdown.yml b/.github/workflows/deploy_bookdown.yml
@@ -41,6 +41,7 @@ jobs:
       - name: Render Book
         env:
           CENSUS_KEY: ${{ secrets.CENSUS_KEY }}
+          OSF_PAT: ${{ secrets.OSF_PAT }}
         run: Rscript -e 'bookdown::render_book("index.Rmd")'
       - uses: actions/upload-artifact@v1
         with:

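Review note: the workflow change above exposes `OSF_PAT` to the render step so that osfr can authenticate against the private OSF project. The sketch below is not part of this commit. It assumes osfr picks up the token from the `OSF_PAT` environment variable, as its documentation describes, and uses a hypothetical helper name, `require_osf_pat()`, to fail early with a clear message when the secret is missing.

```r
# Hypothetical helper (not in the repository): stop with a readable error
# when the OSF personal access token is not available to the R session.
require_osf_pat <- function() {
  pat <- Sys.getenv("OSF_PAT")
  if (!nzchar(pat)) {
    stop("OSF_PAT is not set; private OSF files cannot be downloaded.",
         call. = FALSE)
  }
  # Return the token invisibly so the call can sit at the top of a chunk
  # without printing the secret.
  invisible(pat)
}
```

Calling `require_osf_pat()` at the top of a chunk that downloads from OSF would surface a missing secret as one explicit error rather than as an authorization failure partway through the download.
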
diff --git a/10-ambarom-vignette.Rmd b/10-ambarom-vignette.Rmd
@@ -18,17 +18,36 @@ library(rnaturalearth) # Getting world maps
 library(rnaturalearthdata)
 library(gt)
 library(ggpattern)
-library(osfr)
-source("helper-fun/helper-functions.R")
 ```
 
-We will be using data from the AmericasBarometer surveys. Here is the code to read in the dataset that we will be working with:
-```{r}
-#| label: ambarom-read
-#| message: false
-#| cache: TRUE
-ambarom_in <- read_osf("lapop_2021.rds")
+We are unable to host this data like other data we have hosted. Each country and each year has its own files. The data used in this vignette can be downloaded from the LAPOP website.  In this vignette, we will be using data from 2021, namely version v1.2. These are not available on the book's repository, but you may download the raw files yourself^[http://datasets.americasbarometer.org/database/index.php] (@lapopdat). To read all files into R and ignore the Stata labels, we recommend running code like this:
+
+```r
+stata_files <- list.files(here("RawData", "LAPOP_2021"), "*.dta")
+
+read_stata_unlabeled <- function(file) {
+  read_stata(file) %>%
+    zap_labels() %>%
+    zap_label()
+}
+
+ambarom_full_in <- here("RawData", "LAPOP_2021", stata_files) %>%
+  map_df(read_stata_unlabeled)
+```
+
+The code above will read all files of type `dta` in and stack them into one tibble. We did this and then selected a subset of variables for this vignette (see code below to recreate). To understand variables that are used across the several countries, the core questionnaire is useful.^[https://www.vanderbilt.edu/lapop/ab2021/AB2021-Core-Questionnaire-v17.5-Eng-210514-W-v2.pdf] 
+
+```r
+ambarom_in <- ambarom_full_in %>%
+  select(pais, strata, upm, weight1500, strata, core_a_core_b,
+         q2, q1tb, covid2at, a4, idio2, idio2cov, it1, jc13,
+         m1, mil10a, mil10e, ccch1, ccch3, ccus1, ccus3,
+         edr, ocup4a, q14, q11n, q12c, q12bn,
+         starts_with("covidedu1"), gi0n,
+         r15, r18n, r18
+         ) 
 ```
+
 :::
 
 ## Introduction
@@ -63,6 +82,26 @@ The code above will read all files of type `dta` in and stack them into one tibb
 Many of the variables are coded as numeric and do not have intuitive variable names, so the next step is to create derived variables and analysis-ready data. Using the core questionnaire as a codebook, derived variables are created below with relevant factors with informative names.
 
 
+```{r}
+#| label: ambarom-read-secret
+#| include: FALSE
+
+library(osfr)
+
+lapop_rds_files <- osf_retrieve_node("https://osf.io/z5c3m/") %>%
+  osf_ls_files(path="LAPOP_2021", n_max=40, pattern=".rds")
+
+filedet <- lapop_rds_files %>%
+  osf_download(conflicts="overwrite", path=here::here("osf_dl"))
+
+ambarom_in <- filedet %>%
+  pull(local_path) %>%
+  read_rds()
+
+unlink(pull(filedet, "local_path"))
+```
+
+
 ```{r}
 #| label: ambarom-derive
 ambarom <- ambarom_in %>%

diff --git a/DataCleaningScripts/LAPOP_2021_DataPrep.Rmd b/DataCleaningScripts/LAPOP_2021_DataPrep.Rmd
@@ -77,13 +77,17 @@ lapop <- lapop_in %>%
 
 summary(lapop)
 
-lapop_temp_loc <- here("osf_dl", "lapop_2021.rds")
+dir.create(here("osf_dl", "LAPOP_2021"))
+
+lapop_temp_loc <- here("osf_dl", "LAPOP_2021", "lapop_2021.rds")
 
 write_rds(lapop, lapop_temp_loc)
 
-target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") 
+# target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") 
+
+target_dir <- osf_retrieve_node("https://osf.io/z5c3m/")
 
-osf_upload(target_dir, path=lapop_temp_loc, conflicts="overwrite")
+osf_upload(target_dir, path=here("osf_dl", "LAPOP_2021"), conflicts="overwrite")
 
 unlink(lapop_temp_loc)
 ```

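Review note: the rendered output of this script shows `dir.create()` warning that the `LAPOP_2021` directory already exists when the script is rerun. A minimal sketch of an idempotent alternative follows. The helper name `ensure_dir()` is hypothetical and not part of this commit; it relies only on base R's `dir.create()` arguments `recursive` and `showWarnings`.

```r
# Hypothetical helper (not in the repository): create a directory, including
# any missing parents, and stay silent if it already exists.
ensure_dir <- function(path) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  # Report whether the directory is present after the call.
  invisible(dir.exists(path))
}
```

With this in place, `ensure_dir(here("osf_dl", "LAPOP_2021"))` could stand in for the bare `dir.create()` call, so rerunning the data prep script would not emit the warning.
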
diff --git a/DataCleaningScripts/LAPOP_2021_DataPrep.md b/DataCleaningScripts/LAPOP_2021_DataPrep.md
@@ -61,69 +61,75 @@ lapop <- lapop_in %>%
 summary(lapop)
 ```
 
-    ##       pais           strata               upm              weight1500       core_a_core_b     
-    ##  Min.   : 1.00   Min.   :1.000e+08   Min.   :1.001e+07   Min.   :0.004136   Length:64352      
-    ##  1st Qu.: 6.00   1st Qu.:6.000e+08   1st Qu.:6.153e+07   1st Qu.:0.251556   Class :character  
-    ##  Median :11.00   Median :1.100e+09   Median :1.202e+08   Median :0.417251   Mode  :character  
-    ##  Mean   :13.03   Mean   :1.303e+09   Mean   :1.666e+08   Mean   :0.512805                     
-    ##  3rd Qu.:17.00   3rd Qu.:1.700e+09   3rd Qu.:2.105e+08   3rd Qu.:0.674477                     
-    ##  Max.   :41.00   Max.   :4.100e+09   Max.   :1.135e+09   Max.   :7.024495                     
-    ##                                                                                               
-    ##        q2              q1tb          covid2at           a4             idio2          idio2cov    
-    ##  Min.   : 16.00   Min.   :1.000   Min.   :1.000   Min.   :  1.00   Min.   :1.000   Min.   :1.000  
-    ##  1st Qu.: 27.00   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  3.00   1st Qu.:2.000   1st Qu.:1.000  
-    ##  Median : 36.00   Median :2.000   Median :2.000   Median : 22.00   Median :3.000   Median :1.000  
-    ##  Mean   : 38.86   Mean   :1.521   Mean   :2.076   Mean   : 36.73   Mean   :2.439   Mean   :1.242  
-    ##  3rd Qu.: 49.00   3rd Qu.:2.000   3rd Qu.:3.000   3rd Qu.: 71.00   3rd Qu.:3.000   3rd Qu.:1.000  
-    ##  Max.   :121.00   Max.   :3.000   Max.   :4.000   Max.   :865.00   Max.   :3.000   Max.   :2.000  
-    ##  NA's   :90       NA's   :90      NA's   :6686    NA's   :4965     NA's   :2766    NA's   :31580  
-    ##       it1             jc13             m1            mil10a          mil10e          ccch1      
-    ##  Min.   :1.000   Min.   :1.00    Min.   :1.00    Min.   :1.00    Min.   :1.00    Min.   :1.00   
-    ##  1st Qu.:2.000   1st Qu.:1.00    1st Qu.:2.00    1st Qu.:2.00    1st Qu.:2.00    1st Qu.:1.00   
-    ##  Median :2.000   Median :2.00    Median :3.00    Median :3.00    Median :2.00    Median :1.00   
-    ##  Mean   :2.275   Mean   :1.62    Mean   :2.98    Mean   :2.72    Mean   :2.39    Mean   :1.78   
-    ##  3rd Qu.:3.000   3rd Qu.:2.00    3rd Qu.:4.00    3rd Qu.:3.00    3rd Qu.:3.00    3rd Qu.:2.00   
-    ##  Max.   :4.000   Max.   :2.00    Max.   :5.00    Max.   :4.00    Max.   :4.00    Max.   :4.00   
-    ##  NA's   :3631    NA's   :50827   NA's   :33238   NA's   :49939   NA's   :44021   NA's   :50535  
-    ##      ccch3           ccus1           ccus3            edr            ocup4a           q14       
-    ##  Min.   :1.00    Min.   :1.00    Min.   :1.00    Min.   :0.000   Min.   :1.000   Min.   :1.0    
-    ##  1st Qu.:1.00    1st Qu.:1.00    1st Qu.:1.00    1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.0    
-    ##  Median :2.00    Median :1.00    Median :2.00    Median :2.000   Median :1.000   Median :2.0    
-    ##  Mean   :1.82    Mean   :1.58    Mean   :1.76    Mean   :2.192   Mean   :2.627   Mean   :1.6    
-    ##  3rd Qu.:2.00    3rd Qu.:2.00    3rd Qu.:2.00    3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:2.0    
-    ##  Max.   :3.00    Max.   :4.00    Max.   :3.00    Max.   :3.000   Max.   :7.000   Max.   :2.0    
-    ##  NA's   :51961   NA's   :50028   NA's   :51226   NA's   :4114    NA's   :29505   NA's   :44130  
-    ##       q11n            q12c            q12bn         covidedu1_1     covidedu1_2     covidedu1_3   
-    ##  Min.   :1.000   Min.   : 1.000   Min.   : 0.000   Min.   :0.00    Min.   :0.00    Min.   :0.00   
-    ##  1st Qu.:1.000   1st Qu.: 3.000   1st Qu.: 0.000   1st Qu.:0.00    1st Qu.:0.00    1st Qu.:0.00   
-    ##  Median :2.000   Median : 4.000   Median : 1.000   Median :0.00    Median :0.00    Median :1.00   
-    ##  Mean   :2.214   Mean   : 4.036   Mean   : 1.001   Mean   :0.17    Mean   :0.07    Mean   :0.62   
-    ##  3rd Qu.:3.000   3rd Qu.: 5.000   3rd Qu.: 2.000   3rd Qu.:0.00    3rd Qu.:0.00    3rd Qu.:1.00   
-    ##  Max.   :7.000   Max.   :20.000   Max.   :16.000   Max.   :1.00    Max.   :1.00    Max.   :1.00   
-    ##  NA's   :31198   NA's   :29144    NA's   :29449    NA's   :51297   NA's   :51297   NA's   :51297  
-    ##   covidedu1_4     covidedu1_5         gi0n            r15             r18n            r18       
-    ##  Min.   :0.00    Min.   :0.00    Min.   :1.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
-    ##  1st Qu.:0.00    1st Qu.:0.00    1st Qu.:1.000   1st Qu.:0.000   1st Qu.:0.000   1st Qu.:1.000  
-    ##  Median :0.00    Median :0.00    Median :1.000   Median :1.000   Median :1.000   Median :1.000  
-    ##  Mean   :0.12    Mean   :0.08    Mean   :1.646   Mean   :0.513   Mean   :0.537   Mean   :0.815  
-    ##  3rd Qu.:0.00    3rd Qu.:0.00    3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000  
-    ##  Max.   :1.00    Max.   :1.00    Max.   :5.000   Max.   :1.000   Max.   :1.000   Max.   :1.000  
-    ##  NA's   :51297   NA's   :51297   NA's   :1240    NA's   :4118    NA's   :4386    NA's   :4249
+    ##       pais           strata               upm              weight1500       core_a_core_b            q2              q1tb          covid2at    
+    ##  Min.   : 1.00   Min.   :1.000e+08   Min.   :1.001e+07   Min.   :0.004136   Length:64352       Min.   : 16.00   Min.   :1.000   Min.   :1.000  
+    ##  1st Qu.: 6.00   1st Qu.:6.000e+08   1st Qu.:6.153e+07   1st Qu.:0.251556   Class :character   1st Qu.: 27.00   1st Qu.:1.000   1st Qu.:1.000  
+    ##  Median :11.00   Median :1.100e+09   Median :1.202e+08   Median :0.417251   Mode  :character   Median : 36.00   Median :2.000   Median :2.000  
+    ##  Mean   :13.03   Mean   :1.303e+09   Mean   :1.666e+08   Mean   :0.512805                      Mean   : 38.86   Mean   :1.521   Mean   :2.076  
+    ##  3rd Qu.:17.00   3rd Qu.:1.700e+09   3rd Qu.:2.105e+08   3rd Qu.:0.674477                      3rd Qu.: 49.00   3rd Qu.:2.000   3rd Qu.:3.000  
+    ##  Max.   :41.00   Max.   :4.100e+09   Max.   :1.135e+09   Max.   :7.024495                      Max.   :121.00   Max.   :3.000   Max.   :4.000  
+    ##                                                                                                NA's   :90       NA's   :90      NA's   :6686   
+    ##        a4             idio2          idio2cov          it1             jc13             m1            mil10a          mil10e          ccch1      
+    ##  Min.   :  1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.00    Min.   :1.00    Min.   :1.00    Min.   :1.00    Min.   :1.00   
+    ##  1st Qu.:  3.00   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:1.00    1st Qu.:2.00    1st Qu.:2.00    1st Qu.:2.00    1st Qu.:1.00   
+    ##  Median : 22.00   Median :3.000   Median :1.000   Median :2.000   Median :2.00    Median :3.00    Median :3.00    Median :2.00    Median :1.00   
+    ##  Mean   : 36.73   Mean   :2.439   Mean   :1.242   Mean   :2.275   Mean   :1.62    Mean   :2.98    Mean   :2.72    Mean   :2.39    Mean   :1.78   
+    ##  3rd Qu.: 71.00   3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.:3.000   3rd Qu.:2.00    3rd Qu.:4.00    3rd Qu.:3.00    3rd Qu.:3.00    3rd Qu.:2.00   
+    ##  Max.   :865.00   Max.   :3.000   Max.   :2.000   Max.   :4.000   Max.   :2.00    Max.   :5.00    Max.   :4.00    Max.   :4.00    Max.   :4.00   
+    ##  NA's   :4965     NA's   :2766    NA's   :31580   NA's   :3631    NA's   :50827   NA's   :33238   NA's   :49939   NA's   :44021   NA's   :50535  
+    ##      ccch3           ccus1           ccus3            edr            ocup4a           q14             q11n            q12c            q12bn       
+    ##  Min.   :1.00    Min.   :1.00    Min.   :1.00    Min.   :0.000   Min.   :1.000   Min.   :1.0     Min.   :1.000   Min.   : 1.000   Min.   : 0.000  
+    ##  1st Qu.:1.00    1st Qu.:1.00    1st Qu.:1.00    1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.0     1st Qu.:1.000   1st Qu.: 3.000   1st Qu.: 0.000  
+    ##  Median :2.00    Median :1.00    Median :2.00    Median :2.000   Median :1.000   Median :2.0     Median :2.000   Median : 4.000   Median : 1.000  
+    ##  Mean   :1.82    Mean   :1.58    Mean   :1.76    Mean   :2.192   Mean   :2.627   Mean   :1.6     Mean   :2.214   Mean   : 4.036   Mean   : 1.001  
+    ##  3rd Qu.:2.00    3rd Qu.:2.00    3rd Qu.:2.00    3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:2.0     3rd Qu.:3.000   3rd Qu.: 5.000   3rd Qu.: 2.000  
+    ##  Max.   :3.00    Max.   :4.00    Max.   :3.00    Max.   :3.000   Max.   :7.000   Max.   :2.0     Max.   :7.000   Max.   :20.000   Max.   :16.000  
+    ##  NA's   :51961   NA's   :50028   NA's   :51226   NA's   :4114    NA's   :29505   NA's   :44130   NA's   :31198   NA's   :29144    NA's   :29449   
+    ##   covidedu1_1     covidedu1_2     covidedu1_3     covidedu1_4     covidedu1_5         gi0n            r15             r18n            r18       
+    ##  Min.   :0.00    Min.   :0.00    Min.   :0.00    Min.   :0.00    Min.   :0.00    Min.   :1.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
+    ##  1st Qu.:0.00    1st Qu.:0.00    1st Qu.:0.00    1st Qu.:0.00    1st Qu.:0.00    1st Qu.:1.000   1st Qu.:0.000   1st Qu.:0.000   1st Qu.:1.000  
+    ##  Median :0.00    Median :0.00    Median :1.00    Median :0.00    Median :0.00    Median :1.000   Median :1.000   Median :1.000   Median :1.000  
+    ##  Mean   :0.17    Mean   :0.07    Mean   :0.62    Mean   :0.12    Mean   :0.08    Mean   :1.646   Mean   :0.513   Mean   :0.537   Mean   :0.815  
+    ##  3rd Qu.:0.00    3rd Qu.:0.00    3rd Qu.:1.00    3rd Qu.:0.00    3rd Qu.:0.00    3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000  
+    ##  Max.   :1.00    Max.   :1.00    Max.   :1.00    Max.   :1.00    Max.   :1.00    Max.   :5.000   Max.   :1.000   Max.   :1.000   Max.   :1.000  
+    ##  NA's   :51297   NA's   :51297   NA's   :51297   NA's   :51297   NA's   :51297   NA's   :1240    NA's   :4118    NA's   :4386    NA's   :4249
 
 ``` r
-lapop_temp_loc <- here("osf_dl", "lapop_2021.rds")
+dir.create(here("osf_dl", "LAPOP_2021"))
+```
+
+    ## Warning in dir.create(here("osf_dl", "LAPOP_2021")): 'C:\Users\steph\Documents\GitHub\tidy-survey-book\osf_dl\LAPOP_2021' already exists
+
+``` r
+lapop_temp_loc <- here("osf_dl", "LAPOP_2021", "lapop_2021.rds")
 
 write_rds(lapop, lapop_temp_loc)
 
-target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") 
+# target_dir <- osf_retrieve_node("https://osf.io/gzbkn/?view_only=8ca80573293b4e12b7f934a0f742b957") 
+
+target_dir <- osf_retrieve_node("https://osf.io/z5c3m/")
 
-osf_upload(target_dir, path=lapop_temp_loc, conflicts="overwrite")
+osf_upload(target_dir, path=here("osf_dl", "LAPOP_2021"), conflicts="overwrite")
 ```
 
+    ## Searching for conflicting files on OSF
+
+    ## Retrieving 24 of 24 available items:
+
+    ## ..retrieved 10 items
+
+    ## ..retrieved 20 items
+
+    ## ..retrieved 24 items
+
+    ## ..done
+
+    ## Updating 1 existing file(s) on OSF
+
     ## # A tibble: 1 × 3
-    ##   name           id                       meta            
-    ##   <chr>          <chr>                    <list>          
-    ## 1 lapop_2021.rds 647cddbbbf3d0f09ccd873b8 <named list [3]>
+    ##   name       id                       meta            
+    ##   <chr>      <chr>                    <list>          
+    ## 1 LAPOP_2021 647ce3443c3a380884a04379 <named list [3]>
 
 ``` r
 unlink(lapop_temp_loc)