The results are paginated, so users can rely on the `per_page` and `start` arguments to request subsequent pages of results. We'll start at 6 to show that we retrieve the last five results from the previous query plus 15 more (due to `per_page = 20`):
More complicated searches can specify metadata fields like `title` and restrict results to a specific `type` of Dataverse object (a "dataverse", "dataset", or "file"):

-```{r}
+```{r, eval=FALSE}
ei <- dataverse_search(author = "Gary King", title = "Ecological Inference", type = "dataset", per_page = 20)
+## [10] "07 Letterlocking Categories and Formats Chart"
+## [11] "10 Foldable: Launch Little Book of Locks (UH6089), with Categories and Formats Chart. Letterlocking Instructional Resources"
+## [12] "10 Million International Dyadic Events"
+## [13] "1479 data points of covid19 policy response times"
+## [14] "2016 Census of Population: ADA and DA Maps for Kings County Nova Scotia"
+## [15] "3D Dust map from Green et al. (2015)"
+## [16] "3D dust map from Green et al. (2017)"
+## [17] "3D dust map from Green et al. (2019)"
+## [18] "A 1D Lyman-alpha Profile Camera for Plasma Edge Neutral Studies on the DIII-D Tokamak"
+## [19] "A Comparative Analysis of Brazil's Foreign Policy Drivers Towards the USA: Comment on Amorim Neto (2011)"
+## [20] "A Critique of Dyadic Design"
+## 16 1998 Jewish Community Study of the Coachella Valley, California
+## 17 2002 State Legislative Survey
+## 18 2007 White Sands Dune Field lidar topographic data
+## 19 2008 White Sands Dune Field lidar topographic data
+## 20 2012 STATA Data.tab
+
+```
+

Once datasets and files are identified, it is easy to download and use them directly in R. See the ["Data Download" vignette](C-download.html) for details.
vignettes/C-download.Rmd: 52 additions & 12 deletions
@@ -42,28 +42,35 @@ library("tibble") # to see dataframes in tidyverse-form
First, we retrieve a plain-text file like this dataset on electricity consumption by [Wakiyama et al. (2014)](https://doi.org/10.7910/DVN/ARKOTI/GN1MRT). Taking the file name and dataset DOI from this entry,

-```{r, echo=FALSE, message=FALSE,include=FALSE}
+
+```{r, eval=FALSE}
energy <- get_dataframe_by_name(
  filename = "comprehensiveJapanEnergy.tab",
  dataset = "10.7910/DVN/ARKOTI",
  server = "dataverse.harvard.edu")
```

```{r, eval=FALSE}
-energy <- get_dataframe_by_name(
-  filename = "comprehensiveJapanEnergy.tab",
-  dataset = "10.7910/DVN/ARKOTI",
-  server = "dataverse.harvard.edu")
+head(energy)
```

```{r}
-head(energy)
+## # A tibble: 6 × 10
+## time date dummy temp temp2 all large house kepco tepco
These `get_dataframe_*` functions, introduced in v0.3.0, read the data directly into an R environment using whatever R function is supplied to `.f`. By default, the `get_dataframe_*` functions read such data with `readr::read_tsv()`. Supplying a different `.f` changes the read-in settings. For example, the following modification uses a base-R equivalent to read in the ingested data.

-```{r}
+```{r, eval=FALSE}
library(readr)
energy <- get_dataframe_by_name(
  filename = "comprehensiveJapanEnergy.tab",
@@ -74,6 +81,16 @@ energy <- get_dataframe_by_name(
head(energy)
```
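
As a minimal sketch of what a custom `.f` can look like, assuming `.f` is simply called with the path of the downloaded file: any function that takes a path and returns a data frame can be tried out locally first. The file contents and the name `read_tab_base` are made up for illustration and are not part of the package or the dataset.

```r
# Stand-in for an ingested tab-delimited file (made-up values, not the real data)
tab_file <- tempfile(fileext = ".tab")
writeLines(c("time\ttemp", "1\t5.5", "2\t6.1"), tab_file)

# Hypothetical base-R reader: takes a file path and returns a data.frame
read_tab_base <- function(path) {
  read.delim(path, sep = "\t", stringsAsFactors = FALSE)
}

energy_like <- read_tab_base(tab_file)
str(energy_like)
```

A function of this shape could then be passed as `.f = read_tab_base` in a `get_dataframe_by_name()` call.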

+```{r}
+## time date dummy temp temp2 all large house kepco tepco
The dataverse package can also download datasets that are _drafts_ (i.e., versions not released publicly), as long as the user provides their appropriate `DATAVERSE_KEY`. Users may need to modify the metadata of a datafile, such as adding a descriptive label, for the download to work properly in this case. This is because the file identifier UNF, which the read function relies on, may only appear after metadata has been added.
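
A minimal sketch of supplying the key, assuming it is read from the `DATAVERSE_KEY` environment variable. The token value is a placeholder and `has_dataverse_key()` is a hypothetical helper, not a function of the package.

```r
# Placeholder token for illustration only; a real key comes from the
# user's own Dataverse account
Sys.setenv("DATAVERSE_KEY" = "examplekey12345")

# Hypothetical helper: TRUE when a non-empty key is present in the environment
has_dataverse_key <- function() {
  nzchar(Sys.getenv("DATAVERSE_KEY"))
}

# Fail early instead of sending an unauthenticated request for a draft
if (!has_dataverse_key()) {
  stop("Set DATAVERSE_KEY before requesting a draft dataset.")
}
```

Checking for the key up front gives a clearer error than a failed download of an unreleased version.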
@@ -83,7 +100,7 @@ The dataverse package can also download datasets that are _drafts_ (i.e. version
If a file is displayed on Dataverse as a `.tab` file, like the survey data by [Alvarez et al. (2013)](https://doi.org/10.7910/DVN/ARKOTI/A8YRMP), it is likely that Dataverse [ingested](https://guides.dataverse.org/en/latest/user/tabulardataingest/index.html) the file into a plain-text, tab-delimited format.
However, ingested files may not retain important dataset attributes. For example, Stata and SPSS datasets attach value labels to numeric values, and factor variables in R dataframes encode levels, not only labels. A plain-text ingested file discards such information. For example, the `polling_place` variable in this data is given only as numbers, although the original data labelled these numbers with informative values.

-```{r}
+```{r,eval=FALSE}
str(argentina_tab$polling_place)
```

+```{r}
+## num [1:1475] 31 31 31 31 31 31 31 31 31 31 ...
+```
+

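
The loss of labels can be reproduced offline with a small sketch. The data below are made up and only mimic the situation: a categorical variable is exported to plain text as numeric codes, and the labels do not survive the round trip.

```r
# Made-up data standing in for a labelled variable such as polling_place
original <- data.frame(
  voter = 1:4,
  polling_place = factor(c("School A", "School B", "School A", "School C"))
)

# Simulate a plain-text export that stores only the numeric codes
codes_only <- transform(original, polling_place = as.integer(polling_place))
tab_file <- tempfile(fileext = ".tab")
write.table(codes_only, tab_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Reading the plain-text file back yields bare numbers without labels
roundtrip <- read.delim(tab_file)
str(roundtrip$polling_place)

# The labels exist only in the original object
levels(original$polling_place)
```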
When ingesting, Dataverse retains an `original` version that keeps these attributes but may not be readable on some platforms. The `get_dataframe_*` functions have an argument that can be set to `original = TRUE`. In this case we know that `alpl2013.tab` was originally a Stata dta file, so we can run:
+## @ labels : Named num [1:37] 1 2 3 4 5 6 7 8 9 10 ...
+## ..- attr(*, "names")= chr [1:37] "E.E.T." "Escuela Juan Bautista Alberdi" "Escuela Juan Carlos Dávalos" "Escuela Bernardino de Rivadavia" ...
+```

Users should pick `.f` and `original` based on their existing knowledge of the file. If the original file is a `.sav` SPSS file, `.f` can be `haven::read_sav`. If it is a `.Rds` file, use `readRDS` or `readr::read_rds`. In fact, because the raw data is read in as a binary, there is no limitation to the file types `get_dataframe_*` can read in, as far as the dataverse package is concerned.
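
One way to organize that choice is sketched below. `pick_reader()` is a hypothetical helper, not part of the dataverse package, and the mapping from extension to reader is an assumption based on the file types named above.

```r
# Hypothetical helper: choose a reader from the extension of the original file
pick_reader <- function(filename) {
  ext <- tolower(tools::file_ext(filename))
  # switch() evaluates only the matching branch, so haven and readr are
  # needed only when their branch is actually selected
  switch(ext,
    sav = haven::read_sav,
    dta = haven::read_dta,
    rds = readRDS,
    tab = ,
    tsv = readr::read_tsv,
    stop("No reader known for extension: ", ext)
  )
}

reader <- pick_reader("model.Rds")
```

The returned function could then be supplied as `.f`, together with `original = TRUE` when the original rather than the ingested version of the file is wanted.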
@@ -138,11 +166,23 @@ This shows that there are indeed 32 files, a mix of .R code files and tab- and c
You can also retrieve more extensive metadata using `dataset_metadata()`:

-```{r}
+```{r, eval=FALSE}
str(dataset_metadata("10.7910/DVN/ARKOTI", server = "dataverse.harvard.edu"),
If the file you want to retrieve is not data, you may want to use the more primitive function `get_file()` instead, which retrieves the file as raw binary data. See the examples on the `get_file()` help page, which use `base::writeBin()`, for details on how to write and read these binary files.
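
The write-then-read pattern can be sketched without a download, assuming the retrieved file arrives as a raw vector. Here `raw_bytes` is a made-up stand-in for that vector, not output from the package.

```r
# Stand-in for the raw vector a download would return (made-up CSV content)
raw_bytes <- charToRaw("x,y\n1,2\n")

# Write the bytes to disk unchanged
out_file <- tempfile(fileext = ".csv")
writeBin(raw_bytes, out_file)

# Read the bytes back as-is, or parse the file with a suitable reader
restored <- readBin(out_file, what = "raw", n = file.size(out_file))
parsed <- read.csv(out_file)
```

Because the bytes are written unchanged, the same pattern applies to non-tabular files such as code or PDFs, with the final reading step swapped for whatever tool understands that format.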