From 538fd153f10b1a3b7045197129295ebd9e3e3556 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Mon, 18 Jan 2021 10:36:31 -0500 Subject: [PATCH 1/7] Fix typo --- R/get_dataframe.R | 2 +- man/get_dataframe.Rd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/R/get_dataframe.R b/R/get_dataframe.R index 09c9101..a505abc 100644 --- a/R/get_dataframe.R +++ b/R/get_dataframe.R @@ -9,7 +9,7 @@ #' `"roster-bulls-1996.tab"`. #' @param .f The function to used for reading in the raw dataset. This user #' must choose the appropriate function: for example if the target is a .rds -#' file, then `.f` should be `readRDS` or `readr::read_`rds`. +#' file, then `.f` should be `readRDS` or `readr::read_rds`. #' @param original A logical, defaulting to TRUE. Whether to read the ingested, #' archival version of the dataset if one exists. The archival versions are tab-delimited #' `.tab` files so if `original = FALSE`, `.f` is set to `readr::read_tsv`. diff --git a/man/get_dataframe.Rd b/man/get_dataframe.Rd index 6f4f4c1..90a2a66 100644 --- a/man/get_dataframe.Rd +++ b/man/get_dataframe.Rd @@ -28,7 +28,7 @@ for example \code{"doi:10.70122/FK2/HXJVJU"}. Alternatively, an object of class \item{.f}{The function to used for reading in the raw dataset. This user must choose the appropriate function: for example if the target is a .rds -file, then \code{.f} should be \code{readRDS} or \code{readr::read_}rds`.} +file, then \code{.f} should be \code{readRDS} or \code{readr::read_rds}.} \item{original}{A logical, defaulting to TRUE. Whether to read the ingested, archival version of the dataset if one exists. The archival versions are tab-delimited From 50afad30e49bc103962d07045376f85958b59d84 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Mon, 18 Jan 2021 10:42:04 -0500 Subject: [PATCH 2/7] Move around documentation for get_* --- R/get_dataframe.R | 10 ++++++---- R/get_file.R | 19 ++++++++++--------- man/files.Rd | 19 ++++++++++--------- man/get_dataframe.Rd | 10 ++++++---- 4 files changed, 32 insertions(+), 26 deletions(-) diff --git a/R/get_dataframe.R b/R/get_dataframe.R index a505abc..ed16a66 100644 --- a/R/get_dataframe.R +++ b/R/get_dataframe.R @@ -1,7 +1,9 @@ -#' Get file from dataverse and convert it into a dataframe or tibble +#' Download dataverse file as a dataframe #' -#' `get_dataframe_by_id`, if you know the numeric ID of the dataset, or instead -#' `get_dataframe_by_name` if you know the filename and doi. The dataset +#' Use `get_dataframe_by_name` if you know the name of the datafile and the DOI +#' of the dataset. Use `get_dataframe_by_doi` if you know the DOI of the datafile +#' itself. Use `get_dataframe_by_id` if you know the numeric ID of the +#' datafile. #' #' @rdname get_dataframe #' @@ -11,7 +13,7 @@ #' must choose the appropriate function: for example if the target is a .rds #' file, then `.f` should be `readRDS` or `readr::read_rds`. #' @param original A logical, defaulting to TRUE. Whether to read the ingested, -#' archival version of the dataset if one exists. The archival versions are tab-delimited +#' archival version of the datafile if one exists. The archival versions are tab-delimited #' `.tab` files so if `original = FALSE`, `.f` is set to `readr::read_tsv`. #' If functions to read the original version is available, then `original = TRUE` #' with a specified `.f` is better. diff --git a/R/get_file.R b/R/get_file.R index 7964f42..9754af9 100644 --- a/R/get_file.R +++ b/R/get_file.R @@ -1,20 +1,21 @@ #' @rdname files #' -#' @title Download File +#' @title Download dataverse file as a raw binary #' -#' @description Download Dataverse File(s). `get_file` is a general wrapper, -#' and can take either dataverse objects, file IDs, or a filename and dataverse. +#' @description Download Dataverse File(s). `get_file_*` +#' functions return a raw binary file, which cannot be readily analyzed in R. +#' To use the objects as dataframes, see the `get_dataset_*` functions at +#' \link{get_dataset} instead. +#' +#' @details This function provides access to data files from a Dataverse entry. +#' `get_file` is a general wrapper, +#' and can take either dataverse objects, file IDs, or a filename and dataverse. +#' Internally, all functions download each file by `get_file_by_id`. #' `get_file_by_name` is a shorthand for running `get_file` by #' specifying a file name (`filename`) and dataset (`dataset`). #' `get_file_by_doi` obtains a file by its file DOI, bypassing the #' `dataset` argument. #' -#' Internally, all functions download each file by `get_file_by_id`. `get_file_*` -#' functions return a raw binary file, which cannot be readily analyzed in R. -#' To use the objects as dataframes, see the `get_dataset_*` functions at \link{get_dataset} -#' -#' @details This function provides access to data files from a Dataverse entry. -#' #' @param file An integer specifying a file identifier; or a vector of integers #' specifying file identifiers; or, if used with the prefix \code{"doi:"}, a #' character with the file-specific DOI; or, if used without the prefix, a diff --git a/man/files.Rd b/man/files.Rd index 7137ad5..6eb2c6a 100644 --- a/man/files.Rd +++ b/man/files.Rd @@ -5,7 +5,7 @@ \alias{get_file_by_name} \alias{get_file_by_id} \alias{get_file_by_doi} -\title{Download File} +\title{Download dataverse file as a raw binary} \usage{ get_file( file, @@ -107,19 +107,20 @@ function. To load datasets into the R environment dataframe, see \link{get_dataframe_by_name}. } \description{ -Download Dataverse File(s). \code{get_file} is a general wrapper, +Download Dataverse File(s). \verb{get_file_*} +functions return a raw binary file, which cannot be readily analyzed in R. +To use the objects as dataframes, see the \verb{get_dataset_*} functions at +\link{get_dataset} instead. +} +\details{ +This function provides access to data files from a Dataverse entry. +\code{get_file} is a general wrapper, and can take either dataverse objects, file IDs, or a filename and dataverse. +Internally, all functions download each file by \code{get_file_by_id}. \code{get_file_by_name} is a shorthand for running \code{get_file} by specifying a file name (\code{filename}) and dataset (\code{dataset}). \code{get_file_by_doi} obtains a file by its file DOI, bypassing the \code{dataset} argument. - -Internally, all functions download each file by \code{get_file_by_id}. \verb{get_file_*} -functions return a raw binary file, which cannot be readily analyzed in R. -To use the objects as dataframes, see the \verb{get_dataset_*} functions at \link{get_dataset} -} -\details{ -This function provides access to data files from a Dataverse entry. } \examples{ \dontrun{ diff --git a/man/get_dataframe.Rd b/man/get_dataframe.Rd index 90a2a66..d8a25fa 100644 --- a/man/get_dataframe.Rd +++ b/man/get_dataframe.Rd @@ -4,7 +4,7 @@ \alias{get_dataframe_by_name} \alias{get_dataframe_by_id} \alias{get_dataframe_by_doi} -\title{Get file from dataverse and convert it into a dataframe or tibble} +\title{Download dataverse file as a dataframe} \usage{ get_dataframe_by_name( filename, @@ -31,7 +31,7 @@ must choose the appropriate function: for example if the target is a .rds file, then \code{.f} should be \code{readRDS} or \code{readr::read_rds}.} \item{original}{A logical, defaulting to TRUE. Whether to read the ingested, -archival version of the dataset if one exists. The archival versions are tab-delimited +archival version of the datafile if one exists. The archival versions are tab-delimited \code{.tab} files so if \code{original = FALSE}, \code{.f} is set to \code{readr::read_tsv}. If functions to read the original version is available, then \code{original = TRUE} with a specified \code{.f} is better.} @@ -68,8 +68,10 @@ or globally using \code{Sys.setenv("DATAVERSE_SERVER" = "dataverse.example.com") \code{"10.70122/FK2/PPIAXE/MHDB0O"} or \code{"doi:10.70122/FK2/PPIAXE/MHDB0O"}} } \description{ -\code{get_dataframe_by_id}, if you know the numeric ID of the dataset, or instead -\code{get_dataframe_by_name} if you know the filename and doi. The dataset +Use \code{get_dataframe_by_name} if you know the name of the datafile and the DOI +of the dataset. Use \code{get_dataframe_by_doi} if you know the DOI of the datafile +itself. Use \code{get_dataframe_by_id} if you know the numeric ID of the +datafile. } \examples{ From cb798511f38c0c26e37f826b899b89f43bba9e55 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Mon, 18 Jan 2021 11:18:58 -0500 Subject: [PATCH 3/7] Reorder. The first example is not about rds, so we shouldn't name it that (cc @wibeasley) --- R/get_dataframe.R | 30 ++++++++++++++++-------------- man/get_dataframe.Rd | 30 ++++++++++++++++-------------- 2 files changed, 32 insertions(+), 28 deletions(-) diff --git a/R/get_dataframe.R b/R/get_dataframe.R index ed16a66..abdaeb3 100644 --- a/R/get_dataframe.R +++ b/R/get_dataframe.R @@ -23,35 +23,27 @@ #' @examples #' #' # Retrieve data.frame from dataverse DOI and file name -#' df_from_rds_ingested <- +#' df_tab <- #' get_dataframe_by_name( #' filename = "roster-bulls-1996.tab", #' dataset = "doi:10.70122/FK2/HXJVJU", #' server = "demo.dataverse.org" #' ) #' -#' # Retrieve the same data.frame from dataverse + file DOI -#' df_from_rds_ingested_by_doi <- +#' # Retrieve the same file from file DOI +#' df_tab <- #' get_dataframe_by_doi( #' filedoi = "10.70122/FK2/HXJVJU/SA3Z2V", #' server = "demo.dataverse.org" #' ) #' -#' # Retrieve ingested file originally a Stata dta -#' df_from_stata_ingested <- -#' get_dataframe_by_name( -#' filename = "nlsw88.tab", -#' dataset = "doi:10.70122/FK2/PPIAXE", -#' server = "demo.dataverse.org" -#' ) -#' #' #' # To use the original file version, or for non-ingested data, #' # please specify `original = TRUE` and specify a function in .f. #' -#' # A data.frame is still returned, but the +#' # Rds files are not ingested so original = TRUE and .f is required. #' if (requireNamespace("readr", quietly = TRUE)) { -#' df_from_rds_original <- +#' df_rds_original <- #' get_dataframe_by_name( #' filename = "nlsw88_rds-export.rds", #' dataset = "doi:10.70122/FK2/PPIAXE", @@ -61,8 +53,9 @@ #' ) #' } #' +#' # Get Stata file as original #' if (requireNamespace("haven", quietly = TRUE)) { -#' df_from_stata_original <- +#' df_stata_original <- #' get_dataframe_by_name( #' filename = "nlsw88.tab", #' dataset = "doi:10.70122/FK2/PPIAXE", @@ -71,6 +64,15 @@ #' .f = haven::read_dta #' ) #' } +#' +#' # Stata file as ingested file (less information than original) +#' df_stata_ingested <- +#' get_dataframe_by_name( +#' filename = "nlsw88.tab", +#' dataset = "doi:10.70122/FK2/PPIAXE", +#' server = "demo.dataverse.org" +#' ) +#' #' @export get_dataframe_by_name <- function ( filename, diff --git a/man/get_dataframe.Rd b/man/get_dataframe.Rd index d8a25fa..ac514ff 100644 --- a/man/get_dataframe.Rd +++ b/man/get_dataframe.Rd @@ -76,35 +76,27 @@ datafile. \examples{ # Retrieve data.frame from dataverse DOI and file name -df_from_rds_ingested <- +df_tab <- get_dataframe_by_name( filename = "roster-bulls-1996.tab", dataset = "doi:10.70122/FK2/HXJVJU", server = "demo.dataverse.org" ) -# Retrieve the same data.frame from dataverse + file DOI -df_from_rds_ingested_by_doi <- +# Retrieve the same file from file DOI +df_tab <- get_dataframe_by_doi( filedoi = "10.70122/FK2/HXJVJU/SA3Z2V", server = "demo.dataverse.org" ) -# Retrieve ingested file originally a Stata dta -df_from_stata_ingested <- - get_dataframe_by_name( - filename = "nlsw88.tab", - dataset = "doi:10.70122/FK2/PPIAXE", - server = "demo.dataverse.org" - ) - # To use the original file version, or for non-ingested data, # please specify `original = TRUE` and specify a function in .f. -# A data.frame is still returned, but the +# Rds files are not ingested so original = TRUE and .f is required. if (requireNamespace("readr", quietly = TRUE)) { - df_from_rds_original <- + df_rds_original <- get_dataframe_by_name( filename = "nlsw88_rds-export.rds", dataset = "doi:10.70122/FK2/PPIAXE", @@ -114,8 +106,9 @@ if (requireNamespace("readr", quietly = TRUE)) { ) } +# Get Stata file as original if (requireNamespace("haven", quietly = TRUE)) { - df_from_stata_original <- + df_stata_original <- get_dataframe_by_name( filename = "nlsw88.tab", dataset = "doi:10.70122/FK2/PPIAXE", @@ -124,4 +117,13 @@ if (requireNamespace("haven", quietly = TRUE)) { .f = haven::read_dta ) } + +# Stata file as ingested file (less information than original) +df_stata_ingested <- + get_dataframe_by_name( + filename = "nlsw88.tab", + dataset = "doi:10.70122/FK2/PPIAXE", + server = "demo.dataverse.org" + ) + } From 342a494109c732ee783e78c66b6deb56d040d80e Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Mon, 18 Jan 2021 11:20:07 -0500 Subject: [PATCH 4/7] readr is already in Imports (and we use readr::read_tsv all the time) so I don't think we need this --- R/get_dataframe.R | 18 ++++++++---------- man/get_dataframe.Rd | 18 ++++++++---------- 2 files changed, 16 insertions(+), 20 deletions(-) diff --git a/R/get_dataframe.R b/R/get_dataframe.R index abdaeb3..afc5542 100644 --- a/R/get_dataframe.R +++ b/R/get_dataframe.R @@ -42,16 +42,14 @@ #' # please specify `original = TRUE` and specify a function in .f. #' #' # Rds files are not ingested so original = TRUE and .f is required. -#' if (requireNamespace("readr", quietly = TRUE)) { -#' df_rds_original <- -#' get_dataframe_by_name( -#' filename = "nlsw88_rds-export.rds", -#' dataset = "doi:10.70122/FK2/PPIAXE", -#' server = "demo.dataverse.org", -#' original = TRUE, -#' .f = readr::read_rds -#' ) -#' } +#' df_rds_original <- +#' get_dataframe_by_name( +#' filename = "nlsw88_rds-export.rds", +#' dataset = "doi:10.70122/FK2/PPIAXE", +#' server = "demo.dataverse.org", +#' original = TRUE, +#' .f = readr::read_rds +#' ) #' #' # Get Stata file as original #' if (requireNamespace("haven", quietly = TRUE)) { diff --git a/man/get_dataframe.Rd b/man/get_dataframe.Rd index ac514ff..dfd8c94 100644 --- a/man/get_dataframe.Rd +++ b/man/get_dataframe.Rd @@ -95,16 +95,14 @@ df_tab <- # please specify `original = TRUE` and specify a function in .f. # Rds files are not ingested so original = TRUE and .f is required. -if (requireNamespace("readr", quietly = TRUE)) { - df_rds_original <- - get_dataframe_by_name( - filename = "nlsw88_rds-export.rds", - dataset = "doi:10.70122/FK2/PPIAXE", - server = "demo.dataverse.org", - original = TRUE, - .f = readr::read_rds - ) -} + df_rds_original <- + get_dataframe_by_name( + filename = "nlsw88_rds-export.rds", + dataset = "doi:10.70122/FK2/PPIAXE", + server = "demo.dataverse.org", + original = TRUE, + .f = readr::read_rds + ) # Get Stata file as original if (requireNamespace("haven", quietly = TRUE)) { From ef8f4544e7c7c6a84f242fcdf2d2ae3365a86766 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Mon, 18 Jan 2021 11:22:25 -0500 Subject: [PATCH 5/7] Reorder authors roughly by amount of edits? (cc: @wibeasley) --- DESCRIPTION | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 993750e..e2c941e 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -14,18 +14,18 @@ Authors@R: c( email = "thosjleeper@gmail.com", comment = c(ORCID = "0000-0003-4097-6326") ), - person( - "Philip", "Durbin", - role = c("aut"), - email = "philipdurbin@gmail.com", - comment = c(ORCID = "0000-0002-9528-9470") - ), person( "Shiro", "Kuriwaki", role = c("aut"), email = "shirokuriwaki@gmail.com", comment = c(ORCID = "0000-0002-5687-2647") ), + person( + "Philip", "Durbin", + role = c("aut"), + email = "philipdurbin@gmail.com", + comment = c(ORCID = "0000-0002-9528-9470") + ), person( "Sebastian", "Karcher", role=c("aut"), From c9025d7707bffd5d8dbfeaa0dd83291f4928fee8 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Mon, 18 Jan 2021 11:28:20 -0500 Subject: [PATCH 6/7] As you know dvn is removed from CRAN (although that link still is live and leads to archival version). Also dvn should be for 3 and below --- DESCRIPTION | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index e2c941e..076952e 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -54,8 +54,8 @@ Suggests: UNF, yaml Description: Provides access to Dataverse version 4 APIs , - enabling data search, retrieval, and deposit. For Dataverse versions <= 4.0, - use the deprecated 'dvn' package . + enabling data search, retrieval, and deposit. For Dataverse versions <= 3.0, + use the archived 'dvn' package . License: GPL-2 LazyData: true URL: https://github.com/iqss/dataverse-client-r From 1c5d6e8901d2139d9bf0e9293a5c3dea0bc1fff8 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Mon, 18 Jan 2021 11:31:55 -0500 Subject: [PATCH 7/7] Dataverse moved to v5 in late 2020. All our new tests are tested against 5 so we can claim to cover that? (cc @wibeasley) --- DESCRIPTION | 2 +- README.Rmd | 6 +++--- README.md | 25 ++++++++++++------------- 3 files changed, 16 insertions(+), 17 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 076952e..34e9178 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -53,7 +53,7 @@ Suggests: testthat, UNF, yaml -Description: Provides access to Dataverse version 4 APIs , +Description: Provides access to Dataverse APIs (versions 4-5), enabling data search, retrieval, and deposit. For Dataverse versions <= 3.0, use the archived 'dvn' package . License: GPL-2 diff --git a/README.Rmd b/README.Rmd index 5585215..847ea41 100644 --- a/README.Rmd +++ b/README.Rmd @@ -1,5 +1,5 @@ --- -title: "R Client for Dataverse 4 Repositories" +title: "R Client for Dataverse Repositories" output: github_document --- @@ -13,7 +13,7 @@ Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") [![Dataverse Project logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png)](https://dataverse.org) -The **dataverse** package provides access to [Dataverse 4](https://dataverse.org/) APIs, enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting. +The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4-5), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting. ### Getting Started @@ -32,7 +32,7 @@ library("dataverse") #### Keys -Some features of the Dataverse 4 API are public and require no authentication. This means in many cases you can search for and retrieve data without a Dataverse account for that a specific Dataverse installation. But, other features require a Dataverse account for the specific server installation of the Dataverse software, and an API key linked to that account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`. It can be set within R using: +Some features of the Dataverse API are public and require no authentication. This means in many cases you can search for and retrieve data without a Dataverse account for that a specific Dataverse installation. But, other features require a Dataverse account for the specific server installation of the Dataverse software, and an API key linked to that account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`. It can be set within R using: ``` r Sys.setenv("DATAVERSE_KEY" = "examplekey12345") diff --git a/README.md b/README.md index 2b75961..bef59be 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -R Client for Dataverse 4 Repositories +R Client for Dataverse Repositories ================ [![CRAN @@ -12,10 +12,10 @@ Status](https://travis-ci.org/IQSS/dataverse-client-r.png?branch=master)](https: logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png)](https://dataverse.org) The **dataverse** package provides access to -[Dataverse 4](https://dataverse.org/) APIs, enabling data search, -retrieval, and deposit, thus allowing R users to integrate public data -sharing into the reproducible research workflow. **dataverse** is the -next-generation iteration of [the **dvn** +[Dataverse](https://dataverse.org/) APIs (versions 4-5), enabling data +search, retrieval, and deposit, thus allowing R users to integrate +public data sharing into the reproducible research workflow. +**dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 (“Dataverse Network”) applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including @@ -34,7 +34,7 @@ library("dataverse") #### Keys -Some features of the Dataverse 4 API are public and require no +Some features of the Dataverse API are public and require no authentication. This means in many cases you can search for and retrieve data without a Dataverse account for that a specific Dataverse installation. But, other features require a Dataverse account for the @@ -257,13 +257,12 @@ subsequent pages, specify `start`. ### Data Archiving -Dataverse provides two - basically unrelated - workflows for managing -(adding, documenting, and publishing) datasets. The first is built on -[SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a -new dataset listing, you will have to first initialize a dataset entry with -some metadata, add one or more files to the dataset, and then publish -it. This looks something like the following: - +Dataverse provides two - basically unrelated - workflows for managing +(adding, documenting, and publishing) datasets. The first is built on +[SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a +new dataset listing, you will have to first initialize a dataset entry +with some metadata, add one or more files to the dataset, and then +publish it. This looks something like the following: ``` r # retrieve your service document