Add `haven` option to `raw_or_label` #180

ezraporter · 2024-03-14T19:25:31Z

Description

This PR modifies read_redcap() to let users specify raw_or_label = "haven" and have categorical fields converted to haven_labelled vectors instead of factors. I kept labelled in Suggests and added a check that the user has it installed if they specify raw_or_label = "haven".
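Below is a minimal base R sketch of the argument check described above. The helper name `check_raw_or_label_sketch()` is hypothetical and is not REDCapTidieR's actual code; the real check may be implemented differently. It uses `requireNamespace()` so that labelled can stay in Suggests and is only required when someone asks for `"haven"`.

```r
# Hypothetical helper, not REDCapTidieR's actual implementation.
# Validates raw_or_label and, for "haven", confirms that the suggested
# labelled package is installed before any data is processed.
check_raw_or_label_sketch <- function(raw_or_label = c("label", "raw", "haven")) {
  # match.arg() errors on anything outside the allowed choices
  raw_or_label <- match.arg(raw_or_label)
  if (raw_or_label == "haven" && !requireNamespace("labelled", quietly = TRUE)) {
    stop(
      'raw_or_label = "haven" requires the labelled package. ',
      'Install it with install.packages("labelled").',
      call. = FALSE
    )
  }
  raw_or_label
}
```

The intent is that a call such as read_redcap(redcap_uri, token, raw_or_label = "haven") fails early with an informative message when labelled is missing, instead of failing partway through labelling.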

Implementation considerations

haven_labelled vectors, unlike factors, preserve the underlying data values and types. The non-trivial part is that, to apply the labels, the type of the data column must match the type of the vector of values we read from the metadata. This is slightly tricky because there's no foolproof way to say "cast vector x to the type of vector y" for arbitrary y. My approach is to cast on a best-effort basis using readr's parsing functions and, if all else fails, fall back to converting the underlying values to chr.
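The cast-then-fall-back idea can be sketched in base R. This is an illustration under assumptions, not the PR's force_cast(): both function names are hypothetical, and base coercion (as.integer(), as.numeric(), as.logical()) stands in for readr's parse_*() functions so the snippet runs without extra packages. Labels are assumed to be a named chr vector (names are the label text, values are the raw codes), as read from the metadata.

```r
# Hypothetical sketch: cast chr label codes to the type of the data column.
# Returns NULL when the cast is not safe so the caller can fall back to chr.
cast_labels_to_ptype <- function(labels, ptype) {
  vals <- unname(labels)
  # is.numeric() is FALSE for factors and Dates, so they skip these branches
  if (is.numeric(ptype) && is.integer(ptype)) {
    num <- suppressWarnings(as.numeric(vals))
    # Only accept whole numbers; "3.5" or "apple" cannot become integers safely
    ok <- all(!is.na(num) & num == trunc(num))
    out <- if (ok) suppressWarnings(as.integer(num)) else NULL
  } else if (is.numeric(ptype)) {
    out <- suppressWarnings(as.numeric(vals))
  } else if (is.logical(ptype)) {
    out <- as.logical(vals)
  } else if (is.character(ptype)) {
    out <- vals
  } else {
    # Any other type (Date, factor, ...) is treated as a failed cast
    out <- NULL
  }
  # NA after coercion means at least one code did not parse
  if (is.null(out) || anyNA(out)) {
    return(NULL)
  }
  names(out) <- names(labels)
  out
}

# Return the data and labels with matching types, falling back to chr for both
align_data_and_labels <- function(x, labels) {
  typed <- cast_labels_to_ptype(labels, x)
  if (is.null(typed)) {
    # Fallback: convert the data column itself to chr and keep chr labels
    return(list(x = as.character(x), labels = labels))
  }
  list(x = x, labels = typed)
}
```

Keeping the cast and the fallback in one place means the labelling step only ever sees a data vector and a label vector of the same type.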

See below for a concrete example if it's helpful.

Proposed Changes

- Add a "haven" option to read_redcap()'s raw_or_label argument and update the corresponding argument checks
- Refactor multi_choice_to_labels()
  - Now takes a raw_or_label argument from read_redcap()
  - Passes off the logic for handling categorical fields to a label_handler function, which will be one of apply_labs_factor() or apply_labs_haven()
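One way to picture the label_handler design is a small dispatcher that picks a function once and reuses it for every categorical field. The *_sketch functions below are hypothetical stand-ins for apply_labs_factor() and apply_labs_haven(), not their real bodies. The haven branch assumes labelled::set_value_labels() and is only exercised when labelled is installed.

```r
# Hypothetical stand-in for apply_labs_factor(): codes -> factor of label text
apply_labs_factor_sketch <- function(x, labels) {
  factor(x, levels = unname(labels), labels = names(labels))
}

# Hypothetical stand-in for apply_labs_haven(): keep the codes, attach labels.
# Assumes `labels` has already been cast to the same type as `x`.
apply_labs_haven_sketch <- function(x, labels) {
  labelled::set_value_labels(x, .labels = labels)
}

# Choose the handler from raw_or_label; unknown values are an error
pick_label_handler <- function(raw_or_label) {
  switch(raw_or_label,
    label = apply_labs_factor_sketch,
    haven = apply_labs_haven_sketch,
    stop("No label handler for raw_or_label = '", raw_or_label, "'", call. = FALSE)
  )
}

handler <- pick_label_handler("label")
handler(c(5L, 3L, 5L, 9L), c(apple = "3", orange = "5", banana = "9"))
```

Because the caller only holds a function, adding another output format would mean adding one handler and one switch() branch rather than touching the field-processing loop.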

Worked Example

Suppose we have a redcap field coded like this:

value	label
3	apple
5	orange
9	banana

The db_data we get from redcap will have a field like this:

 my_field
    <int>
        5
        3
        5
        9

where the int data type was actually inferred by readr, since we don't allow users to pass data type specifications to redcap_read_oneshot().

The db_metadata will contain

 field_name   select_choices_or_calculations         
 <chr>        <chr>                                  
 my_field     3, apple | 5, orange | 9, banana

where the values are stored as chr rather than int.

My implementation:

1. Checks the data type of db_data$my_field and finds it's int
2. Casts the labels read from db_metadata, c(apple = "3", orange = "5", banana = "9"), to int with readr::parse_integer()
3. Applies the labels with labelled::set_value_labels()

If step 2 had failed for some reason, it would instead cast db_data$my_field to chr and apply the (chr) labels to that.
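The worked example can be reproduced with a short script. The metadata parser parse_choices_sketch() is a hypothetical helper (REDCapTidieR has its own parsing), base as.integer() stands in for readr::parse_integer(), and the final labelling step runs only if labelled is installed.

```r
# Field as it arrives in db_data (readr inferred <int>)
my_field <- c(5L, 3L, 5L, 9L)

# select_choices_or_calculations as stored in db_metadata (always chr)
choices <- "3, apple | 5, orange | 9, banana"

# Hypothetical parser: split on "|" and then on the first ",",
# giving c(apple = "3", orange = "5", banana = "9")
parse_choices_sketch <- function(choices) {
  items <- trimws(strsplit(choices, "|", fixed = TRUE)[[1]])
  codes <- trimws(sub(",.*$", "", items))
  text <- trimws(sub("^[^,]*,", "", items))
  setNames(codes, text)
}
labs_chr <- parse_choices_sketch(choices)

# Step 1: check the data type of the field ("integer")
field_type <- class(my_field)

# Step 2: cast the chr codes to int (the PR uses readr::parse_integer() here)
labs_int <- setNames(as.integer(labs_chr), names(labs_chr))

# Step 3: apply the labels while keeping the original int codes
if (requireNamespace("labelled", quietly = TRUE)) {
  my_field <- labelled::set_value_labels(my_field, .labels = labs_int)
  print(labelled::val_labels(my_field))
}
```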

Issue Addressed

Related to #178

PR Checklist

Before submitting this PR, please verify that the submission meets the criteria below:

New/revised functions have associated tests
New/revised functions that update downstream outputs have associated static testing files (.RDS) updated under inst/testdata/create_test_data.R
New/revised functions use appropriate naming conventions
New/revised functions don't repeat code
Code changes are less than 250 lines total
Issues linked to the PR using GitHub's list of keywords
The appropriate reviewer is assigned to the PR
The appropriate developers are assigned to the PR
Pre-release package version incremented using usethis::use_version()

Code Review

This section is to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

I checked that new files follow naming conventions and are in the right place
I checked that documentation is complete, clear, and without typos
I added/edited comments to explain "why" not "how"
I checked that all new variable and function names follow naming conventions
I checked that new tests have been written for key business logic and/or bugs that this PR fixes
I checked that new tests address important edge cases

rsh52

Great work! Some minor comments, but I think the output looks like what's needed and there were some clever methods used here.

As a next step, I would see if we can get the issue author to test this and give feedback on whether it meets their expectations.

R/utils.R

rsh52 · 2024-03-15T16:10:29Z


+  } else if (is.numeric(ptype)) {
+    out <- parse_double(x)
+  } else if (is.Date(ptype)) {
+    out <- parse_date(x)


I think we can address these Codecov warnings by adding cases with these data types to your new tests. I don't think it's useful to write a test for this function in particular since it's so basic, but you could do either.

REDCapTidieR/tests/testthat/test-utils.R

Line 436 in c58e783

test_that("apply_labs_haven works", {

Good call. Will do! Btw, are those code coverage warnings on the Files changed screen new? I've never seen them before.

Unsure; I started noticing them in the past month or two. They're relatively helpful for keeping our coverage % up.

Okay, interesting finding (and I think it's why I skipped including those tests to start with).

With labelled, if you try to apply labels to a vector that has no corresponding val_labels<- method, it just returns the unlabelled vector:

labels <- as.Date(c("The first day of 2023" = "2023-01-01"))
x <- labelled::set_value_labels(as.Date("2023-01-01"), .labels = labels)
x
#> [1] "2023-01-01"
class(x)
#> [1] "Date"

That means if someone has some crazy REDCap project where the value options are dates and they use raw_or_label = "haven", their value labels won't get applied at all. BUT if they went ahead and implemented a val_labels<-.Date method, then the labels would get applied. This is such a weird situation that I don't think we should even care, tbh. I added tests for force_cast() just to get rid of the coverage warnings.

Huh, very interesting find. In that case, yeah, much better to just test force_cast(). If someone does encounter this, they can open an issue.

I figured we wouldn't test force_cast() since most of its options were already tested via the parent function, but whatever appeases the Codecov gods is fine.

Value labels can be applied only to character or numeric vectors. It is not possible to add value labels to Date vectors.

In fact, an error should have occurred. I have opened an issue on the labelled repository: larmarange/labelled#156
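Given that value labels are only supported on character and numeric vectors, one possible guard (not something this PR implements) is to route any other type through the chr fallback before labelling, rather than relying on a val_labels<- method existing. Both helper names below are hypothetical.

```r
# Hypothetical guard: value labels are supported on numeric or chr vectors only.
# Note that is.numeric() returns FALSE for Date vectors.
can_hold_value_labels <- function(x) {
  is.numeric(x) || is.character(x)
}

# Route unsupported types (e.g. Date) through the chr fallback so that
# labels are not silently dropped
prepare_for_value_labels <- function(x, labels) {
  if (can_hold_value_labels(x)) {
    return(list(x = x, labels = labels))
  }
  list(
    x = as.character(x),
    labels = setNames(as.character(labels), names(labels))
  )
}

d <- as.Date("2023-01-01")
labs <- setNames(as.Date("2023-01-01"), "The first day of 2023")
prepare_for_value_labels(d, labs)
```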


add haven option to labels

d6c90f0

ezraporter added the enhancement New feature or request label Mar 14, 2024

ezraporter requested review from skadauke and rsh52 March 14, 2024 19:25

ezraporter self-assigned this Mar 14, 2024

lint

c58e783

rsh52 approved these changes Mar 15, 2024


review feedback

117adc2

ezraporter merged commit 04bbe5b into main Mar 15, 2024
6 checks passed

ezraporter deleted the haven-labelled branch March 15, 2024 18:01

ezraporter mentioned this pull request Mar 15, 2024

[FEATURE] an option raw_or_label = "haven_labelled" for read_redcap() #178

Closed

5 tasks


Conversation

ezraporter commented Mar 14, 2024 (edited by rsh52)
rsh52 left a comment
rsh52 · Mar 15, 2024
ezraporter · Mar 15, 2024 (edited)
rsh52 · Mar 15, 2024
ezraporter · Mar 15, 2024
rsh52 · Mar 15, 2024
larmarange · Mar 19, 2024
