Add haven option to raw_or_label #180

Merged
merged 3 commits into from
Mar 15, 2024

Conversation

ezraporter
Collaborator

@ezraporter ezraporter commented Mar 14, 2024

Description

This PR modifies read_redcap() to let users specify raw_or_label = "haven" and have categorical fields converted to haven_labelled vectors instead of factors. I kept labelled in Suggests and added a check that the user has it installed if they specify raw_or_label = "haven".
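
A minimal sketch of what such an install check could look like, assuming base R's requireNamespace(). The helper name check_labelled_installed() and the error wording are hypothetical and not taken from the PR:

```r
# Hypothetical helper: fail early when the "haven" option is requested
# but the suggested package labelled is not available.
check_labelled_installed <- function(raw_or_label) {
  needs_labelled <- identical(raw_or_label, "haven")
  if (needs_labelled && !requireNamespace("labelled", quietly = TRUE)) {
    stop(
      'raw_or_label = "haven" requires the labelled package. ',
      'Install it with install.packages("labelled").',
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Other options never touch labelled, so the check passes silently.
check_labelled_installed("label")
```

Keeping labelled in Suggests means users who never ask for "haven" do not need it installed; the check only fires for the option that depends on it.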

Implementation considerations

haven_labelled vectors, unlike factors, preserve the underlying data values and types. The non-trivial part is that, to apply the labels, the type of the data must match the type of the values we read from the metadata. This is slightly tricky because there's no foolproof way to say "cast vector x to the type of vector y" for arbitrary y. My approach is to do our best at casting with readr's parsing functions and fall back to converting the underlying values to chr if all else fails.
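
To illustrate the casting idea, here is a rough sketch under stated assumptions. The function name cast_like() is hypothetical, and returning NULL to signal "could not cast" is my own convention for the example; the PR's actual helper (referred to as force_cast() later in this thread) may differ in name, signature, and behavior:

```r
library(readr)

# Hypothetical helper: try to cast character vector `x` to the type of `ptype`
# using readr's parsers. Returns NULL if any non-missing value fails to parse,
# so the caller can fall back to character.
cast_like <- function(x, ptype) {
  parser <- if (is.integer(ptype)) {
    parse_integer
  } else if (is.numeric(ptype)) {
    parse_double
  } else if (inherits(ptype, "Date")) {
    parse_date
  } else if (is.logical(ptype)) {
    parse_logical
  } else {
    parse_character
  }

  # readr parsers warn and return NA for values they cannot parse
  out <- suppressWarnings(parser(x))
  if (any(is.na(out) & !is.na(x))) {
    return(NULL)
  }

  # Reattach names explicitly, since value labels are carried as names
  names(out) <- names(x)
  out
}

labs <- c(apple = "3", orange = "5", banana = "9")
cast_like(labs, ptype = 1L)      # integer labels, names kept
cast_like(c(a = "x"), ptype = 1L) # NULL: caller should fall back to chr
```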

See below for a concrete example if it's helpful.

Proposed Changes

  • Add "haven" option to read_redcap() raw_or_label and update appropriate arg checks
  • Refactor multi_choice_to_labels()
    • Now takes a raw_or_label argument from read_redcap()
    • Passes off logic for how categorical fields are handled to a label_handler function which will be one of apply_labs_factor() or apply_labs_haven()
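
The handler dispatch described above might look roughly like the sketch below. All function names carrying a _sketch suffix, and pick_label_handler() itself, are hypothetical stand-ins; the bodies are simplified guesses at what apply_labs_factor() and apply_labs_haven() do, not the PR's code:

```r
# `labels` is a named vector: names are the display labels, values the raw codes.

# Simplified stand-in for the factor path
apply_labs_factor_sketch <- function(x, labels) {
  factor(x, levels = unname(labels), labels = names(labels))
}

# Simplified stand-in for the haven path (needs the labelled package when called)
apply_labs_haven_sketch <- function(x, labels) {
  labelled::set_value_labels(x, .labels = labels)
}

# Choose the handler once, based on the user's raw_or_label choice
pick_label_handler <- function(raw_or_label) {
  switch(raw_or_label,
    label = apply_labs_factor_sketch,
    haven = apply_labs_haven_sketch,
    stop("Unsupported raw_or_label: ", raw_or_label, call. = FALSE)
  )
}

label_handler <- pick_label_handler("label")
label_handler(c(5L, 3L, 5L, 9L), c(apple = 3L, orange = 5L, banana = 9L))
```

Selecting the handler up front keeps the per-field loop free of branching on raw_or_label, which is the main appeal of passing a function around.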

Worked Example

Suppose we have a redcap field coded like this:

 value  label
     3  apple
     5  orange
     9  banana

The db_data we get from redcap will have a field like this:

 my_field
    <int>
        5
        3
        5
        9

where the int data type was actually determined by readr, since we don't allow users to pass data type specifications to redcap_read_oneshot().

The db_metadata will contain:

 field_name   select_choices_or_calculations         
 <chr>        <chr>                                  
 my_field     3, apple | 5, orange | 9, banana

where the values are stored as chr rather than int.

My implementation:

  1. Checks the data type of db_data$my_field and finds it's int
  2. Casts the labels read from db_metadata, c(apple = "3", orange = "5", banana = "9"), to int with readr::parse_integer()
  3. Applies the labels with labelled::set_value_labels()

If step 2 fails for some reason, it instead casts db_data$my_field to chr and applies the labels to that.
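
The worked example can be reproduced end to end with a short script. This is only a sketch: the choice-string parsing is deliberately simplified (it assumes no "|" inside labels), the variable names are made up for the example, and it is not the PR's implementation:

```r
library(readr)
library(labelled)

# Data as readr would deliver it, and the raw metadata string
my_field <- c(5L, 3L, 5L, 9L)
choices <- "3, apple | 5, orange | 9, banana"

# Split "value, label" pairs; everything before the first comma is the value
pairs <- trimws(strsplit(choices, "|", fixed = TRUE)[[1]])
values <- trimws(sub(",.*$", "", pairs))
label_text <- trimws(sub("^[^,]*,", "", pairs))
labs <- setNames(values, label_text) # c(apple = "3", orange = "5", banana = "9")

# Steps 1 and 2: the data is integer, so try to parse the label values as integer
labs_int <- suppressWarnings(parse_integer(labs))

if (anyNA(labs_int)) {
  # Fallback: keep labels as chr and cast the data to chr as well
  my_field <- as.character(my_field)
  labs_cast <- labs
} else {
  labs_cast <- setNames(labs_int, names(labs))
}

# Step 3: attach the value labels
labelled_field <- set_value_labels(my_field, .labels = labs_cast)

class(labelled_field)
val_labels(labelled_field)
```

Unlike the factor path, the underlying integer codes 5, 3, 5, 9 are still the stored values after labelling.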

Issue Addressed

Related to #178

PR Checklist

Before submitting this PR, please check and verify below that the submission meets the below criteria:

  • New/revised functions have associated tests
  • New/revised functions that update downstream outputs have associated static testing files (.RDS) updated under inst/testdata/create_test_data.R
  • New/revised functions use appropriate naming conventions
  • New/revised functions don't repeat code
  • Code changes are less than 250 lines total
  • Issues linked to the PR using GitHub's list of keywords
  • The appropriate reviewer is assigned to the PR
  • The appropriate developers are assigned to the PR
  • Pre-release package version incremented using usethis::use_version()

Code Review

This section to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

  • I checked that new files follow naming conventions and are in the right place
  • I checked that documentation is complete, clear, and without typos
  • I added/edited comments to explain "why" not "how"
  • I checked that all new variable and function names follow naming conventions
  • I checked that new tests have been written for key business logic and/or bugs that this PR fixes
  • I checked that new tests address important edge cases

@ezraporter ezraporter added the enhancement New feature or request label Mar 14, 2024
@ezraporter ezraporter requested review from skadauke and rsh52 March 14, 2024 19:25
@ezraporter ezraporter self-assigned this Mar 14, 2024
Collaborator

@rsh52 rsh52 left a comment

Great work! Some minor comments, but I think the output looks like what's needed and there were some clever methods used here.

I would next see if we can get the issuer to test this and get their feedback on whether it meets expectations.

R/utils.R
} else if (is.numeric(ptype)) {
out <- parse_double(x)
} else if (is.Date(ptype)) {
out <- parse_date(x)
Collaborator

I think we can address these Codecov warnings by adding the associated data types to your new tests. I don't think it's useful to write a test for this function in particular since it's so basic, but you could do either.

test_that("apply_labs_haven works", {

Collaborator Author

@ezraporter ezraporter Mar 15, 2024

Good call. Will do! Btw, are those code coverage warnings on the files changed screen new? I've never seen them before.

Collaborator

Unsure, I started noticing them in the past month or two. Fairly helpful for keeping our % up.

Collaborator Author

Okay, interesting finding (and I think it's why I skipped including those tests to start).

With labelled, if you try to apply labels to a vector that has no corresponding val_labels<- method, it just returns the unlabelled vector:

labels <- as.Date(c("The first day of 2023" = "2023-01-01"))

x <- labelled::set_value_labels(as.Date("2023-01-01"), .labels = labels)

x
#> [1] "2023-01-01"

class(x)
#> [1] "Date"

That means if someone has some crazy redcap where the value options are dates and they do raw_or_label = "haven", their value labels won't get applied at all. BUT if they went ahead and implemented a val_labels<-.Date method then they would get applied. This is such a weird situation I don't think we should even care tbh. I added tests for force_cast() just to get rid of the coverage warnings.

Collaborator

Huh, very interesting find. In that case yea, much better to just test force_cast(). If for some reason someone encounters this then an issue can be opened.

I figured we wouldn't test force_cast() since most of the options for it were already tested in the parent function but whatever appeases the CodeCov gods is fine.

Contributor

Value labels can be applied only to character or numeric vectors. It is not possible to add value labels to date vectors.

In fact, an error should have occurred. I have opened an issue in the labelled repository: larmarange/labelled#156

@ezraporter ezraporter merged commit 04bbe5b into main Mar 15, 2024
6 checks passed
@ezraporter ezraporter deleted the haven-labelled branch March 15, 2024 18:01