Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Mixed Repeat/Nonrepeat Instrument Support #177

Merged
merged 20 commits into from
Mar 11, 2024
Merged

Conversation

rsh52
Copy link
Collaborator

@rsh52 rsh52 commented Mar 1, 2024

Description

This PR seeks to add a new parameter to read_redcap() which will allow users to override the check in place that stops data exports for REDCap projects where instruments are detected to be both repeating and nonrepeating. The background rationale for not allowing this was to keep to tidy data principles.

For the purposes of this PR and documentation, here are my definitions for some things:

  • Mixed Structure: One of the supertibble columns is called structure, and has only ever been "nonrepeating" / "repeating" for values. I wanted to keep this terminology so some of the updated functions use "structure" / "mixed structure"
  • Mixed Instrument: An instrument that has a mixed structure

Keeping this in draft for now until TODOs are addressed.

Proposed Changes

List changes below in bullet format:

  • Add param enable_repeat_nonrepeat to read_readcap() and lower-level function clean_redcap_long()
  • Move shared logic in check_repeat_and_nonrepeat() to new function get_mixed_structure_fields()
  • Wrap get_mixed_structure_fields() in handler function for converting nonrepeat parts of mixed structure instruments to repeating ones with a single instance via convert_mixed_instrument()
  • Update tests
  • Update test_creds.R (We now have a new database where a new env variable is needed: REDCAPTIDIER_MIXED_STRUCTURE_API)
  • Update NEWS and version (currently considering this a minor update to 1.1.0)
  • Update Diving Deeper article with new section on mixed structure instruments
  • Update structure column in supertibble to say "mixed" for mixed structure instruments

Remaining TODOs:

  • Determine if we should add a new category to structure in the supertibble
    • Currently this is either "repeating" or "nonrepeating" but we could open this further to say "mixed" or "repeating/nonrepeating"
  • Update Diving Deeper vignette/article (holding off until we're happy with logic choices and syntax)

Issue Addressed

Addresses #126
Addresses #169

PR Checklist

Before submitting this PR, please check and verify below that the submission meets the below criteria:

  • New/revised functions have associated tests
  • New/revised functions that update downstream outputs have associated static testing files (.RDS) updated under inst/testdata/create_test_data.R
  • New/revised functions use appropriate naming conventions
  • New/revised functions don't repeat code
  • Code changes are less than 250 lines total
  • Issues linked to the PR using GitHub's list of keywords
  • The appropriate reviewer is assigned to the PR
  • The appropriate developers are assigned to the PR
  • Pre-release package version incremented using usethis::use_version()

Code Review

This section to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

  • I checked that new files follow naming conventions and are in the right place
  • I checked that documentation is complete, clear, and without typos
  • I added/edited comments to explain "why" not "how"
  • I checked that all new variable and function names follow naming conventions
  • I checked that new tests have been written for key business logic and/or bugs that this PR fixes
  • I checked that new tests address important edge cases

@rsh52 rsh52 added the enhancement New feature or request label Mar 1, 2024
@rsh52 rsh52 added this to the 1.1 milestone Mar 1, 2024
@rsh52 rsh52 self-assigned this Mar 1, 2024
@rsh52 rsh52 requested review from ezraporter and skadauke March 1, 2024 21:15
Richard Hanna added 5 commits March 4, 2024 09:36
@rsh52 rsh52 marked this pull request as ready for review March 4, 2024 19:40
@rsh52
Copy link
Collaborator Author

rsh52 commented Mar 4, 2024

I think this is ready for review now, if I can get @ezraporter to review code (and the rest 😄 ) and @skadauke to review the vignette / glossary updates and the error message:

image

(note that the "Mixed Structure Instruments" hyperlink doesn't work yet until we've published this)

@skadauke
Copy link
Collaborator

skadauke commented Mar 4, 2024

I think this is ready for review now, if I can get @ezraporter to review code (and the rest 😄 ) and @skadauke to review the vignette / glossary updates and the error message:

image

(note that the "Mixed Structure Instruments" hyperlink doesn't work yet until we've published this)

I like the glossary, vignette, and error message updates! 👍

Copy link
Collaborator

@ezraporter ezraporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is complicated and I'm having a little trouble wrapping my head around what's going on. I tried to add comments where I was getting confused.

It looks like we're losing data in the output though. The test redcap has 5 responses:

  • Nonrepeat 1
  • Nonrepeat 2
  • Mixed Nonrepeat 1
  • Mixed Repeat 1
  • Mixed Repeat 2

Those first 2 should show up in nonrepeat_form but the redcap_data there has only "Nonrepeat 2":

devtools::load_all()

data <- read_redcap(
  Sys.getenv("REDCAP_URI"),
  Sys.getenv("REDCAPTIDIER_MIXED_STRUCTURE_API"),
  enable_mixed_structure = TRUE
) |>
  extract_tibble("nonrepeat_form")

data$nonrepeat_1
#> [1] "Nonrepeat 2"

The data for mixed_structure_form looks okay.

R/read_redcap.R Outdated Show resolved Hide resolved
R/clean_redcap_long.R Outdated Show resolved Hide resolved
R/read_redcap.R Outdated Show resolved Hide resolved
R/read_redcap.R Outdated Show resolved Hide resolved
R/clean_redcap_long.R Outdated Show resolved Hide resolved
R/clean_redcap_long.R Outdated Show resolved Hide resolved
Richard Hanna added 5 commits March 5, 2024 13:01
TODO Fix structure label assignment in supertibble
Minor cleaning, updating of convert_mixed_instrument function
Fix case_when issue
@rsh52 rsh52 requested a review from ezraporter March 6, 2024 18:45
Copy link
Collaborator

@ezraporter ezraporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I've 99% convinced myself this works and I actually think your solution is really good. I say we:

  1. Run our timing benchmarks to make sure the latest change didn't mess things up. (Can we also add identifiers to the microbenchmark_results.csv so it's easy to line things up when we add new redcaps?)
  2. Merge this and ask for the issue opener to test the dev version

R/clean_redcap_long.R Outdated Show resolved Hide resolved
db_data_long,
db_metadata_long,
linked_arms,
has_mixed_structure_forms = has_mixed_structure_forms,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine but noting that one implication of this is that convert_mixed_instrument() gets run for every instrument even if it's actually repeating rather than mixed. It works because because mixed_structure_ref gets filtered to 0 rows for repeating instruments and the for loop in convert_mixed_instrument() doesn't run. We do still need to filter() mixed_structure_ref every time to find that out though which could be a performance hit. No need to optimize until it's a problem though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it? The actual conversion is wrapped in a check on has_mixed_structure_forms so if FALSE (which it would be for the whole map() sequence) then convert_mixed_instrument() wouldn't get run unless I'm missing something.

if (has_mixed_structure_forms) {
db_data_long <- convert_mixed_instrument(db_data_long, mixed_structure_ref %>% filter(form_name == my_form))
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but if you have mixed and repeating instruments then has_mixed_structure_forms is TRUE and it gets run for all the repeating instruments in addition to the mixed instruments.

rsh52 and others added 2 commits March 11, 2024 09:14
Co-authored-by: Ezra Porter <60618324+ezraporter@users.noreply.github.com>
@rsh52
Copy link
Collaborator Author

rsh52 commented Mar 11, 2024

Run our timing benchmarks to make sure the latest change didn't mess things up. (Can we also add identifiers to the microbenchmark_results.csv so it's easy to line things up when we add new redcaps?)

Updated run results and added column outputs for the database description and source (ouhsc / redcaptidier).

@ezraporter
Copy link
Collaborator

Run our timing benchmarks to make sure the latest change didn't mess things up. (Can we also add identifiers to the microbenchmark_results.csv so it's easy to line things up when we add new redcaps?)

Updated run results and added column outputs for the database description and source (ouhsc / redcaptidier).

Thanks!

@rsh52 rsh52 merged commit 5ab694e into main Mar 11, 2024
6 checks passed
@rsh52 rsh52 deleted the repeat-nonrepeat-support branch March 11, 2024 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants