Tidy up draft, add to Design section
rsh52 committed Nov 8, 2023
1 parent 48e1b42 commit 52156ad
Showing 3 changed files with 45 additions and 8 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -15,6 +15,7 @@
^cran-comments\.md$
README.html$

paper/
pkgdown/
utility/
lastMiKTeXException/
41 changes: 35 additions & 6 deletions paper/paper.bib
@@ -41,10 +41,39 @@ @article{Wickham2014
}

@Manual{r_citation,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2020},
url = {https://www.R-project.org/},
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2020},
url = {https://www.R-project.org/},
}

@Manual{redcapr_cit,
title = {REDCapR: Interaction Between R and REDCap},
author = {Beasley, Will},
organization = {Biomedical and Behavioral Methodology Core (University of Oklahoma Health Sciences Center)},
address = {Oklahoma City, Oklahoma},
year = {2022},
url = {https://cran.r-project.org/web/packages/REDCapR/index.html},
}

@Manual{redcapapi_cit,
title = {redcapAPI: Interface to 'REDCap'},
author = {Garbett, Shawn},
organization = {Vanderbilt Biostatistics},
address = {Nashville, Tennessee},
year = {2023},
url = {https://cran.r-project.org/web/packages/redcapAPI/index.html},
}

@Article{tidyverse_cit,
title = {Welcome to the {tidyverse}},
author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
year = {2019},
journal = {Journal of Open Source Software},
volume = {4},
number = {43},
pages = {1686},
doi = {10.21105/joss.01686},
}
11 changes: 9 additions & 2 deletions paper/paper.md
@@ -50,13 +50,13 @@ bibliography: paper.bib

Capturing and storing electronic data is integral in the research world, yet often becomes a burden to the researchers themselves. [REDCap](https://www.project-redcap.org/) [@Harris2009; @Harris2019] alleviates this problem by offering a secure web application that lets users build databases and surveys with a robust front-end interface that can support data of any type, including data requiring compliance with standards for protected information.

For many researchers who use REDCap, the R language [@r_citation] is a powerful tool for extracting and analyzing their data. To take advantage of REDCap's REST API, the [`REDCapR`](https://cran.r-project.org/web/packages/REDCapR/index.html) and [`redcapAPI`](https://cran.r-project.org/web/packages/redcapAPI/index.html) packages allow R users to extract data directly into their programming environment. The default extraction structure for a given REDCap database is referred to as the "block matrix," and is a singular, unwieldy, and "untidy" data table. The concept of "[tidy data](https://www.jstatsoft.org/article/view/v059i10)" [@Wickham2014] describes a framework for standard mapping and structuring of data where each variable forms a column, each observation forms a row, and each type of observational unit forms a table. Fundamentally, the block Matrix breaks these tidy principles by obscuring the primary keys that identify individual records, leaving and analysts with the arduous task of reformatting the matrix for usability.
For many researchers who use REDCap, the R language [@r_citation] is a powerful tool for extracting and analyzing their data. To take advantage of REDCap's REST API, the [`REDCapR`](https://cran.r-project.org/web/packages/REDCapR/index.html) [@redcapr_cit] and [`redcapAPI`](https://cran.r-project.org/web/packages/redcapAPI/index.html) [@redcapapi_cit] packages allow R users to extract data directly into their programming environment. The default extraction structure for a given REDCap database is referred to as the "block matrix," and is a single, unwieldy, and "untidy" data table. The concept of "[tidy data](https://www.jstatsoft.org/article/view/v059i10)" [@Wickham2014] describes a framework for standard mapping and structuring of data where each variable forms a column, each observation forms a row, and each type of observational unit forms a table. Fundamentally, the block matrix breaks these tidy principles by obscuring the primary keys that identify individual records, leaving analysts with the arduous task of reformatting the matrix for usability.

To address these challenges, we developed `REDCapTidieR` as an open source R package that transforms the standard REDCap output into a format that adheres to tidy data principles. `REDCapTidieR` has the potential to save organizations and research staff substantial amounts of time, allowing them to quickly query their data without the need for intricate data parsing processes.

# Statement of Need

As of 2023, the REDCap Consortium boasts nearly 3 million users across over 150 countries. REDCap databases exhibit significant variation in complexity, ranging from simple tables with easily identifiable records to more challenging scenarios where pinpointing a unique identifier is harder. This complexity often arises from the introduction of "repeating instruments" and "repeating events." For an in-depth exploration of this concept, refer to the [`REDCapTidieR` documentation](https://chop-cgtinformatics.github.io/REDCapTidieR/articles/diving_deeper.html#longitudinal-redcap-projects). Fundamentally, repeating events and instruments support studies like clinical trials, where subjects may have distinct timelines with varying levels of record granularity. This is where the flattening of the database into the block matrix becomes a pain point for analysts.
As of 2023, the REDCap Consortium boasts nearly 3 million users across over 150 countries. REDCap databases exhibit significant variation in complexity, ranging from simple tables with easily identifiable records to more challenging scenarios where pinpointing a unique identifier is harder. This complexity often arises in databases that make use of "repeating instruments" and "repeating events." For an in-depth exploration of this concept, refer to the [`REDCapTidieR` documentation](https://chop-cgtinformatics.github.io/REDCapTidieR/articles/diving_deeper.html#longitudinal-redcap-projects). Fundamentally, repeating events and instruments support longitudinal studies, where subjects may have distinct timelines with varying levels of record granularity. This is where the flattening of the database into the block matrix becomes a pain point for analysts.

While there are a few existing REDCap tools for R documented by [`REDCap-tools`](https://redcap-tools.github.io/projects/), `REDCapTidieR` occupies a unique space by providing analysts with an opinionated framework that quickly prepares them for data analysis. Although some of the aforementioned tools also offer functions for data processing, such as the [`tidyREDCap`](https://raymondbalise.github.io/tidyREDCap/) and [`REDCapDM`](https://ubidi.github.io/REDCapDM/index.html) packages, `REDCapTidieR` is unique in how it restructures the block matrix into a format that is easily interpretable within the user's programmatic environment. Of the tools available, `REDCapTidieR` is the only one that fundamentally restructures the block matrix in its entirety.

@@ -75,13 +75,20 @@ Transformation of the block matrix into a friendlier structure is carried out by
Unlike the block matrix, which combines all columns for record identification into one table, `REDCapTidieR` separates instruments so that only the variables necessary for identification of a record within the instrument are included in each data tibble. Below we provide a sample model that compares the standard output from a REDCap database with non-repeating and repeating instruments to one post-processed through `REDCapTidieR`.

![Conceptual Model](/paper/images/REDCapTidieR%20JOSS.png)
Figure 1: Comparative model showing REDCap API export formats between the default behavior and `REDCapTidieR`.

In this example, the supertibble displays three REDCap database instruments, with one repeating and two non-repeating. Below, one of each of these instrument types is expanded to show how `REDCapTidieR` separates these instruments into their own tabular list elements structured with only the identifiers necessary to pinpoint a specific record. This format makes tables easily joinable by analysts for whatever operations they may need later in their work.
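
The separation described above can be illustrated with a small, language-agnostic sketch. It is written in plain Python rather than R and is not `REDCapTidieR`'s implementation: the sample rows, the `instruments` metadata, and the `split_block_matrix` helper are all hypothetical, and the sketch only assumes that repeating rows are flagged by the `redcap_repeat_instrument` and `redcap_repeat_instance` columns.

```python
# Hypothetical block matrix: one wide table in which every row carries every
# instrument's columns, most of them empty (None).
block_matrix = [
    {"record_id": 1, "redcap_repeat_instrument": None,
     "redcap_repeat_instance": None, "age": 34, "drug": None},
    {"record_id": 1, "redcap_repeat_instrument": "medications",
     "redcap_repeat_instance": 1, "age": None, "drug": "aspirin"},
    {"record_id": 1, "redcap_repeat_instrument": "medications",
     "redcap_repeat_instance": 2, "age": None, "drug": "ibuprofen"},
    {"record_id": 2, "redcap_repeat_instrument": None,
     "redcap_repeat_instance": None, "age": 51, "drug": None},
]

# Hypothetical metadata: the fields each instrument owns and whether it repeats.
instruments = {
    "demographics": {"fields": ["age"], "repeating": False},
    "medications": {"fields": ["drug"], "repeating": True},
}


def split_block_matrix(rows, instruments):
    """Return one table per instrument, keeping only the identifier columns
    needed to pinpoint a row within that instrument."""
    tables = {}
    for name, meta in instruments.items():
        if meta["repeating"]:
            # A repeating instrument needs the record and the instance number.
            keys = ["record_id", "redcap_repeat_instance"]
            keep = [r for r in rows if r["redcap_repeat_instrument"] == name]
        else:
            # A non-repeating instrument is identified by the record alone.
            keys = ["record_id"]
            keep = [r for r in rows if r["redcap_repeat_instrument"] is None]
        tables[name] = [{c: r[c] for c in keys + meta["fields"]} for r in keep]
    return tables


tables = split_block_matrix(block_matrix, instruments)
for name, table in tables.items():
    print(name, table)
```

Each resulting table has one row per observation and no empty columns, and both tables share `record_id`, so they can be joined on it when an analysis needs fields from more than one instrument.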

# Installation

`REDCapTidieR` is available on [GitHub](https://github.com/CHOP-CGTInformatics/REDCapTidieR) and [CRAN](https://cran.r-project.org/web/packages/REDCapTidieR/index.html) and has been tested for functionality on all major operating systems.

# Acknowledgements

`REDCapTidieR` is made possible in large part thanks to the `REDCapR` and `tidyverse` [@tidyverse_cit] packages.

The authors would also like to give special thanks to Will Beasley, Paul Wildenhain, and Jan Marvin for their feedback and support in development.

# Conflict of interest

This package was developed by the [Children’s Hospital of