diff --git a/R/rd_transform.R b/R/rd_transform.R
index 569f8a0..47d725c 100644
--- a/R/rd_transform.R
+++ b/R/rd_transform.R
@@ -2,7 +2,7 @@
#'
#' @description
#' `r lifecycle::badge('stable')`
-#' This function transforms the raw REDCap data read by the `redcap_data` function. It returns the transformed data and dictionary, along with a summary of the results of each step.
+#' This function transforms the raw REDCap data read by the `redcap_data` function. It runs all the data-processing functions in a one-step pipeline and returns the transformed data and dictionary, along with a summary of the results of each step.
#'
#' @param project Output of the `redcap_data` function, which is a list containing the data frames of the data, dictionary and event_form (if needed) of the REDCap project.
#' @param data Data frame containing the data read from REDCap. If the list is specified, this argument is not necessary.
diff --git a/vignettes/REDCapDM.Rmd b/vignettes/REDCapDM.Rmd
index 7e414e7..72294df 100644
--- a/vignettes/REDCapDM.Rmd
+++ b/vignettes/REDCapDM.Rmd
@@ -38,8 +38,12 @@ The REDCapDM package provides a comprehensive toolkit for managing data exported
All main functions are listed below (and described in detail in the examples):
+## Import data
+
- `redcap_data()`: Read REDCap data into R.
+## Process data
+
- `rd_dates()`: Standardize date and datetime fields.
- `rd_delete_vars()`: Remove specified variables (by name or pattern).
@@ -58,8 +62,12 @@ All main functions are listed below (and described in detail in the examples):
- `rd_dictionary()`: Update dictionary (translation of REDCap logic into R syntax) to reflect transformed data and logic.
+Or we can use all these functions at once:
+
- `rd_transform()`: One-step pipeline to clean and preprocess the raw REDCap data.
+## Queries
+
- `rd_query()`: Apply expressions to identify data queries/issues.
- `rd_event()`: Report missing/incomplete events per record (longitudinal).
@@ -124,26 +132,26 @@ kable(vars) |>
-# **Examples**
+# **Usage**
-The package structure can be divided into three main components: reading raw data, processing data and identifying queries. Typically, after collecting data in REDCap, we will have to follow this three components in order to have a final validated dataset for analysis. We will provide a complete user guide on how to perform each one of these steps using the package's functions. For the processing of the data and query identification, we will use the built-in dataset as an example.
+The package structure can be divided into three main components: reading raw data, processing data and identifying queries. Typically, after collecting data in REDCap, we will have to go through these three components in order to obtain a final validated dataset for analysis. We will provide a basic user guide on how to perform each of these steps using the package's functions. For the processing of the data and query identification, we will use the built-in `covican` dataset as an example.
## **Read data**
### **redcap_data**
-The `redcap_data()` function allows users to easily import data from a REDCap project into R for analysis.
+The `redcap_data()` function allows users to easily import data from a REDCap project into R.
-To read exported data from REDCap, use the arguments `data_path` and `dic_path` to, respectively, describe the path of the R file and the REDCap project's dictionary:
+In order to read exported data from REDCap, we first need to download the data and dictionary from the REDCap project in R format. We can then use the arguments `data_path` and `dic_path` to designate the local path where we have stored the R file and the dictionary from the REDCap project:
```{r message=FALSE, warning=FALSE, comment=NA, eval=FALSE}
dataset <- redcap_data(data_path = "C:/Users/username/example.r",
dic_path = "C:/Users/username/example_dictionary.csv")
```
-> Note: The R and CSV files exported from REDCap must be located in the same directory.
+> Note: The R file and the data CSV file exported from REDCap must be located in the same directory.
-If the REDCap project is longitudinal (contains more than one event) then a third element should be specified with the correspondence of each event with each form of the project. This csv file can be downloaded in the REDCap of the project following these steps: _Project Setup_ < _Designate Instruments for My Events_ < _Download instrument-event mappings (CSV)_.
+If the REDCap project is longitudinal (contains more than one event), a third element should be specified with the correspondence between each event and each form of the project. This CSV file can be downloaded from the project's REDCap by following these steps: _Project Setup_ > _Designate Instruments for My Events_ > _Download instrument-event mappings (CSV)_. It then has to be specified using the argument `event_path`:
```{r message=FALSE, warning=FALSE, comment=NA, eval=FALSE}
dataset <- redcap_data(data_path = "C:/Users/username/example.r",
@@ -164,52 +172,41 @@ In this case, there is no need to specify the event-form file since the function
> **Warning**: Please keep in mind that the API token gives you special access to the REDCap project and that it should not be shared with other people.
-This function returns a list with 3 elements (imported data, dictionary and event-form mapping) which can then be used for further analysis or visualization.
+The `redcap_data()` function returns a list with three elements: imported data, dictionary and event-form mapping (if included).
## **Process**
-As previously stated, we will use the built-in dataset `REDCapDM::covican` as an example.
-
-For all the following functions, the only necessary elements that must be provided are the dataset to be transformed and the corresponding dictionary. If the project is longitudinal, as in the case of `REDCapDM::covican`, also the event-form dataset should be specified. These elements can be specified directly using the output of the `redcap_data()` function or separately in different arguments.
-
-### **rd_dates**
-
-This function is designed to process and standardize `date` and `datetime` fields in a REDCap dataset. In REDCap projects, date and datetime fields can sometimes be stored as character strings, which can make analyses difficult. It detects which fields should be dates/datetimes from the REDCap dictionary and converts them to `Date` and `POSIXct`, respectively.
+Given any data imported from REDCap with `redcap_data()`, this would be the pipeline of an entire processing workflow:
-```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE}
-# Option A: list object
-covican_dates <- covican |> rd_dates()
-
-# Option B: provide components separately
-covican_dates <- rd_dates(data = covican$data,
- dic = covican$dictionary,
- event_form = covican$event_form)
+```{r message=FALSE, warning=FALSE, eval = FALSE, comment="#>", collapse = TRUE}
+data |>
+ rd_delete_vars(delete_pattern = c("_complete", "_timestamp")) |>
+ rd_dates() |>
+ rd_recalculate() |>
+ rd_checkbox() |>
+ rd_factor() |>
+ rd_dictionary() |>
+ rd_split(by = "event") # use "form" if not longitudinal
```
-Quick verification example:
+All functions are optional and should be used only when needed. The order of some functions can also be swapped. For example, in `covican` there are no variables to delete and the dates are already processed, so the pipeline can be simplified:
```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE}
-# Simulate a character date since covican already has the dates in the correct format
-covican_dates <- covican
-covican_dates$data <- covican_dates$data |>
- dplyr::mutate(d_birth = as.character(d_birth))
-# Check class before conversion
+covican_transformed <- covican |>
+ rd_recalculate() |>
+ rd_checkbox() |>
+ rd_factor() |>
+ rd_dictionary() |>
+ rd_split(by = "event")
-class(covican_dates$data$d_birth)
-
-# Check class after conversion
-covican_dates <- covican_dates |> rd_dates()
-class(covican_dates$data$d_birth)
+covican_transformed$results
```
-After this transformation, all `date` and `datetime` variables are standardized and ready for analysis in R.
-
-
+All the functions that can be used in each step of a processing workflow are detailed below:
### **rd_delete_vars**
-This function removes unwanted variables from both a REDCap dataset and its dictionary, keeping the data and metadata consistent.
-This is especially useful for eliminating automatically generated fields such as form completion flags (`*_complete`) or timestamps (`*_timestamp`).
+This function removes unwanted variables from both a REDCap dataset and its dictionary. This is especially useful for eliminating automatically generated fields such as form completion flags (`*_complete`) or timestamps (`*_timestamp`).
You can delete variables either by specifying their exact names or by using regular expression patterns:
@@ -231,10 +228,42 @@ When variables are deleted:
+
+### **rd_dates**
+
+This function is designed to process and standardize `date` and `datetime` fields in a REDCap dataset. In REDCap projects, date and datetime fields can sometimes be stored as character strings, which can make analyses difficult. It detects which fields should be dates/datetimes from the REDCap dictionary and converts them to `Date` and `POSIXct`, respectively.
+
+```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE}
+covican_dates <- covican |>
+ rd_dates()
+```
+
+Quick verification example:
+
+```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE}
+# Simulate a character date since covican already has the dates in the correct format
+covican_dates <- covican
+covican_dates$data <- covican_dates$data |>
+ dplyr::mutate(d_birth = as.character(d_birth))
+# Check class before conversion
+
+class(covican_dates$data$d_birth)
+
+# Check class after conversion
+covican_dates <- covican_dates |>
+ rd_dates()
+class(covican_dates$data$d_birth)
+```
+
+After this transformation, all `date` and `datetime` variables are standardized and ready for analysis in R.
+
+
+
### **rd_recalculate**
This function identifies calculated fields in a REDCap project, translates their logic into R, recalculates them, and compares the recalculated values with the originals.
-It produces both field-level and project-level reports, helping users detect discrepancies between REDCap’s stored calculations and the values recomputed in R.
+
+It produces a report, helping users detect discrepancies between REDCap’s stored calculations and the values recomputed in R.
```{r}
covican_recalc <- covican |>
@@ -244,16 +273,15 @@ covican_recalc <- covican |>
covican_recalc$results
```
-The `results` object contains:
-
-- Summary report – total number of calculated fields, how many were successfully transcribed into R logic, and how many recalculated values differ from the originals.
+_TODO: I would not include numbering when only this function is run._
-- Field-level report – lists each calculated field, whether its logic was transcribed, and whether the recalculated value matches the original.
+The `results` object contains:
-Example: excluding specific variables
+- Summary report: total number of calculated fields, how many were successfully transcribed into R logic, and how many recalculated values differ from the originals.
-You can exclude certain fields from recalculation (e.g., complex multi-event calculations) to reduce computation time and avoid unnecessary warnings.
+- Field-level report: lists each calculated field, whether its logic was transcribed, and whether the recalculated value matches the original.
+In addition, you can exclude certain fields from recalculation (e.g., complex multi-event calculations) to reduce computation time and avoid unnecessary warnings.
```{r}
# Exclude specific variables from recalculation
@@ -263,48 +291,21 @@ covican_recalc <- covican |>
covican_recalc$results
```
+_TODO: I would rename the argument 'exclude_recalc' to 'exclude'._
+
When recalculation succeeds:
- A new variable is added to the dataset with the suffix `_recalc.`
- A corresponding entry is added to the dictionary with the label `". (Recalculate)"`.
-
-
-### **rd_factor**
-
-This function converts categorical variables in a REDCap dataset into R factors.
-It detects `.factor` columns (created by REDCap for multiple-choice fields) and merges them into the original variables, while preserving labels and updating the dictionary’s branching logic.
-
-```{r}
-factored <- covican |>
- rd_factor()
-
-# Checking class of the variable
-str(factored$data$available_analytics)
-```
-
-You can prevent certain variables from being converted to factors using the `exclude` argument.
-This is useful if you need to keep some variables as raw numeric or text data.
-
-```{r}
-factored <- covican |>
- rd_factor(exclude = c("available_analytics", "urine_culture"))
-
-# Checking class of the variable
-str(covican$data$available_analytics)
-```
-
-> Note: the function automatically excludes these system variables from conversion: `redcap_event_name`, `redcap_repeat_instrument`, `redcap_data_access_group`. These variables are retained as-is to avoid interfering with longitudinal event mappings or user access groups.
-
-After conversion, original variables are replaced with proper R factor columns and their `.factor` counterparts are dropped.
+_TODO: Consider whether to place them at the end of the dictionary or after each corresponding calculated field._
### **rd_checkbox**
-This function processes REDCap checkbox fields, converting them from "Checked"/"Unchecked" categories into binary-coded variables (0/1) with user-specified labels.
-It can also rename variables to match checkbox option labels and updates dictionary branching logic accordingly.
+This function processes REDCap checkbox fields, converting them from "Checked"/"Unchecked" categories into binary-coded variables (0/1) and their corresponding factor variables with user-specified labels. It also renames variables to match checkbox option labels and updates dictionary branching logic accordingly.
```{r}
# Default transformation: "No"/"Yes" labels, renamed variables
@@ -314,11 +315,17 @@ cb <- covican |>
str(cb$data$underlying_disease_hemato_acute_myeloid_leukemia)
```
+If branching logic exists for a checkbox field, the function attempts to translate it into R by default. When `checkbox_na = TRUE`, values outside the branching logic are set to `NA`. A summary of problematic fields (e.g., missing branching logic or logic that could not be transcribed) is included in the `results` element:
+
+```{r}
+cb$results
+```
+
You can specify alternative labels:
```{r}
cb <- covican |>
- rd_checkbox(, checkbox_labels = c("Absent", "Present"))
+ rd_checkbox(checkbox_labels = c("Absent", "Present"))
str(cb$data$underlying_disease_hemato_acute_myeloid_leukemia)
```
@@ -332,112 +339,43 @@ cb <- covican |>
str(cb$data$underlying_disease_hemato___1)
```
-> Note: If a branching logic exists for a checkbox field, the function attempts to translate it into R, by default. When `checkbox_na = TRUE`, values outside the branching logic are set to NA.
-
-
-A summary of problematic fields (e.g., missing branching logic or logic not transcribable) is included in the results element:
-
-```{r}
-cb$results
-```
-
-### **rd_split**
-
-After preparing your dataset with, you may want to work with only one form or one event at a time. The `rd_split()` function separates your dataset accordingly.
-
-- **By form**
-
-For non-longitudinal projects (or longitudinal projects with an `event_form` mapping), you can split the dataset into smaller datasets based on forms. If repeated entries exist, you can reshape the data into wide format:
-
-> Note: For proper use of this function, ensure that `rd_factor()` and `rd_checkbox()` have been applied to your dataset. If not, the function will emit an error and prompt you to run both functions first.
-
-```{r}
-forms_data <- covican |>
- rd_factor() |>
- rd_checkbox() |>
- rd_split(by = "form")
-forms_data$data
-```
+### **rd_factor**
-- **By event**
+This function converts categorical variables in a REDCap dataset into R factors. It replaces categorical columns with the corresponding `.factor` column (created by REDCap for multiple-choice fields). It also updates the branching logic of the dictionary.
-For longitudinal projects, you can split by event instead. The function uses the `event_form` mapping to assign variables correctly to each event:
+_TODO: Careful, does it update only the branching logic or also the calculated fields? Same question for the following functions. João lied, it does nothing!! Remove this claim everywhere it is stated (including the function documentation)._
```{r}
-events_data <- covican |>
- rd_factor() |>
- rd_checkbox() |>
- rd_split(by = "event")
+factored <- covican |>
+ rd_factor()
-events_data$data
+# Checking class of the variable
+str(factored$data$available_analytics)
```
-If you want to extract only one form or event, use the `which` argument:
+You can prevent certain variables from being converted to factors using the `exclude` argument.
+This is useful if you need to keep some variables as raw numeric or text data.
```{r}
-# Example by form
-baseline_data <- covican |>
- rd_factor() |>
- rd_checkbox() |>
- rd_split(by = "form", which = "demographics")
-
-# Checking the names of the variables collected in that form
-vars_demo <- covican$dictionary |>
- dplyr::filter(form_name == "demographics") |>
- dplyr::pull(field_name)
-
-all(vars_demo %in% names(baseline_data$data))
-```
-
-
-
-### **rd_insert_na**
-
-This function sets some values of a variable to missing if a certain logic is fulfilled. It can be used as a complementary function for `rd_transform()`, for example, to change the values of those checkboxes that do not have a branching logic, as mentioned earlier. For instance, we can perform a raw transformation of our data, as in section 4.2.1.1, and then use this function to set the values of the checkbox _type_underlying_disease_haematological_cancer_ to missing when the age is less than 65 years old:
-
-```{r message=FALSE, warning=FALSE, comment=NA}
-#Raw transformation of the data:
-dataset <- rd_transform(covican)
-
-data <- dataset$data
-
-#Before inserting missings
-table(data$type_underlying_disease_haematological_cancer)
-
-#Run the function
-data2 <- rd_insert_na(dataset,
- event_form = covican$event_form,
- vars = "type_underlying_disease_haematological_cancer",
- filter = "age < 65")
+factored <- covican |>
+ rd_factor(exclude = c("available_analytics", "urine_culture"))
-#After inserting missings
-table(data2$type_underlying_disease_haematological_cancer)
+# Checking class of the variable
+str(factored$data$available_analytics)
```
-Recall that both the variable to be transformed (_age_) and the variable included in the filter (_type_underlying_disease_haematological_cancer_) are in the same event. In the contrary, if the variable to be transformed and the filter didn't have any event in common then the transformation would give an error. Furthermore, if the variable to be transformed was in more events than the filter, only the rows of the events in common would be converted.
-
-
-
-### **rd_rlogic**
-
-This function transforms the REDCap logic into logic that can be evaluated in R. It returns both the transformed logic and the result of the evaluation of the logic. This function is used in the `rd_transform()` to recalculate the calculated fields and convert the branching logics, but it may also be useful to use it in other circunstances. Let's see how it transforms the logic of one of the calculated fields in the built-in dataset:
-
-```{r message=FALSE, warning=FALSE, comment=NA}
-logic_trans <- rd_rlogic(covican,
- logic = "if([exc_1]='1' or [inc_1]='0' or [inc_2]='0' or [inc_3]='0',1,0)",
- var = "screening_fail_crit")
+> Note: the function automatically excludes these system variables from conversion: `redcap_event_name`, `redcap_repeat_instrument`, `redcap_data_access_group`. These variables are retained as-is to avoid interfering with longitudinal event mappings or user access groups.
-str(logic_trans)
-```
+After conversion, original variables are replaced with proper R factor columns and their `.factor` counterparts are dropped.
### **rd_dictionary**
-When working with REDCap exports, the data dictionary contains field metadata, branching logic, and calculation rules written in REDCap logic. After cleaning your dataset with functions like `rd_factor()` and `rd_checkbox()`, the original dictionary may no longer match the transformed data. The rd_dictionary() function refreshes branching logic and calculations, translating them from REDCap logic into R logic, and ensures the dictionary remains consistent with the cleaned dataset.
+When working with REDCap exports, the data dictionary contains field metadata, branching logic, and calculation rules written in REDCap logic. The `rd_dictionary()` function refreshes branching logic and calculations, translating them from REDCap logic into R logic, and ensures the dictionary remains consistent with the cleaned dataset.
```{r}
# Update dictionary after cleaning
@@ -445,9 +383,6 @@ dict_result <- covican |>
rd_factor() |>
rd_checkbox() |>
rd_dictionary()
-
-# Review any branching logic issues
-dict_result$results
```
When we transform the dictionary:
@@ -460,198 +395,114 @@ When we transform the dictionary:
-### **rd_transform**
-
-The main function involved in the processing of the data is `rd_transform()`. This function is used to process the REDCap data read into R using the `redcap_data()`, as described above. Using the arguments of the function we can perform all the different type of transformations described until now. Its a one-step transformation function!
-
-#### *Data transformation*
-
-```{r message=FALSE, warning=FALSE, comment=NA}
-#Option A: list object
-covican_transformed <- rd_transform(covican)
-
-#Option B: separately with different arguments
-covican_transformed <- rd_transform(data = covican$data,
- dic = covican$dictionary,
- event_form = covican$event_form)
-
-#Print the results of the transformation
-covican_transformed$results
-```
-
-This function will return a list with the transformed dataset, dictionary, event_form and the output of the results of the transformation.
-
-As we can see, there are several steps in the transformation:
-
-