diff --git a/R/rd_transform.R b/R/rd_transform.R index 569f8a0..47d725c 100644 --- a/R/rd_transform.R +++ b/R/rd_transform.R @@ -2,7 +2,7 @@ #' #' @description #' `r lifecycle::badge('stable')` -#' This function transforms the raw REDCap data read by the `redcap_data` function. It returns the transformed data and dictionary, along with a summary of the results of each step. +#' This function transforms the raw REDCap data read by the `redcap_data` function. It runs all the data-processing functions in a one-step pipeline. It returns the transformed data and dictionary, along with a summary of the results of each step. #' #' @param project Output of the `redcap_data` function, which is a list containing the data frames of the data, dictionary and event_form (if needed) of the REDCap project. #' @param data Data frame containing the data read from REDCap. If the list is specified, this argument is not necessary. diff --git a/vignettes/REDCapDM.Rmd b/vignettes/REDCapDM.Rmd index 7e414e7..72294df 100644 --- a/vignettes/REDCapDM.Rmd +++ b/vignettes/REDCapDM.Rmd @@ -38,8 +38,12 @@ The REDCapDM package provides a comprehensive toolkit for managing data exported All main functions are listed below (and described in detail in the examples): +## Import data + - `redcap_data()`: Read REDCap data into R. +## Process data + - `rd_dates()`: Standardize date and datetime fields. - `rd_delete_vars()`: Remove specified variables (by name or pattern). @@ -58,8 +62,12 @@ All main functions are listed below (and described in detail in the examples): - `rd_dictionary()`: Update dictionary (translation of REDCap logic into R syntax) to reflect transformed data and logic. +Or we can use all these functions at once: + - `rd_transform()`: One-step pipeline to clean and preprocess the raw REDCap data. +## Queries + - `rd_query()`: Apply expressions to identify data queries/issues. - `rd_event()`: Report missing/incomplete events per record (longitudinal). 
@@ -124,26 +132,26 @@ kable(vars) |>
-# **Examples** +# **Usage** -The package structure can be divided into three main components: reading raw data, processing data and identifying queries. Typically, after collecting data in REDCap, we will have to follow this three components in order to have a final validated dataset for analysis. We will provide a complete user guide on how to perform each one of these steps using the package's functions. For the processing of the data and query identification, we will use the built-in dataset as an example. +The package structure can be divided into three main components: reading raw data, processing data and identifying queries. Typically, after collecting data in REDCap, we will have to follow these three components in order to have a final validated dataset for analysis. We will provide a complete basic user guide on how to perform each one of these steps using the package's functions. For the processing of the data and query identification, we will use the `covican` built-in dataset as an example. ## **Read data** ### **redcap_data** -The `redcap_data()` function allows users to easily import data from a REDCap project into R for analysis. +The `redcap_data()` function allows users to easily import data from a REDCap project into R. -To read exported data from REDCap, use the arguments `data_path` and `dic_path` to, respectively, describe the path of the R file and the REDCap project's dictionary: +In order to read exported data from REDCap, we first need to download the data and dictionary from the REDCap project in R format. We can then use the arguments `data_path` and `dic_path` to designate the local path where we have stored the R file and the dictionary from the REDCap project: ```{r message=FALSE, warning=FALSE, comment=NA, eval=FALSE} dataset <- redcap_data(data_path = "C:/Users/username/example.r", dic_path = "C:/Users/username/example_dictionary.csv") ``` -> Note: The R and CSV files exported from REDCap must be located in the same directory. 
+> Note: The R file and the data CSV file exported from REDCap must be located in the same directory. -If the REDCap project is longitudinal (contains more than one event) then a third element should be specified with the correspondence of each event with each form of the project. This csv file can be downloaded in the REDCap of the project following these steps: _Project Setup_ < _Designate Instruments for My Events_ < _Download instrument-event mappings (CSV)_. +If the REDCap project is longitudinal (contains more than one event) then a third element should be specified with the correspondence of each event with each form of the project. This CSV file can be downloaded from the REDCap project by following these steps: _Project Setup_ > _Designate Instruments for My Events_ > _Download instrument-event mappings (CSV)_. Then, it has to be specified using the argument `event_path`: ```{r message=FALSE, warning=FALSE, comment=NA, eval=FALSE} dataset <- redcap_data(data_path = "C:/Users/username/example.r", @@ -164,52 +172,41 @@ In this case, there is no need to specify the event-form file since the function > **Warning**: Please keep in mind that the API token gives you special access to the REDCap project and that it should not be shared with other people. -This function returns a list with 3 elements (imported data, dictionary and event-form mapping) which can then be used for further analysis or visualization. +The `redcap_data()` function returns a list with three elements: imported data, dictionary and event-form mapping (if included). ## **Process** -As previously stated, we will use the built-in dataset `REDCapDM::covican` as an example. - -For all the following functions, the only necessary elements that must be provided are the dataset to be transformed and the corresponding dictionary. If the project is longitudinal, as in the case of `REDCapDM::covican`, also the event-form dataset should be specified. 
These elements can be specified directly using the output of the `redcap_data()` function or separately in different arguments. - -### **rd_dates** - -This function is designed to process and standardize `date` and `datetime` fields in a REDCap dataset. In REDCap projects, date and datetime fields can sometimes be stored as character strings, which can make analyses difficult. It detects which fields should be dates/datetimes from the REDCap dictionary and converts them to `Date` and `POSIXct`, respectively. +Given any data imported from REDCap with `redcap_data()`, this would be the pipeline of an entire processing workflow: -```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE} -# Option A: list object -covican_dates <- covican |> rd_dates() - -# Option B: provide components separately -covican_dates <- rd_dates(data = covican$data, - dic = covican$dictionary, - event_form = covican$event_form) +```{r message=FALSE, warning=FALSE, eval = FALSE, comment="#>", collapse = TRUE} +data |> + rd_delete_vars(delete_pattern = c("_complete", "_timestamp")) |> + rd_dates() |> + rd_recalculate() |> + rd_checkbox() |> + rd_factor() |> + rd_dictionary() |> + rd_split(by = "event") # use "form" if not longitudinal ``` -Quick verification example: +All functions are optional and should only be used at the user's discretion when necessary. The order of some functions can also be exchanged. 
For example, for `covican` there are no variables to delete and dates are already processed, so the pipeline would be simplified: ```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE} -# Simulate a character date since covican already has the dates in the correct format -covican_dates <- covican -covican_dates$data <- covican_dates$data |> - dplyr::mutate(d_birth = as.character(d_birth)) -# Check class before conversion +covican_transformed <- covican |> + rd_recalculate() |> + rd_checkbox() |> + rd_factor() |> + rd_dictionary() |> + rd_split(by = "event") -class(covican_dates$data$d_birth) - -# Check class after conversion -covican_dates <- covican_dates |> rd_dates() -class(covican_dates$data$d_birth) +covican_transformed$results ``` -After this transformation, all `date` and `datetime` variables are standardized and ready for analysis in R. - -
+All the functions that can be used in each step of a processing workflow are detailed below: ### **rd_delete_vars** -This function removes unwanted variables from both a REDCap dataset and its dictionary, keeping the data and metadata consistent. -This is especially useful for eliminating automatically generated fields such as form completion flags (`*_complete`) or timestamps (`*_timestamp`). +This function removes unwanted variables from both a REDCap dataset and its dictionary. This is especially useful for eliminating automatically generated fields such as form completion flags (`*_complete`) or timestamps (`*_timestamp`). You can delete variables either by specifying their exact names or by using regular expression patterns: @@ -231,10 +228,42 @@ When variables are deleted:
+ +### **rd_dates** + +This function is designed to process and standardize `date` and `datetime` fields in a REDCap dataset. In REDCap projects, date and datetime fields can sometimes be stored as character strings, which can make analyses difficult. It detects which fields should be dates/datetimes from the REDCap dictionary and converts them to `Date` and `POSIXct`, respectively. + +```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE} +covican_dates <- covican |> + rd_dates() +``` + +Quick verification example: + +```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE} +# Simulate a character date since covican already has the dates in the correct format +covican_dates <- covican +covican_dates$data <- covican_dates$data |> + dplyr::mutate(d_birth = as.character(d_birth)) +# Check class before conversion + +class(covican_dates$data$d_birth) + +# Check class after conversion +covican_dates <- covican_dates |> + rd_dates() +class(covican_dates$data$d_birth) +``` + +After this transformation, all `date` and `datetime` variables are standardized and ready for analysis in R. + +
+ ### **rd_recalculate** This function identifies calculated fields in a REDCap project, translates their logic into R, recalculates them, and compares the recalculated values with the originals. -It produces both field-level and project-level reports, helping users detect discrepancies between REDCap’s stored calculations and the values recomputed in R. + +It produces a report, helping users detect discrepancies between REDCap’s stored calculations and the values recomputed in R. ```{r} covican_recalc <- covican |> @@ -244,16 +273,15 @@ covican_recalc <- covican |> covican_recalc$results ``` -The `results` object contains: - -- Summary report – total number of calculated fields, how many were successfully transcribed into R logic, and how many recalculated values differ from the originals. +_TODO: I would not show numbering when only this function is run._ -- Field-level report – lists each calculated field, whether its logic was transcribed, and whether the recalculated value matches the original. +The `results` object contains: -Example: excluding specific variables +- Summary report: total number of calculated fields, how many were successfully transcribed into R logic, and how many recalculated values differ from the originals. -You can exclude certain fields from recalculation (e.g., complex multi-event calculations) to reduce computation time and avoid unnecessary warnings. +- Field-level report: lists each calculated field, whether its logic was transcribed, and whether the recalculated value matches the original. +In addition, you can exclude certain fields from recalculation (e.g., complex multi-event calculations) to reduce computation time and avoid unnecessary warnings. 
```{r} # Exclude specific variables from recalculation @@ -263,48 +291,21 @@ covican_recalc <- covican |> covican_recalc$results ``` +_TODO: I would rename the argument 'exclude_recalc' to 'exclude'_ + When recalculation succeeds: - A new variable is added to the dataset with the suffix `_recalc.` - A corresponding entry is added to the dictionary with the label `". (Recalculate)"`. -
- -### **rd_factor** - -This function converts categorical variables in a REDCap dataset into R factors. -It detects `.factor` columns (created by REDCap for multiple-choice fields) and merges them into the original variables, while preserving labels and updating the dictionary’s branching logic. - -```{r} -factored <- covican |> - rd_factor() - -# Checking class of the variable -str(factored$data$available_analytics) -``` - -You can prevent certain variables from being converted to factors using the `exclude` argument. -This is useful if you need to keep some variables as raw numeric or text data. - -```{r} -factored <- covican |> - rd_factor(exclude = c("available_analytics", "urine_culture")) - -# Checking class of the variable -str(covican$data$available_analytics) -``` - -> Note: the function automatically excludes these system variables from conversion: `redcap_event_name`, `redcap_repeat_instrument`, `redcap_data_access_group`. These variables are retained as-is to avoid interfering with longitudinal event mappings or user access groups. - -After conversion, original variables are replaced with proper R factor columns and their `.factor` counterparts are dropped. +_TODO: Consider whether to place them at the end of the dictionary or after each corresponding calculated field_
### **rd_checkbox** -This function processes REDCap checkbox fields, converting them from "Checked"/"Unchecked" categories into binary-coded variables (0/1) with user-specified labels. -It can also rename variables to match checkbox option labels and updates dictionary branching logic accordingly. +This function processes REDCap checkbox fields, converting them from "Checked"/"Unchecked" categories into binary-coded variables (0/1) and their corresponding factor variables with user-specified labels. It also renames variables to match checkbox option labels and updates dictionary branching logic accordingly. ```{r} # Default transformation: "No"/"Yes" labels, renamed variables @@ -314,11 +315,17 @@ cb <- covican |> str(cb$data$underlying_disease_hemato_acute_myeloid_leukemia) ``` +If a branching logic exists for a checkbox field, the function attempts to translate it into R by default. When `checkbox_na = TRUE`, values outside the branching logic are set to NA. A summary of problematic fields (e.g., missing branching logic or logic not transcribable) is included in the results element: + +```{r} +cb$results +``` + You can specify alternative labels: ```{r} cb <- covican |> - rd_checkbox(, checkbox_labels = c("Absent", "Present")) + rd_checkbox(checkbox_labels = c("Absent", "Present")) str(cb$data$underlying_disease_hemato_acute_myeloid_leukemia) ``` @@ -332,112 +339,43 @@ cb <- covican |> str(cb$data$underlying_disease_hemato___1) ``` -> Note: If a branching logic exists for a checkbox field, the function attempts to translate it into R, by default. When `checkbox_na = TRUE`, values outside the branching logic are set to NA. - - -A summary of problematic fields (e.g., missing branching logic or logic not transcribable) is included in the results element: - -```{r} -cb$results -``` -
-### **rd_split** - -After preparing your dataset with, you may want to work with only one form or one event at a time. The `rd_split()` function separates your dataset accordingly. - -- **By form** - -For non-longitudinal projects (or longitudinal projects with an `event_form` mapping), you can split the dataset into smaller datasets based on forms. If repeated entries exist, you can reshape the data into wide format: - -> Note: For proper use of this function, ensure that `rd_factor()` and `rd_checkbox()` have been applied to your dataset. If not, the function will emit an error and prompt you to run both functions first. - -```{r} -forms_data <- covican |> - rd_factor() |> - rd_checkbox() |> - rd_split(by = "form") -forms_data$data -``` +### **rd_factor** -- **By event** +This function converts categorical variables in a REDCap dataset into R factors. It replaces categorical columns with the corresponding `.factor` column (created by REDCap for multiple-choice fields). It also updates the branching logic of the dictionary. -For longitudinal projects, you can split by event instead. The function uses the `event_form` mapping to assign variables correctly to each event: +_TODO: Careful, does it update only the branching logic or also the calculated fields? The same applies to the following functions. João lied, it does nothing!! Remove this claim from everywhere it is stated (including the function documentation)._ ```{r} -events_data <- covican |> - rd_factor() |> - rd_checkbox() |> - rd_split(by = "event") +factored <- covican |> + rd_factor() -events_data$data +# Checking class of the variable +str(factored$data$available_analytics) ``` -If you want to extract only one form or event, use the `which` argument: +You can prevent certain variables from being converted to factors using the `exclude` argument. +This is useful if you need to keep some variables as raw numeric or text data. 
```{r} -# Example by form -baseline_data <- covican |> - rd_factor() |> - rd_checkbox() |> - rd_split(by = "form", which = "demographics") - -# Checking the names of the variables collected in that form -vars_demo <- covican$dictionary |> - dplyr::filter(form_name == "demographics") |> - dplyr::pull(field_name) - -all(vars_demo %in% names(baseline_data$data)) -``` - -
- -### **rd_insert_na** - -This function sets some values of a variable to missing if a certain logic is fulfilled. It can be used as a complementary function for `rd_transform()`, for example, to change the values of those checkboxes that do not have a branching logic, as mentioned earlier. For instance, we can perform a raw transformation of our data, as in section 4.2.1.1, and then use this function to set the values of the checkbox _type_underlying_disease_haematological_cancer_ to missing when the age is less than 65 years old: - -```{r message=FALSE, warning=FALSE, comment=NA} -#Raw transformation of the data: -dataset <- rd_transform(covican) - -data <- dataset$data - -#Before inserting missings -table(data$type_underlying_disease_haematological_cancer) - -#Run the function -data2 <- rd_insert_na(dataset, - event_form = covican$event_form, - vars = "type_underlying_disease_haematological_cancer", - filter = "age < 65") +factored <- covican |> + rd_factor(exclude = c("available_analytics", "urine_culture")) -#After inserting missings -table(data2$type_underlying_disease_haematological_cancer) +# Checking class of the variable +str(covican$data$available_analytics) ``` -Recall that both the variable to be transformed (_age_) and the variable included in the filter (_type_underlying_disease_haematological_cancer_) are in the same event. In the contrary, if the variable to be transformed and the filter didn't have any event in common then the transformation would give an error. Furthermore, if the variable to be transformed was in more events than the filter, only the rows of the events in common would be converted. - -
- -### **rd_rlogic** - -This function transforms the REDCap logic into logic that can be evaluated in R. It returns both the transformed logic and the result of the evaluation of the logic. This function is used in the `rd_transform()` to recalculate the calculated fields and convert the branching logics, but it may also be useful to use it in other circunstances. Let's see how it transforms the logic of one of the calculated fields in the built-in dataset: - -```{r message=FALSE, warning=FALSE, comment=NA} -logic_trans <- rd_rlogic(covican, - logic = "if([exc_1]='1' or [inc_1]='0' or [inc_2]='0' or [inc_3]='0',1,0)", - var = "screening_fail_crit") +> Note: the function automatically excludes these system variables from conversion: `redcap_event_name`, `redcap_repeat_instrument`, `redcap_data_access_group`. These variables are retained as-is to avoid interfering with longitudinal event mappings or user access groups. -str(logic_trans) -``` +After conversion, original variables are replaced with proper R factor columns and their `.factor` counterparts are dropped.
### **rd_dictionary** -When working with REDCap exports, the data dictionary contains field metadata, branching logic, and calculation rules written in REDCap logic. After cleaning your dataset with functions like `rd_factor()` and `rd_checkbox()`, the original dictionary may no longer match the transformed data. The rd_dictionary() function refreshes branching logic and calculations, translating them from REDCap logic into R logic, and ensures the dictionary remains consistent with the cleaned dataset. +When working with REDCap exports, the data dictionary contains field metadata, branching logic, and calculation rules written in REDCap logic. The `rd_dictionary()` function refreshes branching logic and calculations, translating them from REDCap logic into R logic, and ensures the dictionary remains consistent with the cleaned dataset. ```{r} # Update dictionary after cleaning @@ -445,9 +383,6 @@ dict_result <- covican |> rd_factor() |> rd_checkbox() |> rd_dictionary() - -# Review any branching logic issues -dict_result$results ``` When we transform the dictionary: @@ -460,198 +395,114 @@ When we transform the dictionary:
-### **rd_transform** - -The main function involved in the processing of the data is `rd_transform()`. This function is used to process the REDCap data read into R using the `redcap_data()`, as described above. Using the arguments of the function we can perform all the different type of transformations described until now. Its a one-step transformation function! - -#### *Data transformation* - -```{r message=FALSE, warning=FALSE, comment=NA} -#Option A: list object -covican_transformed <- rd_transform(covican) - -#Option B: separately with different arguments -covican_transformed <- rd_transform(data = covican$data, - dic = covican$dictionary, - event_form = covican$event_form) - -#Print the results of the transformation -covican_transformed$results -``` - -This function will return a list with the transformed dataset, dictionary, event_form and the output of the results of the transformation. - -As we can see, there are several steps in the transformation: - -
    -
  1. Elimination of variables: we can specify any variable in the dataset which we want to remove using the argument `delete_vars`, as explained later.
  2. - -
  3. Elimination of variables containing some pattern: by default, the pattern that the function looks for is '_complete' and '_timestamp'. We can specify any other pattern using the argument `delete_pattern`, as explained later.
  4. - -In this case, we do not have any variable with the pattern '_complete' and '_timestamp' since the built-in dataset only contains a sample of the variables of the project. All REDCap projects, when downloaded, contain one variable with the pattern '_complete' for each form indicating if the form has been marked as incomplete/unverified/completed. Also, if the project contains some survey then variables ending with '_timestamp' are also generated automatically. In general, we do not need this information so these variables are removed by default. - -
  5. Recalculation of REDCap calculated fields: it finds all the calculated fields and recalculates them using the REDCap logic specified in the calculation field translated into R. The recalculated variable is saved as the original name adding '_recalc' at the end. It can happen that the logic found contains some specific smart-variables or other complex structures which the function is not able to transcribe. With the summary found in `results` we can see how many calculated fields have been found, if they have been transcribed and, if that is the case, if the recalculated variable is equal to the original one.
  6. - -> Note: If the REDCap project is longitudinal and the event-form is not specified, this step will not be executed. - -In the example, we can see how there are two REDCap calculated fields, both have been transcribed successfully and the recalculation of the age does not match the original calculated variable from REDCap. - -
  7. Checkbox transformation: by default, it changes the names of the checkboxes to the name of its corresponding option and the name of their labels to 'No/Yes'. If we want to specify another pair of label names we can specify them using the `checkbox_labels` argument as we will see. Furthermore, if the checkbox contains a branching logic, when this logic evaluated returns a missing value (some of the variables specified in it are missing) the values of the checkbox will be set to missing.
  8. - -> Note: If the REDCap project is longitudinal and the event-form is not specified, the evaluations of the branching logic will not be done. - -For example, let's explain the transformation that undergo the variables corresponding to the checkbox field of the type of underlying disease. The variables were named originally as _type_underlying_disease___0_ and _type_underlying_disease___1_ although the name of the options are 'Haematological cancer' and 'Solid tumour'. Thus, in the transformed dataset, the names are converted to _type_underlying_disease_haematological_cancer_ and _type_underlying_disease_solid_tumour_. Then, since this checkbox variable does not have a branching logic, the variable is advised to be reviewed by the user in the `results`, as seen above. When reviewed we could use an additional function `rd_insert_na()` to insert the necessary missing values into this variable, as we will explain later. If a branching logic was found for this variable, `rd_transform` will insert automatically the missing values when the logic is not satisfied and no further transformation will be needed. - - -
  9. Replacement of the original variable by its factor version: REDCap creates two versions of the variables in the dataset for multiple-choice fields: a numerical one with the number that corresponds to each category and a factor one containing the labels of each category. In this step, we will replace the original variables with their factor versions, except for _redcap_event_name_ and _redcap_data_access_group_, for which we will keep both versions. We can specify other variables that we do not want to transform to factor using the argument `exclude_to_factor` which we will see later.
  10. - -
  11. Transformation of the branching logic: by default, every branching logic contained in the dictionary is presented in REDCap logic. In this step, we will convert each branching logic into R logic in order to apply this information when needed. For example, we will use it to properly identify missing values in variables with a branching logic, as we will see later in the vignette.
  12. - +### **rd_split** -
+After preparing your dataset, you may want to work with only one form or one event at a time. The `rd_split()` function separates your dataset accordingly. -#### *Data transformation and classification by event* +> Note: For proper use of this function, ensure that `rd_factor()` and `rd_checkbox()` have been applied to your dataset. If not, the function will emit an error and prompt you to run both functions first. -Additionally, we can change the final structure of the transformed dataset. For example, we can split it by each event. This can be done by specifying in the `final_format` argument that we want our data to be split by event. +_TODO: consider allowing the split even when these functions have not been applied_ -```{r message=FALSE, warning=FALSE, comment=NA} -dataset <- rd_transform(covican, - final_format = "by_event") +- **By form** -#To print the results -dataset$results -``` +For non-longitudinal projects (or longitudinal projects with an `event_form` mapping), you can split the dataset into smaller datasets based on forms. If repeated entries exist, you can reshape the data into wide format: -Now, a final step in the transformation has been added, which consists in splitting the data according to the events in the study. So, now the transformed dataset found in the output of the function is a tibble object with as many data frames as events there are in the REDCap project: +```{r} +forms_data <- covican |> + rd_factor() |> + rd_checkbox() |> + rd_split(by = "form") -```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE} -dataset$data +forms_data$data ``` -The column `df` of the nested dataframe is a list containing the data corresponding to each event. Also the variables of the forms that are found in each event are reported in the column `vars`. - -> Note: If the REDCap project is longitudinal and the event-form is not specified, this transformation is not posible. 
- -#### *Data transformation and classification by form* - -Another option is to split the data by the forms found in the REDCap project. We will use also the `final_format` argument to specify that we want to split data by form: +> Note: For longitudinal projects, the column events shows the number of events in each form. -```{r message=FALSE, warning=FALSE, comment=NA} -dataset <- rd_transform(covican, - final_format = "by_form") +- **By event** -#To print the results -dataset$results -``` +For longitudinal projects, you can also split the data by event. The function uses the `event_form` mapping to assign variables correctly to each event: -As before, a final step in the transformation has been added, which is to split the data according to the forms in the study. Thus, the transformed dataset will now be a tibble object with as many data frames as forms there are in the REDCap project: +```{r} +events_data <- covican |> + rd_factor() |> + rd_checkbox() |> + rd_split(by = "event") -```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE} -dataset$data +events_data$data ``` -> Note: If the REDCap project is longitudinal and the event-form is not specified, this transformation is not posible. - -#### *Additional arguments* - -There are other arguments which can be used to customize some of the transformation steps that the function performs by default: - -
+If you want to extract only one form or event, use the `which` argument: -checkbox_labels: specifies the name of the categories for the checkbox variables. Default is 'No/Yes', but we can change it to 'N/Y': +```{r} +# Example by form +baseline_data <- covican |> + rd_factor() |> + rd_checkbox() |> + rd_split(by = "form", which = "demographics") -```{r message=FALSE, warning=FALSE, comment=NA} -dataset <- rd_transform(covican, - checkbox_labels = c("N", "Y")) +head(baseline_data$data) ```
-checkbox_na: logical argument involved in the transformation of checkbox variables. For checkbox variables that have a branching logic specified, when the logic is missing the values of the checkbox will be always converted to missing. Additionally, if this argument is true then also when the branching logic isn't satisfied their values will be converted to missing. +### **rd_insert_na** + +This is a bonus function that can be used to set some values of a variable to missing if a certain logic is fulfilled. It can be used, for example, to insert missings on those checkboxes that do not have a branching logic, as mentioned earlier. For instance, we can transform the checkboxes with the `rd_checkbox()` function and then use this function to set the values of the checkbox _type_underlying_disease_haematological_cancer_ to missing when the age is less than 65 years old: ```{r message=FALSE, warning=FALSE, comment=NA} -dataset <- rd_transform(covican, - checkbox_na = TRUE) -``` +cb <- covican |> + rd_checkbox() -
+#Before inserting missings +table(cb$data$type_underlying_disease_haematological_cancer) -exclude_recalc: specifies the name of the variables that we do not want to be recalculated. For example, if we do not want to recalculate the variable _age_: +#Run with this function +cb2 <- covican |> + rd_checkbox() |> + rd_insert_na(vars = "type_underlying_disease_haematological_cancer", + filter = "age < 65") -```{r message=FALSE, warning=FALSE, comment=NA} -dataset <- rd_transform(covican, - exclude_recalc = "age") +#After inserting missings +table(cb2$data$type_underlying_disease_haematological_cancer) ``` +_TODO: Does not work..._ -This argument is useful to reduce the time of execution of the function. For calculated fields with complex logic involving variables in different events the recalculation operation may be time consuming, so we can prevent the function to recalculate them with this argument. +> Recall that both the variable to be transformed (_type_underlying_disease_haematological_cancer_) and the variable included in the filter (_age_) are in the same event. Conversely, if the variable to be transformed and the filter had no event in common, the transformation would give an error. Furthermore, if the variable to be transformed was in more events than the filter, only the rows of the events in common would be converted.
-exclude_to_factor: specifies the name of the variables that we do not want to transform into a factor. For example, if we want the variable _dm_ to keep its original numeric version: +### **rd_rlogic** -```{r message=FALSE, warning=FALSE, comment=NA} -dataset <- rd_transform(covican, - exclude_to_factor = "dm") -``` +This is also a bonus function that transforms the REDCap logic into logic that can be evaluated in R. It returns both the transformed logic and the result of the evaluation of the logic in R. -
+> This function returns only the logic (translated and evaluated), not the project data and dictionary, so it has to be used outside the transform workflow. -delete_vars: every variable specified in this argument will be removed from the dataset. For example, we can change the argument to remove the date of birth variable from the dataset: +Let's see how it transforms the logic of one of the calculated fields in the built-in dataset: ```{r message=FALSE, warning=FALSE, comment=NA} -dataset <- rd_transform(covican, - delete_vars = "d_birth") -``` +logic_trans <- covican |> + rd_rlogic(logic = "if([exc_1]='1' or [inc_1]='0' or [inc_2]='0' or [inc_3]='0',1,0)", + var = "screening_fail_crit") -
- -delete_pattern: every variable containing the strings specified in this argument will be removed from the dataset. By default, the value of `delete_pattern` is '\_complete'. For example, we can change the argument to remove the inclusion and exclusion criteria variables from the dataset (variables that contain 'inc\_' and 'exc\_' in their names): - -```{r message=FALSE, warning=FALSE, comment=NA} -dataset <- rd_transform(covican, - delete_pattern = c("inc_", "exc_")) +str(logic_trans) ```
-which_event: in the transformation by event explained earlier, we can specify whether we want to keep only one out of all the events in the dataset. For example, if we only want to keep the baseline visit: - -```{r message=FALSE, warning=FALSE, comment=NA} -dataset <- rd_transform(covican, - final_format = "by_event", - which_event = "baseline_visit_arm_1") -``` - -
+### **rd_transform** -which_form: in the transformation by form explained earlier, we can specify whether we want to keep only one of the forms. For example, if we only want to keep the demographic form: +Alternatively, you can do all these steps at once using the `rd_transform()` function: ```{r message=FALSE, warning=FALSE, comment=NA} +covican_transformed <- rd_transform(covican) -dataset <- rd_transform(covican, - final_format = "by_form", - which_form = "demographics") - -data <- dataset$data - -names(data) -``` - -
- -wide: in the transformation by form, we can specify that we want each of the split datasets to be in a wide format. This is useful if the form appears in more than one event (or in a repeated event). Then, we will only have one row per patient and all the variables of the form will be in columns repeated by each event in the order that the events appear in REDCap. For example, if we want to keep only the laboratory findings in a wide format we can do: - -```{r message=FALSE, warning=FALSE, comment="#>", collapse = TRUE} -dataset <- rd_transform(covican, - final_format = "by_form", - which_form = "laboratory_findings", - wide = TRUE) - -head(dataset$data) +#Print the results of the transformation +covican_transformed$results ``` -
+_TODO: Watch out, a 1 shows up again._ +Using the arguments of the function, we can perform all the different types of transformations described so far. ## **Queries**