Revision of function documentation #36

Merged
merged 9 commits into from
Oct 18, 2024
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -9,3 +9,4 @@
^pkgdown$
^doc$
^Meta$
^data-raw$
8 changes: 0 additions & 8 deletions NAMESPACE
@@ -1,20 +1,12 @@
# Generated by roxygen2: do not edit by hand

export(age_ratio_test)
export(assign_penalty_points_age_sex_ratio)
export(assign_penalty_points_flags_and_sd)
export(assign_penalty_points_skew_kurt)
export(check_plausibility_mfaz)
export(check_plausibility_muac)
export(check_plausibility_wfhz)
export(check_sample_size)
export(classify_age_sex_ratio)
export(classify_overall_quality)
export(classify_percent_flagged)
export(classify_sd)
export(classify_skew_kurt)
export(compute_combined_prevalence)
export(compute_mfaz_prevalence)
export(compute_muac_prevalence)
export(compute_quality_score)
export(compute_wfhz_prevalence)
131 changes: 70 additions & 61 deletions R/age.R
Member

@ernestguevarra ernestguevarra Oct 18, 2024

I will approve these changes now but I want to reiterate the following:

  1. "Wrangling child's age" in your function title sounds really funny and borders on inappropriate...

  2. Why are there different parameter names in different functions for the same inputs? svdate in one and then surv_date in another...

  3. Your argument definition for surv_date and birdate is incorrect. What the function expects for these is not the vector itself but the name of the variable in df that holds the survey date and the birth date. I mentioned this in my previous review and I think this is not about preference but about what is correct. So, I think this should be "A character value for the name of the variable in df containing a vector of values of class Date for the survey date...". There is a big difference between this and your definition. If I were to follow your definition, then I would call the function as follows:

process_age(input = df, svdate = df$survey_date, ...)

which is not what you want...

  4. Again, I know you might be thinking this is not important but for a user, it is confusing. Please make the specification of the age argument in process_age consistent with how the dates are specified. Either have the user declare the variable name for age (as is done with the dates) or use an unquoted variable for svdate and birdate. I have been using R for close to 20 years now and have read a lot of documentation. If I read yours for this function, I would make mistakes in specifying the dates and age because they are specified in different ways... We have to be consistent. I mentioned this in my earlier comment and I think you disregarded it. I am not sure why, but this, along with number 3 above, is a critical comment that needs an action either way.
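
The distinction in points 3 and 4 can be sketched as follows. This is a minimal illustration, not the package's code: `age_from_names()` and `age_from_vectors()` are hypothetical helpers, and the column names `survey_date` and `birth_date` are assumed. The first takes the names of the variables in `df`; the second takes the `Date` vectors themselves. Whichever interface is chosen, the documentation has to describe that one and apply it to every argument.

```r
# Hypothetical helpers for illustration only; neither is part of the package.

# Interface A: arguments are character names of variables held in `df`.
age_from_names <- function(df, svdate, birdate) {
  stopifnot(is.character(svdate), is.character(birdate))
  days <- as.numeric(df[[svdate]] - df[[birdate]])
  days / (365.25 / 12) # days to months
}

# Interface B: arguments are the Date vectors themselves.
age_from_vectors <- function(svdate, birdate) {
  stopifnot(inherits(svdate, "Date"), inherits(birdate, "Date"))
  as.numeric(svdate - birdate) / (365.25 / 12) # days to months
}

df <- data.frame(
  survey_date = as.Date(c("2023-01-01", "2023-01-01")),
  birth_date = as.Date(c("2021-01-01", "2019-07-01"))
)

# Each interface works when called the way it is documented.
a <- age_from_names(df, svdate = "survey_date", birdate = "birth_date")
b <- age_from_vectors(df$survey_date, df$birth_date)
same_result <- identical(a, b)

# Passing vectors to the name-based interface fails, which is the
# mistake a user makes when the documentation describes the wrong one.
wrong_call <- try(
  age_from_names(df, df$survey_date, df$birth_date),
  silent = TRUE
)
call_failed <- inherits(wrong_call, "try-error")
```

Both interfaces give the same ages here; the risk lies only in documenting one while implementing the other, or in mixing the two styles across arguments of the same function.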

Collaborator Author

@tomaszaba tomaszaba Oct 18, 2024

Hi Ernest,

  1. I decided to change to "wrangle age" for consistency. I thought it would not be right to use the verb "wrangle" in one place and "process" in another. If that is OK with you, I can go back to using the verb "process" for age.
  2. The issue of different parameter names, which you raised in your previous reviews, is well noted and listed for action in the next branch, where I will be refactoring the functions. Revising those parameters here would have meant changing function definitions and test files. I have listed all these issues, and others that I noticed, to address in the branch about function definitions. I thought this was the right approach to follow. I should have mentioned this in my message earlier today.

@@ -1,27 +1,30 @@
#'
#' Recode age variable from months to days
#' Calculate child's age in days
#'
#' @param x A numeric vector containing values of age in months.
#' @param x A double vector of child's age in months.
#'
#' @returns A double vector of the same length as `x` of age in days.
#'
#' @returns A numeric vector with values corresponding to age in days
#'
compute_month_to_days <- function(x) {
x * (365.25 / 12)
}




#'
#' Calculate child's age in months
#'
#' Get age in months from birth-date and the data when data was collected.
#' @description
#' Calculate child's age in months based on date of birth and the data collection date.
#'
#' `compute_age_in_months()` works inside [dplyr::mutate()] or [base::transform()]
#' It helps you to compute age in months from a pair of birth date and survey date.
#' @param surv_date A vector of class `Date` for data collection date.
#'
#' @param surv_date,birth_date Vectors containing dates. `surv_date` refers to the day,
#' month and year when the data was collected; while `birth_date` refers to the date
#' when the child was born.
#' @param birth_date A vector of class `Date` for child's date of birth.
#'
#' @returns A vector of name `age` storing age in months, a mix of double and
#' integer and `NA` for missing value if any of the processed age in months is
#' < 6 or > 59.99 months.
#' @returns A vector of class `double` for child's age in months with two decimal places.
#' Any value less than 6.0 and greater than or equal to 60.0 months will be set to `NA`.
#'
#'
compute_age_in_months <- function (surv_date, birth_date) {
@@ -31,34 +34,38 @@ compute_age_in_months <- function (surv_date, birth_date) {
age_mo <- ifelse(age_mo < 6.0 | age_mo >= 60.0, NA, age_mo)
}




#'
#' Transform age in months and age in days with a data frame
#' Wrangle child's age
#'
#' @description
#' Wrangle child's age for downstream analysis. This includes calculating age
#' in months based on the date of data collection and child's date of birth and
#' setting to `NA` the age values that are less than 6.0 and greater than or equal
#' to 60.0 months old.
#'
#' `process_age()` helps you get the variable age in the right format and ready
#' to be used for downstream workflow, i.e., get z-scores, as well as exclude
#' age values that are out-of-range.
#' @param df A dataset of class `data.frame` to process age from.
#'
#' @param df The input data frame.
#' @param svdate A vector of class `Date` for date of data collection.
#' Default is `NULL`.
#'
#' @param svdate,birdate Vectors containing dates. `svdate` refers to the day, month
#' and year when the data was collected; while `birdate` refers to the date when the
#' child was born (birth-date). By default, both arguments are `NULL`. This is
#' makes `process_age()` work even in data sets where either survey date or birth-
#' data is not available, so the `process_age()` works on already given age variable.
#' @param birdate A vector of class `Date` for child's date of birth.
#' Default is `NULL`.
#'
#' @param age A numeric vector containing already given age in months, usually an
#' integer in the input data as it is estimated using local event calendars.
#' `age` will typically be available on a particular row when `birth_date` of
#' that same row is missing.
#' @param age A vector of class `integer` of age in months, usually estimated
#' using local event calendars.
#'
#' @returns A data frame of the same length as the input data frame, but of a
#' different width. If `svdate` or `birdate` are available, two new vectors are added
#' to the data frame: `age` in months with two decimal places and `age_day` which
#' is age in days with decimal two decimal places.
#' @returns A `data.frame` based on `df`. The variable `age` that is required to be
#' included in `df` will be filled where applicable with the age in months for
#' each row of data in `df`. A new variable for `df` named `age_days` will be
#' created. Values for `age` and `age_days` for children less than 6.0 and greater
#' than or equal to 60.0 months old will be set to `NA`.
#'
#' @examples
#'
#' # Have a sample data ----
#' ## A sample data ----
#' df <- data.frame(
#' survy_date = as.Date(c(
#' "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01")),
Comment on lines -61 to 71
Member

Do you really need to make your own data.frame as an example? Can you not include a dataset in the package and use it for the examples?

Collaborator Author

Took note of this. Will revise accordingly.

Collaborator Author

I remember why I had to create my own dataset. My dataset does not have the child's date of birth, which is needed to demonstrate this function properly. In anthro.01 there is only the date of data collection, whilst the child's age is all NA. The data came as it is.

@@ -67,9 +74,13 @@ compute_age_in_months <- function (surv_date, birth_date) {
#' age = c(NA, 36, NA, NA, NA)
#' )
#'
#' ## Apply function ----
#' ## Apply the function ----
#' df |>
#' process_age(svdate = "survy_date", birdate = "birthdate", age = age)
#' process_age(
#' svdate = "survy_date",
#' birdate = "birthdate",
#' age = age
#' )
#'
#' @export
#'
@@ -95,42 +106,40 @@ process_age <- function(df, svdate = NULL, birdate = NULL, age) {
tibble::as_tibble(df)
}



#'
#' Age ratio test on children aged 6:23 over 24:59 months
#' Test for statistical difference between the proportion of children aged 24 to
#' 59 months old over those aged 6 to 23 months old
#'
#' @description
#' As documented in [nipnTK::ageRatioTest()], age ratio test is an age-related
#' test of survey data quality. This includes other assessments as screenings,
#' sentinel sites, etc. Different to [nipnTK::ageRatioTest()], in `age_ratio_test()`
#' the ratio of children is calculate from children 6-23 months to the number of
#' children age 24-59 months. The ratio is then compared to the expected ratio
#' (set at 0.66). Then the difference between the observed ratio is compared to
#' the expected using a Chi-squared test.
#'
#' `age_ratio_test()` should only be used for MUAC checks. This particularly
#' useful as allows you to determine if downstream your analysis you should
#' consider adjusting your MUAC prevalence, should there be more younger children
#' than older children in your survey, screening or sentinel site data. If you
#' wish to get the age ratio for children 6-29/30-59 like in SMART Methodology,
#' then you should use [nipnTK::ageRatioTest()] NOT `age_ratio_test()`.
#'
#' @param age A vector storing values about child's age in months.
#'
#' @param .expectedP The expected proportion of children aged 24-59 months over
#' children aged 6-29 months, considered to be of 0.66 according to the
#' Calculate the observed age ratio of children aged 24 to 59 months old over
#' those aged 6 to 23 months old and test if there is a statistical difference
#' between the observed and the expected.
#'
#' @param age A double vector of age in months.
#'
#' @param .expectedP The expected proportion of children aged 24 to 59 months
#' old over those aged 6 to 23 months old. This is estimated to be 0.66 as in the
#' [SMART MUAC tool](https://smartmethodology.org/survey-planning-tools/updated-muac-tool/).
#'
#' @returns A list three statistics: `p` for p-value, `observedR` for observed ratio
#' from your data, `observedP` for observed proportion of children 24-59 months
#' over the universe of your sample data.
#' @returns A vector of class `list` of three statistics: `p` for p-value of the
#' statistical difference between the observed and the expected proportion of
#' children aged 24 to 59 months old over those aged 6 to 23 months old;
#' `observedR` and `observedP` for the observed ratio and proportion respectively.
#'
#' @examples
#' @details
#' This function should be used specifically for assessing MUAC data. For
#' age ratio tests of children aged 6 to 29 months old over 30 to 59 months old, as
#' performed in the SMART plausibility check, use [nipnTK::ageRatioTest()] instead.
#'
#' ## Have a sample data ----
#' age <- seq(6,59) |> sample(300, replace = TRUE)
#' @examples
#'
#' ## Apply the function ----
#' age_ratio_test(age, .expectedP = 0.66)
#' ## An example of application using `anthro.02` dataset ----
#' age_ratio_test(
#' age = anthro.02$age,
#' .expectedP = 0.66
#' )
#'
#' @export
#'
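
The test described in the documentation above can be sketched roughly as follows. This is an assumption-laden illustration, not the package's implementation: `age_ratio_sketch()` is a hypothetical name, the test is taken to be a chi-squared goodness-of-fit test via `stats::chisq.test()`, and the ratio is taken as older over younger children, a direction the old and new documentation do not state consistently.

```r
# Sketch only: `age_ratio_sketch()` is a hypothetical name, not the
# package's `age_ratio_test()`.
age_ratio_sketch <- function(age, expected_p = 0.66) {
  # Keep non-missing ages within the 6 to <60 months range.
  age <- age[!is.na(age) & age >= 6 & age < 60]

  # Count children in the two age groups.
  older <- sum(age >= 24)   # 24 to 59 months
  younger <- sum(age < 24)  # 6 to 23 months

  # Chi-squared goodness-of-fit test of the observed counts against
  # the expected proportions (assumed form of the test).
  test <- stats::chisq.test(
    x = c(older, younger),
    p = c(expected_p, 1 - expected_p)
  )

  list(
    p = test$p.value,
    observed_p = older / (older + younger), # proportion aged 24 to 59
    observed_r = older / younger            # assumed direction of the ratio
  )
}

# Simulated ages in months, in the spirit of the earlier example.
set.seed(1)
age <- sample(6:59, 300, replace = TRUE)
res <- age_ratio_sketch(age, expected_p = 0.66)
```

A small p-value would suggest that the age distribution departs from the expected proportion, which is the situation in which adjusting the MUAC prevalence is worth considering.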
99 changes: 42 additions & 57 deletions R/case_definitions.R
@@ -1,24 +1,31 @@
#'
#' Case-Definition: is an observation acutely malnourished?
#' Define wasting based on WFHZ, MFAZ, MUAC and Combined criteria
#'
#' [define_wasting_cases_muac()], [define_wasting_cases_whz()] and
#' [define_wasting_cases_combined()] help you get through with your wasting
#' case-definition for each observation. It should be used inside dplyr::mutate()
#' or base::transform(). It was designed to be used inside [define_wasting()].
#' @description
#' Define if a given observation in the dataset is wasted or not, on the basis of
#' WFHZ, MFAZ, MUAC and the combined criteria.
#'
#' @param df A dataset object of class `data.frame` to use.
#'
#' @param muac A vector of class `integer` of MUAC values in millimeters.
#'
#' @param zscore A vector of class `double` of WFHZ values (with 3 decimal places).
#'
#' @param edema A vector of class `character` of edema. Code should be
#' "y" for presence and "n" for absence of bilateral edema. Default is `NULL`.
#'
#' @param cases A choice of the form of wasting to be defined.
#'
#' @param muac An integer vector containing MUAC measurements in mm.
#' @param zscore A double vector containing weight-for-height zscores with 3
#' decimal places.
#' @param edema A character vector of "y" = Yes, "n" = No bilateral edema.
#' Default is NULL.
#' @param cases A choice of wasting case definition you wish to apply. For combined
#' acute malnutrition with [define_wasting_cases_combined()] cases options are:
#' c("cgam", "csam", "cmam").
#' @param base A choice of the criterion on which the case-definition should be based.
#'
#' @returns A numeric vector of the same size as the input vector, with values ranging
#' between 1=Yes and 0=No.
#' @returns A vector of class `numeric` of dummy values: 1 for case and 0
#' for not case.
#'
#' @details
#' Use `define_wasting()` to add the case-definitions to data frame.
#'
#' @rdname case_definition
#'
#' @rdname case_definitions
#'
define_wasting_cases_muac <- function(muac, edema = NULL,
cases = c("gam", "sam", "mam")) {
@@ -46,7 +53,7 @@ define_wasting_cases_muac <- function(muac, edema = NULL,

#'
#'
#' @rdname case_definitions
#' @rdname case_definition
#'
#'
define_wasting_cases_whz <- function(zscore, edema = NULL,
@@ -75,7 +82,7 @@ define_wasting_cases_whz <- function(zscore, edema = NULL,

#'
#'
#' @rdname case_definitions
#' @rdname case_definition
#'
#'
define_wasting_cases_combined <- function(zscore, muac, edema = NULL,
@@ -104,45 +111,28 @@ define_wasting_cases_combined <- function(zscore, muac, edema = NULL,
}


# Function to add new vectors with case definitions ----------------------------
#'
#' Add acute malnutrition case-definitions to the data frame
#'
#' Use `define_wasting()` to add the case-definitions in your input data frame.
#'
#' @param df The data frame object containing the vectors with zscores, muac and
#' edema.
#' @param zscore The vector storing zscores values with 3 decimal places.
#' @param muac An integer vector containing MUAC measurements in mm.
#' @param edema A character vector of "y" = Yes, "n" = No bilateral edema.
#' Default is NULL.
#' @param base A choice of options to which your case definition should be based on.
#'
#' @returns A data frame with three vectors added to the input data frame: "gam",
#' "sam" and "mam". If base = "combined" the vector names change to "cgam",
#' "csam" and "cmam" for combined global, severe and moderate acute malnutrition
#' respectively.
#'
#' @examples
#' # MUAC-based case-definition ----
#'
#' ## Weight-for-height based case-definition ----
#' x <- anthro.02 |>
#' define_wasting(
#' muac = muac,
#' zscore = wfhz,
#' edema = edema,
#' base = "muac"
#' base = "wfhz"
#' )
#' head(x)
#'
#' # Weight-for-height based case-definition ----
#' ## MUAC-based case-definition ----
#' x <- anthro.02 |>
#' define_wasting(
#' zscore = wfhz,
#' muac = muac,
#' edema = edema,
#' base = "wfhz"
#' base = "muac"
#' )
#' head(x)
#'
#' # Combined case-definition ----
#' ## Combined case-definition ----
#' x <- anthro.02 |>
#' define_wasting(
#' zscore = wfhz,
@@ -152,6 +142,8 @@ define_wasting_cases_combined <- function(zscore, muac, edema = NULL,
#' )
#' head(x)
#'
#' @rdname case_definition
#'
#' @export
#'
define_wasting <- function(df, zscore = NULL, muac = NULL, edema = NULL,
@@ -231,23 +223,16 @@ define_wasting <- function(df, zscore = NULL, muac = NULL, edema = NULL,
}

#'
#' A helper function to classify nutritional status into SAM, MAM or not wasted
#'
#' @description
#' `classify_wasting_for_cdc_approach()` is used a helper inside
#' [apply_cdc_age_weighting()] to classify nutritional status into "sam", "mam"
#' or "not wasted" and then the vector returned is used downstream to calculate
#' the proportions of children with severe and moderate acute malnutrition.
#' Classify wasting into severe or moderate wasting to be used in the
#' SMART MUAC tool weighting approach
#'
#' @param muac An integer vector containing MUAC values. They should be in
#' millimeters.
#' @param muac A vector of class `integer` of MUAC values in millimeters.
#'
#' @param .edema Optional. Its a vector containing data on bilateral pitting
#' edema coded as "y" for yes and "n" for no.
#' @param .edema A vector of class `character` of edema. Code should be
#' "y" for presence and "n" for absence of bilateral edema. Default is `NULL`.
#'
#' @returns A numeric vector of the same size as the input vector with values ranging
#' between "sam", "mam" and "not wasted" for severe, moderate acute malnutrition and not
#' acutely malnourished, respectively.
#' @returns A vector of class `character` of the same length as `muac` and `.edema`
#' indicating if a child is severe or moderately wasted or not wasted.
#'
#'
classify_wasting_for_cdc_approach <- function(muac, .edema = NULL) {