Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revision of function documentation #36

Merged
merged 9 commits into from
Oct 18, 2024
Merged

Revision of function documentation #36

merged 9 commits into from
Oct 18, 2024

Conversation

tomaszaba
Copy link
Collaborator

Hi Ernest,

In this PR I addressed the issues #17, #18 , #19.

Please review.

@tomaszaba tomaszaba added the documentation Improvements or additions to documentation label Oct 12, 2024
@tomaszaba tomaszaba changed the title Revision of the functions documentation Revision of function documentation Oct 12, 2024
Copy link
Member

@ernestguevarra ernestguevarra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomaszaba thanks for your hard work on this. You are really putting in so much work over a short period of time. I worry a little bit that this might be impacting on attention to detail and quality from a documentation perspective.

I only got through the age.R function scripts as there are repeating patterns of how you write the technical documentation that I think is not what is considered best practice. You are writing the technical documentation as if the audience knows nothing at all of how a function works and doesn't know how to code. For someone to get to this stage where they are reading your help page, they would at the very least know how to install your package and know how to invoke the help (?) command/function. If they don't know how to do that, then they don't know R and they won't be able to get to your documentation. The use of function documentation is mainly to get help within R for use of a function. Rarely will an R user go to the package website to read the documentation for a function. They will go to the website for README and vignettes.

From that perspective, you need to write your function documentation knowing that your reader is an R user who has basic knowledge of how to use a function.

I think this general comment and the specific comment for the documentation in the age functions can give you an idea of what most of my comments will be for the other function documentation. If you can have a look at my feedback and just review your other function documentation with this in mind that would be great.

I also think that a doing this process in smaller chunks is more productive and sustainable than doing one whole chunk of 74 files being changed and then asking for a review. If we go through a set of files to edit and then review, you'll get a good sense of the style that we can agree on and then from there, you will be able to apply those changes to the rest.

What do you think?

R/age.R Outdated
Comment on lines 6 to 7
#' @returns A numeric vector, of the same length as the input variable, containing
#' age values in days.
Copy link
Member

@ernestguevarra ernestguevarra Oct 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest instead to use this:

@returns A numeric vector of same length as `x` of age in days.

A related question is whether age in days is numeric or integer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now I have age in days with two decimal places - reason why I went for numeric. I did this to be consistent with months, which also takes 2 decimal places.

R/age.R Outdated
Comment on lines 18 to 20
#' `compute_age_in_months()` calculates age in months from on the basis of
#' difference between the data collection date and the child's date of birth.
#' It works inside [dplyr::mutate()] or [base::transform()].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid starting the description statement with the function name. For example, instead of what you have here, you can say:

Calculate child's age in months based on child's date of birth and the date of data collection.

I don't see why we need to say that this function works inside dplyr::mutate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if you call it outside mutating function it will not work. It's just a heads up that I thought I should put there for non-advanced R users.

R/age.R Outdated
Comment on lines 22 to 23
#' @param surv_date A vector of class "Date" holding values corresponding to
#' the date of data collection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is better?

@param surv_date A vector of class Date for date of data collection.

R/age.R Outdated
Comment on lines 25 to 26
#' @param birth_date A vector of class "Date" holding values corresponding to
#' the child's date of birth.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a similar structure as my suggestion above:

@param birth_date A vector of class Date for child's birth date

R/age.R Outdated
Comment on lines 28 to 30
#' @returns A numeric vector named `age` holding age values in months with two
#' decimal places. Any value outside the range of 6.0 to 59.99 is replaced with
#' `NA`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments here.

  1. This is not really a documentation issue but more of the function but is related to documentation because you had to explain it here. Why should a function that is meant to calculate age in months have as one of its features that it removes ages outside a specific range? I know why you are doing that but I am not sure this is the function that should be doing that filtering. It doesn't make sense.

  2. Recommend changing the text to something like this:

@returns A numeric vector for child's age in months

  1. Why the need to round off in this function? More so that this is not even an exported function? What value is there to round off in this function? Why not do this somewhere else - process_age()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this review. Will work on them.

R/age.R Outdated
Comment on lines 51 to 52
#' @param svdate A vector of class "Date" holding values corresponding to
#' the data collection date. Default is `NULL`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. why call this argument svdate and not surv_date as you used in the earlier function? why do these parameters be different?
  2. You say here that svdate is a vector of class Date but what they need to supply here is a character value for the name of the variable in df for survey date? Am I correct or am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that I noticed during my revision, but I did not address because it was not the right time; changing that would mean having to touch many files in this branch, including function definitions and tests. I took note of this and will revise. Actually, this is where I want to refactor the function so that those parameters can be called needless to use " ". This is because even me I make mistakes when I call this function. Because I use data masking approach almost everywhere, when I call that function, I tend to pass those parameters without " " and it errors.

Thanks for raising this issue.

R/age.R Outdated
Comment on lines 54 to 55
#' @param birdate A vector of class "Date" holding values corresponding to
#' the child's date of birth. Default is `NULL`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here as svdate - why different name from earlier function and isn't what this needs is a character value for the name of variable in df for date of birth?

R/age.R Outdated
Comment on lines 60 to 63
#' @returns A data frame of the same length as the input with an additional
#' column. A new variable, `age_day`, is added to the output data frame whilst
#' the `age` variable gets filled where applicable, and then any values outside
#' the range of 6.0 to 59.99 months get replaced with `NA`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rewrite this documentation to:

@returns A data.frame based on df. Variable age that is required to be included in df will be filled where applicable with the age in months for each row of data in df. A new variable for df named age_days will be created. Values for age and age_days for children less than 6 months old and 60 months or more will be set to NA.

R/age.R Outdated
Comment on lines 57 to 58
#' @param age A numeric vector holding age values in months, usually estimated
#' using local event calendars.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really numeric or should this be integer? If this is from local event calendars then this should be integer right?

This is not like the other arguments where you require the quoted variable name, here you use the unqouted variable name. Why the difference? If I were a user and I am faced with this syntax, I will get really frustrated as it doesn't make sense to me why the other parameters have to be quoted while another needs to be unquoted. I Know you like the whole complicated data masking approaches but I think you have to be consistent here. choose quoted or unqouted for all parameters and not mixed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I did mention this frustration I experienced myself. I will refactor this and yes, to data masking thing :-) it keeps code clean, I find.

Comment on lines -61 to 70
#' # Have a sample data ----
#' ## A sample data ----
#' df <- data.frame(
#' survy_date = as.Date(c(
#' "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to make your own data.frame as an example? Can you not use a dataset and include in the package and use it for examples?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took note of this. Will revise accordingly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember why I had to create my own dataset. My dataset does not have child's date of birth to demonstrate this function properly. In anthro.01 there is only the date of data collection, whilst child's age is all filled with NA. The data came as it is.

@tomaszaba
Copy link
Collaborator Author

@tomaszaba thanks for your hard work on this. You are really putting in so much work over a short period of time. I worry a little bit that this might be impacting on attention to detail and quality from a documentation perspective.

I only got through the age.R function scripts as there are repeating patterns of how you write the technical documentation that I think is not what is considered best practice. You are writing the technical documentation as if the audience knows nothing at all of how a function works and doesn't know how to code. For someone to get to this stage where they are reading your help page, they would at the very least know how to install your package and know how to invoke the help (?) command/function. If they don't know how to do that, then they don't know R and they won't be able to get to your documentation. The use of function documentation is mainly to get help within R for use of a function. Rarely will an R user go to the package website to read the documentation for a function. They will go to the website for README and vignettes.

From that perspective, you need to write your function documentation knowing that your reader is an R user who has basic knowledge of how to use a function.

I think this general comment and the specific comment for the documentation in the age functions can give you an idea of what most of my comments will be for the other function documentation. If you can have a look at my feedback and just review your other function documentation with this in mind that would be great.

I also think that a doing this process in smaller chunks is more productive and sustainable than doing one whole chunk of 74 files being changed and then asking for a review. If we go through a set of files to edit and then review, you'll get a good sense of the style that we can agree on and then from there, you will be able to apply those changes to the rest.

What do you think?

@ernestguevarra , I appreciate your comprehensive feedback. I do admit to have followed that approach. This was the guidance I got from Hadley's book, including the mentioning of the package function at the beginning of @description. But at the same time I also used my approach, which I had in mind the non-advanced R users. When you are not an advanced user, which is majority of my audience I believe, writing function documentation in an advanced way can limit their ability to understand. This was my personal experience when started using R. I used to find some function documentation too difficult to grasp. So I decided to work around this in this package using simple language. If you still think we should avoid that styling, I can strive to rework on this to incorporate the styling you are proposing to follow. Please let me know.

Regarding to your last point about chunks of changes in a branch, I do agree but I would like you to propose the strategy to operationalise. Do you suggest changing one .R file then raise a PR?

What about the coherence in the sentences. Has this improved in at least the file you reviewed?

Thanks.

@tomaszaba
Copy link
Collaborator Author

@tomaszaba thanks for your hard work on this. You are really putting in so much work over a short period of time. I worry a little bit that this might be impacting on attention to detail and quality from a documentation perspective.
I only got through the age.R function scripts as there are repeating patterns of how you write the technical documentation that I think is not what is considered best practice. You are writing the technical documentation as if the audience knows nothing at all of how a function works and doesn't know how to code. For someone to get to this stage where they are reading your help page, they would at the very least know how to install your package and know how to invoke the help (?) command/function. If they don't know how to do that, then they don't know R and they won't be able to get to your documentation. The use of function documentation is mainly to get help within R for use of a function. Rarely will an R user go to the package website to read the documentation for a function. They will go to the website for README and vignettes.
From that perspective, you need to write your function documentation knowing that your reader is an R user who has basic knowledge of how to use a function.
I think this general comment and the specific comment for the documentation in the age functions can give you an idea of what most of my comments will be for the other function documentation. If you can have a look at my feedback and just review your other function documentation with this in mind that would be great.
I also think that a doing this process in smaller chunks is more productive and sustainable than doing one whole chunk of 74 files being changed and then asking for a review. If we go through a set of files to edit and then review, you'll get a good sense of the style that we can agree on and then from there, you will be able to apply those changes to the rest.
What do you think?

@ernestguevarra , I appreciate your comprehensive feedback. I do admit to have followed that approach. This was the guidance I got from Hadley's book, including the mentioning of the package function at the beginning of @description. But at the same time I also used my approach, which I had in mind the non-advanced R users. When you are not an advanced user, which is majority of my audience I believe, writing function documentation in an advanced way can limit their ability to understand. This was my personal experience when started using R. I used to find some function documentation too difficult to grasp. So I decided to work around this in this package using simple language. If you still think we should avoid that styling, I can strive to rework on this to incorporate the styling you are proposing to follow. Please let me know.

Regarding to your last point about chunks of changes in a branch, I do agree but I would like you to propose the strategy to operationalise. Do you suggest changing one .R file then raise a PR?

What about the coherence in the sentences. Has this improved in at least the file you reviewed?

Thanks.

Hi @ernestguevarra,
I have just pushed my revisions to this PR addressing your reviews. Please see below some notes for your consideration while reviewing this PR:

  • I opted to use the verb to wrangle over to process. I think I have done this throughout the documentation.

  • Regarding the class of vectors, as agreed, I considered being specific. However, in some instances, as in recode_muac() where inherent to the purpose of the function, I called it "A vector of class double or integer. This is because if the vector comes in cm, that would be double, otherwise integer. On this, I had to revise my sentences to "A vector of class double of …" instead of "A double vector …". This was to avoid the ambiguity around the word "double" that can be perceived as if we're referring to "duplicated or two times vector".

  • On the quality scores and classifiers, after I read (again) the SMART manual, I noticed that they use "acceptability" to refer to what I had initially called "quality". In this way, I revised the documentation to the following: "rate" in place of classify and "acceptability" for quality. Rate the acceptability of the standard deviation .

  • Something awkward is happing with the function in case_definition.R. Somehow the documentation checked with help(define_wasting_cases_whz) returns the argument in a different order that I set in the .R file. . Also, sometimes on the rendered documentation, the chunk on the usage get displayed differently for some functions. I am unaware of what leads to this. Do you know?

image

Copy link
Member

@ernestguevarra ernestguevarra Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will approve these changes now but I want to re-iterate the following:

  1. "Wrangling child's age" in your function title sounds really funny and bordering on inappropriate....

  2. why different parameter names for different functions for the same inputs? svdate in one and then surv_date in another....

  3. your argument definition for surv_date and birdate is incorrect. What you are asking the input of for that is not the vector itself but the variable name in df that holds the survey date and the birth date. I mentioned this in my previous review and I think this is something that is not about preference but about what is correct. So, I think this should be "A character value for the name of the variable in df containing a vector of values of class Date for the survey date...". There is a big difference between this and your definition. If I were to follow your definition, then I will apply the following in the function:

process_age(input = df, svdate = df$survey_date, ...)

which is not what you want...

  1. Again, I know you might be thinking this is not important but for a user, it is confusing. Please make the specification of age argument in process_age consistent with how the dates are specified. Either have the user declare the variable for age (as is with the dates) are use an unquoted variable for svdate and birdate. I have been using R for close to 20 years now and have read so much documentation. If I read yours for this function, I will make mistakes in specifying dates and age because of how these are done in different ways...We have to be consistent. I mentioned this in my comment and I think you disregarded it. I am not sure why but this along with number 3 above I think is a critical comment that needs to have an action either way.

Copy link
Collaborator Author

@tomaszaba tomaszaba Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ernest,

  1. I decided to change to "wrangle age" for consistency. I thought it would not be right to use the verb to wrangle here and to process there. If that is Ok for you I can put it back to using the verb to process on age.
  2. The issue around different parameters, which you had raised in your previous reviews, are well noted and listed for action in the next brach where I will be re-factoring the functions. If I had revised those parameters that would have entailed to change function definitions, test files. I have listed all these issues, and others that I noticed, to address in the branch about function definitions. I thought this was the right approach to follow. I should have mentioned this in my message earlier today.

Copy link
Member

@ernestguevarra ernestguevarra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomaszaba happy to approve this now. You've done really hard work on good job on this.

I think we need to get your hands and brains working on re-factor and actual coding rather than just docs at this stage. I think reviewing the docs helped you see what kind of re-factor you need. I am very confident that you have that mindset now.

So, it's important we let you do code now! which is always the fun stuff!!!!

👍🏾 🎉

@ernestguevarra ernestguevarra merged commit 74837a1 into main Oct 18, 2024
7 checks passed
@tomaszaba tomaszaba deleted the function_doc branch November 3, 2024 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
2 participants