-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revision of function documentation #36
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomaszaba thanks for your hard work on this. You are really putting in so much work over a short period of time. I worry a little bit that this might be impacting on attention to detail and quality from a documentation perspective.
I only got through the age.R
function scripts as there are repeating patterns of how you write the technical documentation that I think is not what is considered best practice. You are writing the technical documentation as if the audience knows nothing at all of how a function works and doesn't know how to code. For someone to get to this stage where they are reading your help page, they would at the very least know how to install your package and know how to invoke the help (?) command/function. If they don't know how to do that, then they don't know R and they won't be able to get to your documentation. The use of function documentation is mainly to get help within R for use of a function. Rarely will an R user go to the package website to read the documentation for a function. They will go to the website for README and vignettes.
From that perspective, you need to write your function documentation knowing that your reader is an R user who has basic knowledge of how to use a function.
I think this general comment and the specific comment for the documentation in the age functions can give you an idea of what most of my comments will be for the other function documentation. If you can have a look at my feedback and just review your other function documentation with this in mind that would be great.
I also think that a doing this process in smaller chunks is more productive and sustainable than doing one whole chunk of 74 files being changed and then asking for a review. If we go through a set of files to edit and then review, you'll get a good sense of the style that we can agree on and then from there, you will be able to apply those changes to the rest.
What do you think?
R/age.R
Outdated
#' @returns A numeric vector, of the same length as the input variable, containing | ||
#' age values in days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest instead to use this:
@returns A numeric vector of same length as `x` of age in days.
A related question is whether age in days is numeric or integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now I have age in days with two decimal places - reason why I went for numeric
. I did this to be consistent with months, which also takes 2 decimal places.
R/age.R
Outdated
#' `compute_age_in_months()` calculates age in months from on the basis of | ||
#' difference between the data collection date and the child's date of birth. | ||
#' It works inside [dplyr::mutate()] or [base::transform()]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid starting the description statement with the function name. For example, instead of what you have here, you can say:
Calculate child's age in months based on child's date of birth and the date of data collection.
I don't see why we need to say that this function works inside dplyr::mutate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if you call it outside mutating function it will not work. It's just a heads up that I thought I should put there for non-advanced R users.
R/age.R
Outdated
#' @param surv_date A vector of class "Date" holding values corresponding to | ||
#' the date of data collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is better?
@param surv_date A vector of class Date
for date of data collection.
R/age.R
Outdated
#' @param birth_date A vector of class "Date" holding values corresponding to | ||
#' the child's date of birth. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a similar structure as my suggestion above:
@param birth_date A vector of class Date
for child's birth date
R/age.R
Outdated
#' @returns A numeric vector named `age` holding age values in months with two | ||
#' decimal places. Any value outside the range of 6.0 to 59.99 is replaced with | ||
#' `NA`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several comments here.
-
This is not really a documentation issue but more of the function but is related to documentation because you had to explain it here. Why should a function that is meant to calculate age in months have as one of its features that it removes ages outside a specific range? I know why you are doing that but I am not sure this is the function that should be doing that filtering. It doesn't make sense.
-
Recommend changing the text to something like this:
@returns A numeric vector for child's age in months
- Why the need to round off in this function? More so that this is not even an exported function? What value is there to round off in this function? Why not do this somewhere else -
process_age()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this review. Will work on them.
R/age.R
Outdated
#' @param svdate A vector of class "Date" holding values corresponding to | ||
#' the data collection date. Default is `NULL`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- why call this argument
svdate
and notsurv_date
as you used in the earlier function? why do these parameters be different? - You say here that
svdate
is a vector of class Date but what they need to supply here is a character value for the name of the variable in df for survey date? Am I correct or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something that I noticed during my revision, but I did not address because it was not the right time; changing that would mean having to touch many files in this branch, including function definitions and tests. I took note of this and will revise. Actually, this is where I want to refactor the function so that those parameters can be called needless to use " "
. This is because even me I make mistakes when I call this function. Because I use data masking approach almost everywhere, when I call that function, I tend to pass those parameters without " "
and it errors.
Thanks for raising this issue.
R/age.R
Outdated
#' @param birdate A vector of class "Date" holding values corresponding to | ||
#' the child's date of birth. Default is `NULL`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here as svdate
- why different name from earlier function and isn't what this needs is a character value for the name of variable in df for date of birth?
R/age.R
Outdated
#' @returns A data frame of the same length as the input with an additional | ||
#' column. A new variable, `age_day`, is added to the output data frame whilst | ||
#' the `age` variable gets filled where applicable, and then any values outside | ||
#' the range of 6.0 to 59.99 months get replaced with `NA`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rewrite this documentation to:
@returns A data.frame based on df
. Variable age
that is required to be included in df
will be filled where applicable with the age in months for each row of data in df
. A new variable for df
named age_days
will be created. Values for age
and age_days
for children less than 6 months old and 60 months or more will be set to NA.
R/age.R
Outdated
#' @param age A numeric vector holding age values in months, usually estimated | ||
#' using local event calendars. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really numeric or should this be integer? If this is from local event calendars then this should be integer right?
This is not like the other arguments where you require the quoted variable name, here you use the unqouted variable name. Why the difference? If I were a user and I am faced with this syntax, I will get really frustrated as it doesn't make sense to me why the other parameters have to be quoted while another needs to be unquoted. I Know you like the whole complicated data masking approaches but I think you have to be consistent here. choose quoted or unqouted for all parameters and not mixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I did mention this frustration I experienced myself. I will refactor this and yes, to data masking thing :-) it keeps code clean, I find.
#' # Have a sample data ---- | ||
#' ## A sample data ---- | ||
#' df <- data.frame( | ||
#' survy_date = as.Date(c( | ||
#' "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need to make your own data.frame as an example? Can you not use a dataset and include in the package and use it for examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took note of this. Will revise accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember why I had to create my own dataset. My dataset does not have child's date of birth to demonstrate this function properly. In anthro.01
there is only the date of data collection, whilst child's age is all filled with NA
. The data came as it is.
@ernestguevarra , I appreciate your comprehensive feedback. I do admit to have followed that approach. This was the guidance I got from Hadley's book, including the mentioning of the package function at the beginning of Regarding to your last point about chunks of changes in a branch, I do agree but I would like you to propose the strategy to operationalise. Do you suggest changing one What about the coherence in the sentences. Has this improved in at least the file you reviewed? Thanks. |
Hi @ernestguevarra,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will approve these changes now but I want to re-iterate the following:
-
"Wrangling child's age" in your function title sounds really funny and bordering on inappropriate....
-
why different parameter names for different functions for the same inputs? svdate in one and then surv_date in another....
-
your argument definition for
surv_date
andbirdate
is incorrect. What you are asking the input of for that is not the vector itself but the variable name indf
that holds the survey date and the birth date. I mentioned this in my previous review and I think this is something that is not about preference but about what is correct. So, I think this should be "A character value for the name of the variable in df containing a vector of values of class Date for the survey date...". There is a big difference between this and your definition. If I were to follow your definition, then I will apply the following in the function:
process_age(input = df, svdate = df$survey_date, ...)
which is not what you want...
- Again, I know you might be thinking this is not important but for a user, it is confusing. Please make the specification of
age
argument in process_age consistent with how the dates are specified. Either have the user declare the variable for age (as is with the dates) are use an unquoted variable for svdate and birdate. I have been using R for close to 20 years now and have read so much documentation. If I read yours for this function, I will make mistakes in specifying dates and age because of how these are done in different ways...We have to be consistent. I mentioned this in my comment and I think you disregarded it. I am not sure why but this along with number 3 above I think is a critical comment that needs to have an action either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ernest,
- I decided to change to "wrangle age" for consistency. I thought it would not be right to use the verb to wrangle here and to process there. If that is Ok for you I can put it back to using the verb to process on age.
- The issue around different parameters, which you had raised in your previous reviews, are well noted and listed for action in the next brach where I will be re-factoring the functions. If I had revised those parameters that would have entailed to change function definitions, test files. I have listed all these issues, and others that I noticed, to address in the branch about function definitions. I thought this was the right approach to follow. I should have mentioned this in my message earlier today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomaszaba happy to approve this now. You've done really hard work on good job on this.
I think we need to get your hands and brains working on re-factor and actual coding rather than just docs at this stage. I think reviewing the docs helped you see what kind of re-factor you need. I am very confident that you have that mindset now.
So, it's important we let you do code now! which is always the fun stuff!!!!
👍🏾 🎉
Hi Ernest,
In this PR I addressed the issues #17, #18 , #19.
Please review.