Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

behaviour of na_str in formatters 0.5.9 #321

Closed
iaugusty opened this issue Sep 27, 2024 · 10 comments
Closed

behaviour of na_str in formatters 0.5.9 #321

iaugusty opened this issue Sep 27, 2024 · 10 comments
Labels
discussion Needs to consider if worth it or how sme Tracks changes for the sme board

Comments

@iaugusty
Copy link
Collaborator

iaugusty commented Sep 27, 2024

Summary

Just checking if this is intended behavior of na_str, when na_str is a character string of length > 1

xx <- list(c(NA, 1, NA), c(NA, NA, 1), c(1, NA, NA), c(NA, NA, NA))

sapply(xx, FUN = function(x)format_value(x, format = "xx.x (xx.x - xx.x)", na_str = paste0("ne",1:10)))

"ne1 (1.0 - ne2)" "ne1 (ne2 - 1.0)" "1.0 (ne1 - ne2)" "ne1 (ne2 - ne3)"

I would find it more intuitive if the replacement of an NA value is according to the position of the na_str vector, rather than the position of the NA value in the input vector.

In the above example, I would expect the result to read as

"ne1 (1.0 - ne3)" "ne1 (ne2 - 1.0)" "1.0 (ne2 - ne3)" "ne1 (ne2 - ne3)"

library(formatters)
xx <- list(c(NA, 1, NA), c(NA, NA, 1), c(1, NA, NA), c(NA, NA, NA))
sapply(xx, FUN = function(x)format_value(x, format = "xx.x (xx.x - xx.x)", na_str = paste0("ne",1:10)))
#> [1] "ne1 (1.0 - ne2)" "ne1 (ne2 - 1.0)" "1.0 (ne1 - ne2)" "ne1 (ne2 - ne3)"

R session info
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

other attached packages:
[1] rtables_0.6.9 magrittr_2.0.3 formatters_0.5.9

@iaugusty iaugusty added the bug Something isn't working label Sep 27, 2024
@Melkiades Melkiades added the sme Tracks changes for the sme board label Sep 27, 2024
@Melkiades
Copy link
Contributor

@iaugusty could you format the code correctly? so I can run it directly (check reprex!) thanks

@shajoezhu
Copy link
Contributor

hi @iaugusty , can you please provide us with the reproducible code and an example please. Thanks!

@iaugusty
Copy link
Collaborator Author

iaugusty commented Jan 6, 2025

library(formatters)
xx <- list(c(NA, 1, NA), c(NA, NA, 1), c(1, NA, NA), c(NA, NA, NA))
sapply(xx, FUN = function(x)format_value(x, format = "xx.x (xx.x - xx.x)", na_str = paste0("ne",1:10)))
#> [1] "ne1 (1.0 - ne2)" "ne1 (ne2 - 1.0)" "1.0 (ne1 - ne2)" "ne1 (ne2 - ne3)"

@shajoezhu
Copy link
Contributor

hi @iaugusty , I see what you mean, yes, this was intentional, and you can see the usage of

format_value(c(NA, 1, NA), format = "xx.x (xx.x - xx.x)", na_str = c("NE", "<missing>"))
[1] "NE (1.0 - <missing>)"

the na_str index was intentionally specified to the i_th NA value in input c(NA, 1, NA), how the i_th NA string should be displayed

suppose a <- c(NA, 1, NA), the i_th element is not referring to a_i.

i understand your suggestion. and I think it would be a very nice feature!

@iaugusty
Copy link
Collaborator Author

iaugusty commented Jan 7, 2025

hi @shajoezhu, It would be nice if the user has some control on whether to refer to the i-th element of the na_str, or the i-th missing value in the values data input

@shajoezhu
Copy link
Contributor

awesome! thanks! agreed. I will change the label of this issue to feature

@shajoezhu shajoezhu added enhancement New feature or request and removed bug Something isn't working labels Jan 7, 2025
@Melkiades
Copy link
Contributor

I think the current behavior is the expected behavior. The best solution is to give a vector also to na_str as it is an sapply. Adding the possibility of a longer vector that matches multiple values in format_values can easily break upper-level code or have unexpected final outcomes. The user has an easy way to apply it in a custom way, so I would not pursue this additional feature for format_value

@Melkiades Melkiades added discussion Needs to consider if worth it or how and removed enhancement New feature or request labels Jan 21, 2025
@iaugusty
Copy link
Collaborator Author

iaugusty commented Jan 21, 2025

In case of multi-stats with different length, eg in a_summary : mean, sd, range, a 1-d stat vs a 2-d stat. If you want to report the 2-d stat as (NE, NE) you have to pass na_str as vector of length 2, while mean itself is a 1-d stat

This is the current behaviour

> format_value(c(NA, 1, NA), format = "xx.x (xx.x - xx.x)", na_str = c("NE", "<missing>", "NA"))
[1] "NE (1.0 - <missing>)"
> format_value(c(1, NA, NA), format = "xx.x (xx.x - xx.x)", na_str = c("NE", "<missing>", "NA"))
[1] "1.0 (NE - <missing>)"
> format_value(c(NA, NA, NA), format = "xx.x (xx.x - xx.x)", na_str = c("NE", "<missing>", "NA"))
[1] "NE (<missing> - NA)"

I would find the following result more natural to understand.

> format_value(c(NA, 1, NA), format = "xx.x (xx.x - xx.x)", na_str = c("NE", "<missing>", "NA"))
[1] "NE (1.0 - NA)"
> format_value(c(1, NA, NA), format = "xx.x (xx.x - xx.x)", na_str = c("NE", "<missing>", "NA"))
[1] "1.0 (<missing> - NA)"

If it could be a user option to have this as the behaviour, aside to the current behaviour, the user can better control the outcome.

It is not that big deal, with the current behaviour and na_str a vector with the same value, we achieve the behaviour we want.

@Melkiades
Copy link
Contributor

If it could be a user option to have this as the behaviour, aside to the current behaviour, the user can better control the outcome.

It is not that big deal, with the current behaviour and na_str a vector with the same value, we achieve the behaviour we want.

I see; now I understand well. It makes sense to have it as a mask, as you suggest, instead of a filling vector. Having both options seem impractical, though, as it is hard to differentiate the intent of the user without another flag that would be hard to motivate or explain. I would stick with the current behavior only because of the number of fixes that we should potentially make downstream (even if I think it is usually used with only one value like NE instead of NA)

@iaugusty
Copy link
Collaborator Author

I'm fine with it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Needs to consider if worth it or how sme Tracks changes for the sme board
Projects
None yet
Development

No branches or pull requests

3 participants