data.frame attributes are preserved on mutate()
but dropped on group_by |> mutate
#6100
Hi all, Alessandro |
This comment was marked as off-topic.
* Implement `.by` for `mutate()` and `summarise()`
* Add tests related to #6100
* Add `.by` support in `filter()`
* Add `.by` support in `slice()` family
* Move the `.by` collision checks into the generics
* Tweak `summarise_verbose()` to respect if the global option is `TRUE`

  This should override the `global_env()` reference check, so we can force verbosity in relevant documentation pages
* Add a full documentation page specific to `.by`
* Add section about verbs without `.by` support
* NEWS bullet
* Order groups by first appearance when using `.by`
* NEWS bullet updates
* We have decided that the `NULL` column case is too obscure to care about
* NEWS tweaks based on feedback
* Second pass on `.by` help page based on feedback
* Include `.by` help page in pkgdown reference
* Apply suggestions from code review

  Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
* Regenerate snapshots
* Regenerate documentation
* Ensure that `compute_by()` is type stable on `$data`

  It should always return a bare tibble, even though `group_data()` returns a data frame for data frame input.
* Remove `TODO`s

Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
library(dplyr, warn.conflicts = FALSE)
attr(mtcars, "test") <- "foo"
mtcars |>
mutate(new_col = 1) |>
attr("test")
#> [1] "foo"
mtcars |>
mutate(new_col = 1, .by = cyl) |>
attr("test")
#> [1] "foo"

Created on 2022-12-15 with reprex v2.0.2
Since this problem is resolved by |
Maybe not the biggest deal, but this doesn't solve the problem in the original context I mentioned, where a user passes something grouped across an interface boundary. As an author it'd be nice to be able to write functions that 'just work' within groups or on the whole dataframe as a single group, but things like this mean you have to add special handling code to cater for attributes and groups in combination, AND you also have to know this obscure problem exists to put that code in. I think it'll continue to catch a small number of people out. Unless you go full deprecation on |
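To make the "special handling code" concrete, here is a minimal sketch of the kind of wrapper a package author might write today. `mutate_keep_attrs()` is a hypothetical helper, not part of dplyr; it assumes the custom attribute is present on the grouped input and simply re-attaches whatever the result no longer carries.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not a dplyr function): run mutate() and then
# re-attach any attributes the input had that the result is missing.
mutate_keep_attrs <- function(.data, ...) {
  out <- mutate(.data, ...)

  # Attributes that mutate() itself manages (names, row.names, class,
  # groups) are still present on `out`, so only the lost extras remain.
  lost <- setdiff(names(attributes(.data)), names(attributes(out)))
  for (nm in lost) {
    attr(out, nm) <- attr(.data, nm, exact = TRUE)
  }

  out
}

# Simulate a grouped data frame arriving across an interface boundary
# with a custom attribute already attached.
grouped <- group_by(mtcars, cyl)
attr(grouped, "test") <- "foo"

result <- mutate_keep_attrs(grouped, new_col = 1)
attr(result, "test")
#> [1] "foo"
```

The same call works on ungrouped input, where there is nothing to restore and the loop is a no-op. It still illustrates the complaint: you only write this if you already know the problem exists.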
Does |
In general, I think it's risky to rely on random attributes being magically passed along through any operation. We've done our best to support it in most places in dplyr, but for grouped mutates with random attributes, it doesn't feel to me like the benefit is worth the implementation cost, especially given that there's now an alternative available. I doubt we can deprecate |
Reprex:
It would be better if no attributes were lost with mutate. Feels weird in the last case where no mutate actually gets done.
However, if `group_by() |> mutate()` must drop attributes, it's probably better that `mutate()` does also. You can easily get tricked into thinking some code is going to work, but then it bombs when it accidentally gets passed some sticky groups. This happened to me today, 4 levels of package context up from where the mutate was.