Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data.frame attributes are preserved on mutate() but dropped on group_by |> mutate #6100

Closed
MilesMcBain opened this issue Nov 29, 2021 · 7 comments
Labels
bug an unexpected problem or unintended behavior rows ↕️ Operations on rows: filter(), slice(), arrange()

Comments

@MilesMcBain
Copy link

Reprex:

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(purrr)
attr(mtcars, "test") <- "foo"
mtcars_grouped <- group_by(mtcars, cyl)

mtcars |>
  mutate(new_col = 1) |>
  attributes() |>
  pluck("test")
#> [1] "foo"

mtcars_grouped |>
  mutate(new_col = 1) |>
  attributes() |>
  pluck("test")
#> NULL

mtcars_grouped |>
  mutate(across(where(is.complex), as.character)) |>
  attributes() |>
  pluck("test")
#> NULL
# did nothing but still lost attrs on the base data.frame

It would be better if no attributes were lost with mutate. Feels weird in the last case where no mutate actually gets done.

However if group_by() |> mutate() must drop attributes, it's probably better that mutate does also. You can easily get tricked into thinking some code is going to work, but then it bombs when it accidentally gets passed some sticky groups. This happened to me today, 4 levels of package context up from where the mutate was.

@MilesMcBain MilesMcBain changed the title data.frame attributes are preserved on mutate by dropped on group_by |> mutate data.frame attributes are preserved on mutate but dropped on group_by |> mutate Nov 29, 2021
@romainfrancois romainfrancois changed the title data.frame attributes are preserved on mutate but dropped on group_by |> mutate data.frame attributes are preserved on mutate() but dropped on group_by |> mutate Nov 29, 2021
@ellessenne
Copy link

Hi all,
I see the same issue with a simple mutate() call, where attributes actually get dropped; see for reference: ellessenne/comorbidity#51
Would this be fixed by #6102?
Thanks!

Alessandro

@hadley hadley added bug an unexpected problem or unintended behavior rows ↕️ Operations on rows: filter(), slice(), arrange() labels Apr 16, 2022
@sda030

This comment was marked as off-topic.

DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Nov 7, 2022
DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Nov 16, 2022
DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Nov 17, 2022
DavisVaughan added a commit that referenced this issue Nov 17, 2022
* Implement `.by` for `mutate()` and `summarise()`

* Add tests related to #6100

* Add `.by` support in `filter()`

* Add `.by` support in `slice()` family

* Move the `.by` collision checks into the generics

* Tweak `summarise_verbose()` to respect if the global option is `TRUE`

This should override the `global_env()` reference check, so we can force verbosity in relevant documentation pages

* Add a full documentation page specific to `.by`

* Add section about verbs without `.by` support

* NEWS bullet

* Order groups by first appearance when using `.by`

* NEWS bullet updates

* We have decided that the `NULL` column case is too obscure to care about

* NEWS tweaks based on feedback

* Second pass on `.by` help page based on feedback

* Include `.by` help page in pkgdown reference

* Apply suggestions from code review

Co-authored-by: Hadley Wickham <h.wickham@gmail.com>

* Regenerate snapshots

* Regenerate documentation

* Ensure that `compute_by()` is type stable on `$data`

It should always return a bare tibble, even though `group_data()` returns a data frame for data frame input.

* Remove `TODO`s

Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
@hadley
Copy link
Member

hadley commented Dec 15, 2022

library(dplyr, warn.conflicts = FALSE)
attr(mtcars, "test") <- "foo"

mtcars |>
  mutate(new_col = 1) |>
  attr("test")
#> [1] "foo"

mtcars |>
  mutate(new_col = 1, .by = cyl) |>
  attr("test")
#> [1] "foo"

Created on 2022-12-15 with reprex v2.0.2

@hadley
Copy link
Member

hadley commented Dec 15, 2022

Since this problem is resolved by .by, I'm going to close this issue. Fixing it via group_by() is surprisingly challenging because we have to forward attributes into a new data structure, whereas .by completely circumvents the problem because there's no intermediate data structure.

@hadley hadley closed this as completed Dec 15, 2022
@MilesMcBain
Copy link
Author

Maybe not the biggest deal, but this doesn't solve the problem in the original context I mentioned, where a user passes something grouped across an interface boundary.

As an author it'd be nice to be able to write functions that 'just work' within groups or on the whole dataframe as a single group, but things like this mean you have to add special handling code to cater for attributes and groups in combination, AND you also have to know this obscure problem exists to put that code in.

I think it'll continue to catch a small number of people out.

Unless you go full deprecation on group_by in favour of .by. 😝

@MilesMcBain
Copy link
Author

Does mutate have enough context to issue a warning it's about to drop the attributes?

@hadley
Copy link
Member

hadley commented Dec 15, 2022

In general, I think it's risky to rely on random attributes being magically pass along through any operation. We've done our best to support it in most places in dplyr, but for grouped mutates with random attributes, it doesn't feel to me like the benefit is worth the implementation cost, especially given that there's now an alternative available.

I doubt we can deprecate group_by(), but if .by is successful, I think there's a good chance we'll supersede it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior rows ↕️ Operations on rows: filter(), slice(), arrange()
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants