data.frame attributes are preserved on `mutate()` but dropped on `group_by |> mutate` #6100

MilesMcBain · 2021-11-29T05:43:44Z

Reprex:

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(purrr)
attr(mtcars, "test") <- "foo"
mtcars_grouped <- group_by(mtcars, cyl)

mtcars |>
  mutate(new_col = 1) |>
  attributes() |>
  pluck("test")
#> [1] "foo"

mtcars_grouped |>
  mutate(new_col = 1) |>
  attributes() |>
  pluck("test")
#> NULL

mtcars_grouped |>
  mutate(across(where(is.complex), as.character)) |>
  attributes() |>
  pluck("test")
#> NULL
# did nothing but still lost attrs on the base data.frame

It would be better if no attributes were lost with mutate. Feels weird in the last case where no mutate actually gets done.

However if group_by() |> mutate() must drop attributes, it's probably better that mutate does also. You can easily get tricked into thinking some code is going to work, but then it bombs when it accidentally gets passed some sticky groups. This happened to me today, 4 levels of package context up from where the mutate was.

The text was updated successfully, but these errors were encountered:

ellessenne · 2022-03-02T07:06:17Z

Hi all,
I see the same issue with a simple mutate() call, where attributes actually get dropped; see for reference: ellessenne/comorbidity#51
Would this be fixed by #6102?
Thanks!

Alessandro

* Implement `.by` for `mutate()` and `summarise()` * Add tests related to #6100 * Add `.by` support in `filter()` * Add `.by` support in `slice()` family * Move the `.by` collision checks into the generics * Tweak `summarise_verbose()` to respect if the global option is `TRUE` This should override the `global_env()` reference check, so we can force verbosity in relevant documentation pages * Add a full documentation page specific to `.by` * Add section about verbs without `.by` support * NEWS bullet * Order groups by first appearance when using `.by` * NEWS bullet updates * We have decided that the `NULL` column case is too obscure to care about * NEWS tweaks based on feedback * Second pass on `.by` help page based on feedback * Include `.by` help page in pkgdown reference * Apply suggestions from code review Co-authored-by: Hadley Wickham <h.wickham@gmail.com> * Regenerate snapshots * Regenerate documentation * Ensure that `compute_by()` is type stable on `$data` It should always return a bare tibble, even though `group_data()` returns a data frame for data frame input. * Remove `TODO`s Co-authored-by: Hadley Wickham <h.wickham@gmail.com>

hadley · 2022-12-15T20:56:28Z

library(dplyr, warn.conflicts = FALSE)
attr(mtcars, "test") <- "foo"

mtcars |>
  mutate(new_col = 1) |>
  attr("test")
#> [1] "foo"

mtcars |>
  mutate(new_col = 1, .by = cyl) |>
  attr("test")
#> [1] "foo"

^{Created on 2022-12-15 with reprex v2.0.2}

hadley · 2022-12-15T20:59:54Z

Since this problem is resolved by .by, I'm going to close this issue. Fixing it via group_by() is surprisingly challenging because we have to forward attributes into a new data structure, whereas .by completely circumvents the problem because there's no intermediate data structure.

MilesMcBain · 2022-12-15T22:08:35Z

Maybe not the biggest deal, but this doesn't solve the problem in the original context I mentioned, where a user passes something grouped across an interface boundary.

As an author it'd be nice to be able to write functions that 'just work' within groups or on the whole dataframe as a single group, but things like this mean you have to add special handling code to cater for attributes and groups in combination, AND you also have to know this obscure problem exists to put that code in.

I think it'll continue to catch a small number of people out.

Unless you go full deprecation on group_by in favour of .by. 😝

MilesMcBain · 2022-12-15T22:16:28Z

Does mutate have enough context to issue a warning it's about to drop the attributes?

hadley · 2022-12-15T23:45:07Z

In general, I think it's risky to rely on random attributes being magically pass along through any operation. We've done our best to support it in most places in dplyr, but for grouped mutates with random attributes, it doesn't feel to me like the benefit is worth the implementation cost, especially given that there's now an alternative available.

I doubt we can deprecate group_by(), but if .by is successful, I think there's a good chance we'll supersede it.

MilesMcBain changed the title ~~data.frame attributes are preserved on mutate by dropped on group_by |> mutate~~ data.frame attributes are preserved on mutate but dropped on group_by |> mutate Nov 29, 2021

romainfrancois changed the title ~~data.frame attributes are preserved on mutate but dropped on group_by |> mutate~~ data.frame attributes are preserved on mutate() but dropped on group_by |> mutate Nov 29, 2021

romainfrancois mentioned this issue Nov 29, 2021

mutate() preserves attributes of grouped data frames #6102

Closed

ellessenne mentioned this issue Mar 2, 2022

Variable labels disappearing after deriving scores ellessenne/comorbidity#51

Open

hadley added bug an unexpected problem or unintended behavior rows ↕️ Operations on rows: filter(), slice(), arrange() labels Apr 16, 2022

This comment was marked as off-topic.

Sign in to view

DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Nov 7, 2022

Add tests related to tidyverse#6100

1912a02

DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Nov 16, 2022

Add tests related to tidyverse#6100

40b9a31

DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Nov 17, 2022

Add tests related to tidyverse#6100

b2b159e

hadley closed this as completed Dec 15, 2022

sfirke mentioned this issue Feb 6, 2023

mutate() drops some attributes of a tabyl object #6689

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data.frame attributes are preserved on `mutate()` but dropped on `group_by |> mutate` #6100

data.frame attributes are preserved on `mutate()` but dropped on `group_by |> mutate` #6100

MilesMcBain commented Nov 29, 2021

ellessenne commented Mar 2, 2022

This comment was marked as off-topic.

hadley commented Dec 15, 2022

hadley commented Dec 15, 2022 •

edited

Loading

MilesMcBain commented Dec 15, 2022

MilesMcBain commented Dec 15, 2022

hadley commented Dec 15, 2022

data.frame attributes are preserved on mutate() but dropped on group_by |> mutate #6100

data.frame attributes are preserved on mutate() but dropped on group_by |> mutate #6100

Comments

MilesMcBain commented Nov 29, 2021

ellessenne commented Mar 2, 2022

This comment was marked as off-topic.

hadley commented Dec 15, 2022

hadley commented Dec 15, 2022 • edited Loading

MilesMcBain commented Dec 15, 2022

MilesMcBain commented Dec 15, 2022

hadley commented Dec 15, 2022

data.frame attributes are preserved on `mutate()` but dropped on `group_by |> mutate` #6100

data.frame attributes are preserved on `mutate()` but dropped on `group_by |> mutate` #6100

hadley commented Dec 15, 2022 •

edited

Loading