mutate() drops some attributes of a tabyl object #6689
Comments
I don't think this is really dplyr's fault. Internally we do
library(janitor, warn.conflicts = FALSE)
df <- mtcars |> tabyl(cyl)
attributes(df)
#> $names
#> [1] "cyl" "n" "percent"
#>
#> $class
#> [1] "tabyl" "data.frame"
#>
#> $row.names
#> [1] 1 2 3
#>
#> $core
#> cyl n percent
#> 1 4 11 0.34375
#> 2 6 7 0.21875
#> 3 8 14 0.43750
#>
#> $tabyl_type
#> [1] "one_way"
df2 <- df[c("cyl", "n", "percent")]
attributes(df2)
#> $names
#> [1] "cyl" "n" "percent"
#>
#> $row.names
#> [1] 1 2 3
#>
#> $class
#> [1] "tabyl" "data.frame"
It is worth noting that
library(tibble)
df <- mtcars |>
as_tibble()
attr(df, "foo") <- "bar"
attributes(df)
#> $class
#> [1] "tbl_df" "tbl" "data.frame"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32
#>
#> $names
#> [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
#> [11] "carb"
#>
#> $foo
#> [1] "bar"
df2 <- df[c("mpg", "cyl")]
attributes(df2)
#> $names
#> [1] "mpg" "cyl"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32
#>
#> $class
#> [1] "tbl_df" "tbl" "data.frame"
#>
#> $foo
#> [1] "bar"
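For reference, the same behavior can be reproduced without janitor or tibble. The following is a minimal base R sketch (the `my_df` class name and the `foo` attribute are made up for illustration): column subsetting with `[` keeps the class but drops the custom attribute.

```r
# Plain data frame with a made-up subclass and a made-up custom attribute
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
class(df) <- c("my_df", "data.frame")
attr(df, "foo") <- "bar"

# `my_df` has no `[` method of its own, so this dispatches to `[.data.frame`
df2 <- df[c("x", "y")]

class(df2)        # the class vector is kept
attr(df2, "foo")  # the custom attribute is gone (NULL)
```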
FWIW this is not a new issue in 1.1.0; it was present in 1.0.10 as well. We also used
# devtools::install_version("dplyr", "1.0.10")
library(dplyr, warn.conflicts = FALSE)
library(janitor, warn.conflicts = FALSE)
packageVersion("dplyr")
#> [1] '1.0.10'
mtcars |>
tabyl(cyl) |>
mutate(foo = 1) |>
attributes()
#> $names
#> [1] "cyl" "n" "percent" "foo"
#>
#> $row.names
#> [1] 1 2 3
#>
#> $class
#> [1] "tabyl" "data.frame"
Created on 2023-02-07 with reprex v2.0.2.9000
Thanks @DavisVaughan, I think I understand the above. I just don't follow then how this works:
There
That's a good question. In
Lines 240 to 244 in e8702df
We only apply this patch to these two specific types because we don't have any power to change them, and this is the best we can do there. We also know that any extra attributes on those types are just extra metadata that the user added which just needs to be carried along, but they won't be dependent on the data in any way.
We don't want to apply this patch in general though. Imagine the tsibble class, which has an attribute that identifies the column of the data that functions as the date-time index for the data frame. If
This typically works pretty well because people who subclass tibble will automatically get attribute propagation due to
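To make the concern about data-dependent attributes concrete, here is a small base R sketch. It does not use tsibble; the `index` attribute and the `naive_subset()` helper are hypothetical stand-ins, assuming the worry is that blindly copying attributes can leave them pointing at columns that no longer exist.

```r
# A plain data frame with a made-up attribute naming its index column
df <- data.frame(
  date  = as.Date("2023-01-01") + 0:2,
  value = c(1, 2, 3)
)
attr(df, "index") <- "date"

# Hypothetical helper: subset columns, then copy the attribute over unchanged
naive_subset <- function(x, cols) {
  out <- x[cols]
  attr(out, "index") <- attr(x, "index")
  out
}

out <- naive_subset(df, "value")

# The attribute still says "date", but that column has been dropped
attr(out, "index")
attr(out, "index") %in% names(out)
```

Metadata that does not depend on the columns would survive this kind of copy without harm, which is presumably why carrying it along is safe for some types but not in general.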
Thank you very much for explaining this, it's invaluable in figuring out how to best develop janitor. I hope my last question is a quick one. Re:
If I wrote a
Yes, that is exactly right. Adding a
You can also see
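As a rough illustration of a subclass taking responsibility for its own attributes, here is a sketch of a `[` method on a toy class. The class `my_tbl`, its `note` attribute, and the `new_my_tbl()` constructor are hypothetical; this is not janitor's or dplyr's actual code, and the right hook for a real package may well be a different method.

```r
# Hypothetical constructor: a data.frame subclass carrying a `note` attribute
new_my_tbl <- function(x, note) {
  structure(x, class = c("my_tbl", "data.frame"), note = note)
}

# Subsetting method that defers to `[.data.frame`, then restores the attribute
`[.my_tbl` <- function(x, ...) {
  out <- NextMethod()
  # Only restore on data frame results; `x[, "a"]` may return a bare vector
  if (is.data.frame(out)) {
    attr(out, "note") <- attr(x, "note")
  }
  out
}

df  <- new_my_tbl(data.frame(a = 1:3, b = 4:6), note = "kept")
df2 <- df["a"]

attr(df2, "note")  # the attribute survives column subsetting
class(df2)         # and the class is unchanged
```

This only makes sense for attributes that stay valid after any subset. An attribute derived from the data would need to be recomputed rather than copied.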
Using dplyr 1.1.0 and janitor 2.2.0, a user reported this problem with a mutate call breaking their janitor pipeline. I see mutate is causing attributes to drop. But I don't understand how or why, especially as I can reproduce this example from Hadley showing that this shouldn't happen.
This has the $core and $tabyl_type attributes:
This drops the attributes $core and $tabyl_type:
Do you know why these attributes are dropped compared to the comment I linked above? And thoughts on whether this could/might be addressed on the dplyr side? In the past I had to work around dplyr commands in my code for this reason and it's been great to see the package moving toward more preservation of data.frame attributes 🙏
Created on 2023-02-06 by the reprex package (v2.0.1)