simplify format_col.default, allowing e.g. vctrs_list_of columns to print well #6637

Open
wants to merge 5 commits into
base: master

MichaelChirico
Member

Closes #5948

It turns out this code in support of #2273 (#5224) is not needed: I see no difference before and after this change, and the related test still passes:

if (loaded[["sf"]]) { #2273
DT = as.data.table(st_read(system.file("shape/nc.shp", package = "sf"), quiet=TRUE))
test(15, DT[1:3, .(NAME, FIPS, geometry)], output="Ashe.*-81.4.*Surry.*-80.4")
dsf = sf::st_as_sf(data.table(x=1:10, y=1:10, s=sample(1:2, 10, TRUE)), coords=1:2)
test(16, split(dsf, dsf$s), list(`1` = dsf[dsf$s == 1, ], `2` = dsf[dsf$s == 2, ]))
}

It's possible our testing is just not extensive enough.


github-actions bot commented Dec 6, 2024

Comparison Plot

Generated via commit 9e1260f

Download link for the artifact containing the test results: ↓ atime-results.zip

Task | Duration
R setup and installing dependencies | 4 minutes and 33 seconds
Installing different package versions | 8 minutes and 6 seconds
Running and plotting the test cases | 2 minutes and 20 seconds

@MichaelChirico
Member Author

This test fails:

registerS3method("format", "foo2130", function(x, ...) rep("All hail foo",length(x)))
test(2130.15, print(DT), output="All hail foo")  # e.g. sf:::format.sfc rather than sf:::format.sfg on each item

My sense is that if the only thing we break is a toy example, we should just break it. In general, I think the better solution here is to add format_col and/or format_list_item methods as needed.

I think given the potential for breaking change, it's best to save this PR for 1.18.0.
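
A minimal sketch of that alternative, assuming an installed data.table that exports the format_col generic. The class foo2130 is the toy class from the failing test; the method body is illustrative only and is not part of this PR:

```r
library(data.table)

# Hypothetical column-level formatter for the toy class 'foo2130'.
format_col.foo2130 <- function(x, ...) rep("All hail foo", length(x))

# Register against data.table's generic so that dispatch also works when
# format_col() is called from inside data.table's own printing code.
registerS3method("format_col", "foo2130", format_col.foo2130,
                 envir = asNamespace("data.table"))

# A classed list column, formatted as a whole rather than item by item.
col <- structure(list(1, 2), class = "foo2130")
formatted <- format_col(col)
print(formatted)
```

With a method like this, the column's appearance no longer depends on whether format_col.default consults format() for classed lists.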

@MichaelChirico MichaelChirico added this to the 1.18.0 milestone Dec 6, 2024
@MichaelChirico MichaelChirico requested a review from aitap March 4, 2025 20:21
aitap added 2 commits March 5, 2025 13:00
'sf' is one of the few packages that uses a list column in a data.frame
with a correctly working format() method. Since we disregard such
methods now, test that the correct behaviour still happens thanks to the
format() methods of the individual list items.
Contributor

@aitap aitap left a comment

Let's compare with other data.frames.

  • tibble refuses classed lists that don't come from its own family of packages:
> tibble(a = structure(1, class = 'foo2130'))
# A tibble: 1 × 1
  a
  <foo2130>
1 All hail foo
> tibble(a = structure(list(1), class = 'foo2130'))
Error in `tibble()`:
! All columns in a tibble must be vectors.
✖ Column `a` is a `foo2130` object.
Run `rlang::last_trace()` to see where the error occurred.
> tibble(a = list(structure(1, class = 'foo2130')))
# A tibble: 1 × 1
  a
  <list>
1 <foo2130 [1]>
> tibble(a = 1, b = list(mtcars))
# A tibble: 1 × 2
      a b
  <dbl> <list>
1     1 <df [32 × 11]>
> tibble(a = 1, b = list_of(mtcars))
# A tibble: 1 × 2
      a               b
  <dbl> <list<df[,11]>>
1     1       [32 × 11]
  • data.frame is consistent in saying that format(<data.frame>) is whatever format(...) returns for the individual columns, but doesn't support list columns that well. By default list elements are converted to data.frame columns. Constructing one with a list verbatim requires I(), which overrides format. A list inserted manually into an existing data.frame will be formatted using the usual format method:
> data.frame(a = 1, b = list_of(mtcars)) # NB: uses as.data.frame.vctrs_vctr to make the list into a column
# omitted: similar to current data.table behaviour, i.e., formats the whole `mtcars`
> x <- data.frame(a = 1)
> x$b <- list(mtcars)
> x
# same
> data.frame(a = 1, b = I(list(mtcars))) # stores list as is but with format() overridden
  a            b
1 1 c(21, 21....
> data.frame(a = structure(list(1), class = 'foo2130')) # tries to convert a list to data.frame
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
  cannot coerce class ‘"foo2130"’ to a data.frame
> traceback()
4: stop(gettextf("cannot coerce class %s to a data.frame", sQuote(deparse(class(x))[1L])),
       domain = NA)
3: as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
2: as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
1: data.frame(a = structure(list(1), class = "foo2130"))
> x <- data.frame(a = 1)
> x$b <- structure(list(1), class = 'foo2130') # uses format()
> x
  a            b
1 1 All hail foo
> data.frame(a = I(structure(list(1), class = 'foo2130'))) # uses format.AsIs()
  a
1 1
  • DataFrame allows classed lists but expands unclassed lists or data.frames into columns. Seems to ignore format methods altogether:
> DataFrame(a = structure(list(1,10), class = 'foo2130'))
DataFrame with 2 rows and 1 column
          a
  <foo2130>
1         1
2        10
> DataFrame(a = list(1,2))
DataFrame with 1 row and 2 columns
       a.X1      a.X2
  <numeric> <numeric>
1         1         2
> DataFrame(a = list(mtcars))
DataFrame with 32 rows and 11 columns
                      a.mpg     a.cyl    a.disp      a.hp    a.drat      a.wt    a.qsec      a.vs      a.am    a.gear
                  <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
Mazda RX4              21.0         6       160       110      3.90     2.620     16.46         0         1         4
Mazda RX4 Wag          21.0         6       160       110      3.90     2.875     17.02         0         1         4
# omitted
> DataFrame(a = structure(1, class = 'foo2130'))
DataFrame with 1 row and 1 column
          a
  <foo2130>
1         1
  • data.table formats lists specially by default, but defers to a format method if it exists:
> data.table(a = 1, b = list(mtcars))
       a                   b
   <num>              <list>
1:     1 <data.frame[32x11]>
> data.table(a = 1, b = list_of(mtcars))
# omitted: formats whole `mtcars` into a single string

Since we allow list columns of any class without applying as.data.(frame|table) first, we might as well format all lists, even classed ones, in a special compact form, even though a few methods, such as sf:::format.sfc, exist to do the right thing. The sf classes still print specially, because data.table:::format_list_item.default still checks for a format method and eventually calls sf:::format.sfg.

We could also register format_col.sfc and set Enhances: sf. No need for conditional registration here since we own the generic.
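
The item-level fallback described above can be modelled in base R. This is a simplified sketch, not data.table's implementation: fmt_col, fmt_item and has_format_method here are hypothetical stand-ins for format_col.default, format_list_item.default and the internal helper, and the exact conditions in the real code may differ.

```r
# Toy class with a format() method, as in the thread's test.
registerS3method("format", "foo2130",
                 function(x, ...) rep("All hail foo", length(x)))

# TRUE if any class of x has a registered format() method.
has_format_method <- function(x) {
  any(vapply(class(x), function(cl) {
    !is.null(getS3method("format", cl, optional = TRUE))
  }, logical(1)))
}

# Format one list item: atomic items go through format() (which dispatches
# on their class); other items use a format() method only if it yields a
# single string; everything else gets a compact <class[dims]> summary.
fmt_item <- function(x) {
  if (is.atomic(x)) {
    paste(format(x), collapse = ",")
  } else if (has_format_method(x) && length(out <- format(x)) == 1L) {
    out
  } else {
    dims <- if (is.null(dim(x))) length(x) else dim(x)
    sprintf("<%s[%s]>", class(x)[1L], paste(dims, collapse = "x"))
  }
}

# Format a column: lists are always formatted item by item, so the class of
# the list itself is ignored; only the classes of its items matter.
fmt_col <- function(col) {
  if (is.list(col)) vapply(unclass(col), fmt_item, character(1)) else format(col)
}

compact   <- fmt_col(list(mtcars))                            # data.frame item
col_class <- fmt_col(structure(list(1, 10), class = "foo2130")) # classed list
item_cls  <- fmt_col(list(structure(1, class = "foo2130")))     # classed item
```

In this model a class on the list itself has no effect, which is why the toy test fails, while a class on each item (as with sf's sfg geometries inside an sfc column) still reaches its format method.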


codecov bot commented Mar 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.59%. Comparing base (5c964b3) to head (9e1260f).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6637      +/-   ##
==========================================
- Coverage   98.59%   98.59%   -0.01%     
==========================================
  Files          79       79              
  Lines       14661    14659       -2     
==========================================
- Hits        14455    14453       -2     
  Misses        206      206              


Successfully merging this pull request may close these issues.

list sub-class with format() method prints full contents
2 participants