-
Notifications
You must be signed in to change notification settings - Fork 3.7k
GH-45949: [R] Fix CRAN warnings for 19.0.1 about compiled code #45951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
r/src/altrep.cpp
Outdated
R_xlen_t len = Rf_xlength(x); | ||
|
||
for (R_xlen_t i = 0; i < len; i++) { | ||
SEXP str_elt = reinterpret_cast<SEXP>(STRING_ELT(x, i)); | ||
out[i] = str_elt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something about this isn't quite right, because the test here (and other assertions that strings are materialized) fail:
arrow/r/tests/testthat/test-altrep.R
Line 249 in 3c63f37
expect_true(test_arrow_altrep_is_materialized(altrep)) |
But I haven't yet figured out if this is a real problem with this code change, or maybe it's an assumption in the tests that no longer holds?
@nealrichardson @paleolimbot y'all might have thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow-on for myself that line 1319 is duplicative and should be removed (but I don't want to outdate ^^^ just yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DATAPTR
and STRING_ELT
work a bit different for altrep so it seems reasonable to me to assume we were implicitly materializing before and no longer are with STRING_ELT
. It seems like this is an assumption of the test that no longer holds and the expectation could be removed.
r/src/arrow_types.h
Outdated
} else if (TYPEOF(vec) == STRSXP) { | ||
cpp11::writable::strings out(Rf_xlength(vec)); | ||
R_xlen_t len = Rf_xlength(vec); | ||
|
||
for (R_xlen_t i = 0; i < len; i++) { | ||
SEXP str_elt = reinterpret_cast<SEXP>(STRING_ELT(vec, i)); | ||
out[i] = str_elt; | ||
} | ||
|
||
return out; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this chunk is necessary to return the array of strings here. But oddly(??) when I tried this block as simply return STRING_ELT(vec, 0);
(which would return just the first element IIUC), all tests passed. So maybe I misunderstand what's happening in the MutableBuffer there and we actually only need an object of the right type? Or we don't have test coverage that ensures that the full vector is there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just doing return STRING_ELT(vec, 0)
actually makes sense to me since it looks like we just need a pointer to the same address as what we'd get with DATAPTR. Seems like that's what STRING_ELT(vec, 0)
should accomplish. It does seem like a strange way to do it but it also seems like what we're doing here is already breaking the rules CRAN wants us to play by. If it works, I'm +1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've also commented this to be explicit for the next person
I don't understand this code well enough to have much to say. I did look at the part of WRE that the CRAN check points to, and it suggests using Is that not an option for us? |
It is an option in one place and I did use it there, but not all: I would get segfaults/illegal access in the places that I didn't use it. If I'm reading the code correctly, those are places where we are actually mutating in place |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. I took a look through and left a couple of comments. Once the altrep test failures get figured out I'll be a +1 on this.
# because there are no nulls, DATAPTR() does not materialize | ||
# because there are nulls, DATAPTR() does materialize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment change and the one below are not behavior changes with this PR, but I think the comments were simply wrong (either old or copy pasted). I've tried to correct them to be accurate descriptions of what's going on (but see "does not materialize" and then two lines later expect_true(test_arrow_altrep_is_materialized(altrep))
is at odds with each other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct here that my earlier comment was simply wrong 🙂
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
310d9c2
to
a4b457d
Compare
@github-actions crossbow submit -g r |
Revision: a4b457d Submitted crossbow builds: ursacomputing/crossbow @ actions-97a8d02cea |
Failres: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
r/src/altrep.cpp
Outdated
@@ -531,7 +552,7 @@ struct AltrepFactor : public AltrepVectorBase<AltrepFactor> { | |||
SEXP copy = PROTECT(Rf_allocVector(INTSXP, size)); | |||
|
|||
// copy the data from the array, through Get_region | |||
Get_region(alt, 0, size, reinterpret_cast<int*>(DATAPTR(copy))); | |||
Get_region(alt, 0, size, reinterpret_cast<int*>(INTEGER(copy))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get_region(alt, 0, size, reinterpret_cast<int*>(INTEGER(copy))); | |
Get_region(alt, 0, size, INTEGER(copy)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, thanks for that reminder, I found a few others too
r/src/altrep.cpp
Outdated
SEXP str_elt = reinterpret_cast<SEXP>(STRING_ELT(x, i)); | ||
out[i] = str_elt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking that! I don't recall any tests that checked the whether something was materialized more than once but it was quite a long time ago 🙂
r/tests/testthat/test-altrep.R
Outdated
# DATAPTR() should always materialize for strings | ||
# DATAPTR() does not materialize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@assignUser would it be possible to pull this into 20? We will need to patch it in our CRAN release regardless, but it would be nice to have it on the actual release. |
Of course! |
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 34a984c. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
This gets rid of `OBJECT`, `DATAPTR` has been replaced with `INTEGER()`, `REAL()`, etc. though strings are more complicated. I will fully admit that this C++ is stretching my comfort zone, so might include obviously wrong things! CI is currently failing, but I'm not totally sure yet if that means the code changes here are wrong or if maybe these allow us to have slightly different assumptions about materialization (see #45951 (comment)) I've also requested reviews broadly for folks I know have been around this code before, I appreciate any effort that y'all can spare 🙏 * GitHub Issue: #45949 Lead-authored-by: Jonathan Keane <jkeane@gmail.com> Co-authored-by: Dewey Dunnington <dewey@dunnington.ca> Signed-off-by: Jonathan Keane <jkeane@gmail.com>
This gets rid of `OBJECT`, `DATAPTR` has been replaced with `INTEGER()`, `REAL()`, etc. though strings are more complicated. I will fully admit that this C++ is stretching my comfort zone, so might include obviously wrong things! CI is currently failing, but I'm not totally sure yet if that means the code changes here are wrong or if maybe these allow us to have slightly different assumptions about materialization (see #45951 (comment)) I've also requested reviews broadly for folks I know have been around this code before, I appreciate any effort that y'all can spare 🙏 * GitHub Issue: #45949 Lead-authored-by: Jonathan Keane <jkeane@gmail.com> Co-authored-by: Dewey Dunnington <dewey@dunnington.ca> Signed-off-by: Jonathan Keane <jkeane@gmail.com>
…pache#45951) This gets rid of `OBJECT`, `DATAPTR` has been replaced with `INTEGER()`, `REAL()`, etc. though strings are more complicated. I will fully admit that this C++ is stretching my comfort zone, so might include obviously wrong things! CI is currently failing, but I'm not totally sure yet if that means the code changes here are wrong or if maybe these allow us to have slightly different assumptions about materialization (see apache#45951 (comment)) I've also requested reviews broadly for folks I know have been around this code before, I appreciate any effort that y'all can spare 🙏 * GitHub Issue: apache#45949 Lead-authored-by: Jonathan Keane <jkeane@gmail.com> Co-authored-by: Dewey Dunnington <dewey@dunnington.ca> Signed-off-by: Jonathan Keane <jkeane@gmail.com>
…pache#45951) This gets rid of `OBJECT`, `DATAPTR` has been replaced with `INTEGER()`, `REAL()`, etc. though strings are more complicated. I will fully admit that this C++ is stretching my comfort zone, so might include obviously wrong things! CI is currently failing, but I'm not totally sure yet if that means the code changes here are wrong or if maybe these allow us to have slightly different assumptions about materialization (see apache#45951 (comment)) I've also requested reviews broadly for folks I know have been around this code before, I appreciate any effort that y'all can spare 🙏 * GitHub Issue: apache#45949 Lead-authored-by: Jonathan Keane <jkeane@gmail.com> Co-authored-by: Dewey Dunnington <dewey@dunnington.ca> Signed-off-by: Jonathan Keane <jkeane@gmail.com>
This gets rid of
OBJECT
,DATAPTR
has been replaced withINTEGER()
,REAL()
, etc. though strings are more complicated. I will fully admit that this C++ is stretching my comfort zone, so might include obviously wrong things!CI is currently failing, but I'm not totally sure yet if that means the code changes here are wrong or if maybe these allow us to have slightly different assumptions about materialization (see #45951 (comment))
I've also requested reviews broadly for folks I know have been around this code before, I appreciate any effort that y'all can spare 🙏