
GH-45949: [R] Fix CRAN warnings for 19.0.1 about compiled code #45951


Merged
merged 20 commits into from
Apr 5, 2025


jonkeane
Member

@jonkeane jonkeane commented Mar 27, 2025

This gets rid of `OBJECT`; `DATAPTR` has been replaced with `INTEGER()`, `REAL()`, etc., though strings are more complicated. I will fully admit that this C++ is stretching my comfort zone, so it might include obviously wrong things!

CI is currently failing, but I'm not totally sure yet whether that means the code changes here are wrong or whether they allow us to make slightly different assumptions about materialization (see #45951 (comment)).

I've also requested reviews broadly from folks I know have worked on this code before; I appreciate any effort that y'all can spare 🙏

r/src/altrep.cpp Outdated
Comment on lines 1321 to 1325
```cpp
R_xlen_t len = Rf_xlength(x);

for (R_xlen_t i = 0; i < len; i++) {
  SEXP str_elt = reinterpret_cast<SEXP>(STRING_ELT(x, i));
  out[i] = str_elt;
```
Member Author

Something about this isn't quite right, because this test (and the other assertions that strings are materialized) fails:

```r
expect_true(test_arrow_altrep_is_materialized(altrep))
```

But I haven't yet figured out whether this is a real problem with this code change or an assumption in the tests that no longer holds.

@nealrichardson @paleolimbot, y'all might have thoughts?

Member Author

A follow-up note for myself: line 1319 is duplicative and should be removed (but I don't want to outdate the comment above just yet).

Member

`DATAPTR` and `STRING_ELT` work a bit differently for ALTREP, so it seems reasonable to me to assume that we were implicitly materializing before and no longer are with `STRING_ELT`. This looks like an assumption in the test that no longer holds, and the expectation could be removed.
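To make the materialization point concrete, here is a toy C++ model. It is only a sketch that assumes, as the comment above suggests, that a `DATAPTR`-style request forces a full materialized copy, while a `STRING_ELT`-style request produces one element on demand. `LazyStringVector`, `elt()`, `data_ptr()`, `is_materialized()` and `copy_by_elt()` are hypothetical names invented for illustration. They are not part of R's or Arrow's API.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of an ALTREP-style lazy string vector.
// None of these names are part of R's or Arrow's API.
class LazyStringVector {
 public:
  explicit LazyStringVector(std::vector<std::string> source)
      : source_(std::move(source)) {}

  std::size_t size() const { return source_.size(); }

  // Plays the role of STRING_ELT(): returns one element on demand and
  // leaves the materialized cache untouched.
  std::string elt(std::size_t i) const { return source_.at(i); }

  // Plays the role of DATAPTR(): the caller wants contiguous storage, so
  // the whole vector is materialized (once) before a pointer is returned.
  const std::string* data_ptr() {
    if (!materialized_) {
      materialized_.emplace(source_.begin(), source_.end());
    }
    return materialized_->data();
  }

  // Plays the role of test_arrow_altrep_is_materialized().
  bool is_materialized() const { return materialized_.has_value(); }

 private:
  std::vector<std::string> source_;                       // stands in for the Arrow array
  std::optional<std::vector<std::string>> materialized_;  // stands in for the materialized R vector
};

// Element-by-element copy, mirroring the STRING_ELT loop in this PR.
std::vector<std::string> copy_by_elt(const LazyStringVector& x) {
  std::vector<std::string> out;
  out.reserve(x.size());
  for (std::size_t i = 0; i < x.size(); i++) {
    out.push_back(x.elt(i));
  }
  return out;
}
```

In this model, copying through `copy_by_elt()` leaves `is_materialized()` false. Only a `data_ptr()` call flips it. That is the behavior the `expect_true(test_arrow_altrep_is_materialized(altrep))` expectation would have been implicitly relying on.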

@github-actions github-actions bot added the `awaiting changes` and `awaiting change review` labels and removed the `awaiting committer review`, `awaiting changes`, and `awaiting change review` labels Mar 29, 2025
Comment on lines 194 to 203
```cpp
} else if (TYPEOF(vec) == STRSXP) {
  cpp11::writable::strings out(Rf_xlength(vec));
  R_xlen_t len = Rf_xlength(vec);

  for (R_xlen_t i = 0; i < len; i++) {
    SEXP str_elt = reinterpret_cast<SEXP>(STRING_ELT(vec, i));
    out[i] = str_elt;
  }

  return out;
```
Member Author

I believe this chunk is necessary to return the array of strings here. But oddly(??), when I tried replacing this block with simply `return STRING_ELT(vec, 0);` (which would return just the first element, IIUC), all tests passed. So maybe I misunderstand what's happening in the `MutableBuffer` there, and we actually only need an object of the right type? Or maybe we don't have test coverage that ensures the full vector is there?

Member

Just doing `return STRING_ELT(vec, 0)` actually makes sense to me, since it looks like we just need a pointer to the same address as what we'd get with `DATAPTR`, and that seems like what `STRING_ELT(vec, 0)` should accomplish. It does seem like a strange way to do it, but what we're doing here already seems to break the rules CRAN wants us to play by. If it works, I'm +1.

Member Author

OK, I've also added a comment to make this explicit for the next person.

@github-actions github-actions bot added the `awaiting changes` label and removed the `awaiting change review` label Mar 29, 2025
@nealrichardson
Member

I don't understand this code well enough to have much to say. I did look at the part of WRE that the CRAN check points to, and it suggests using DATAPTR_RO instead of DATAPTR: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Moving-into-C-API-compliance

Is that not an option for us?

@jonkeane
Member Author

> suggests using `DATAPTR_RO` instead of `DATAPTR`: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Moving-into-C-API-compliance
>
> Is that not an option for us?

It is an option in one place, and I did use it there, but not everywhere: in the other places, using it gave me segfaults/illegal access, so I didn't use it there. If I'm reading the code correctly, those are places where we are actually mutating in place.
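The following toy sketch illustrates the distinction drawn in the reply above. A read-only accessor (the role `DATAPTR_RO` plays by returning a `const` pointer) is enough for code that only reads. Code that fills a buffer in place, as `Get_region` does when it writes into a freshly allocated copy, needs a writable, typed pointer such as the one `INTEGER()` returns. `ToyIntVector`, `data_ptr_ro()`, `integer()`, `sum_ro()` and `fill_region()` are hypothetical names. The sketch does not reproduce the segfaults. It only shows the read-only versus writable contract, assuming that reading of the code is right.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an R integer vector; not R's API.
struct ToyIntVector {
  std::vector<int> storage;

  // Plays the role of DATAPTR_RO(): an untyped, read-only view.
  const void* data_ptr_ro() const { return storage.data(); }

  // Plays the role of INTEGER(): a typed, writable pointer.
  int* integer() { return storage.data(); }
};

// Read-only consumer: a const pointer is all it needs.
long long sum_ro(const ToyIntVector& v) {
  const int* p = static_cast<const int*>(v.data_ptr_ro());
  long long total = 0;
  for (std::size_t i = 0; i < v.storage.size(); i++) {
    total += p[i];
  }
  return total;
}

// Writer, like Get_region() filling a freshly allocated copy: it needs a
// mutable int*, which the typed accessor provides without any cast.
// Assumes start + n <= dst.storage.size().
void fill_region(ToyIntVector& dst, std::size_t start, std::size_t n, int value) {
  int* p = dst.integer();
  for (std::size_t i = start; i < start + n; i++) {
    p[i] = value;
  }
}
```

In the toy, the compiler rejects writing through the pointer from `data_ptr_ro()` unless `const` is cast away. Writing through a pointer obtained that way is undefined behavior if the underlying object is actually const.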

Member

@amoeba amoeba left a comment

Thanks for working on this. I took a look through and left a couple of comments. Once the altrep test failures get figured out I'll be a +1 on this.

@github-actions github-actions bot added the `awaiting change review` label and removed the `awaiting changes` label Mar 31, 2025
```diff
-# because there are no nulls, DATAPTR() does not materialize
+# because there are nulls, DATAPTR() does materialize
```
Member Author

This comment change and the one below are not behavior changes in this PR; I think the comments were simply wrong (either outdated or copy-pasted). I've tried to correct them so they accurately describe what's going on (but note that "does not materialize" and, two lines later, `expect_true(test_arrow_altrep_is_materialized(altrep))` are at odds with each other).

Member

You're correct here that my earlier comment was simply wrong 🙂

@github-actions github-actions bot added the `awaiting changes` label and removed the `awaiting change review` label Mar 31, 2025
@jonkeane jonkeane requested a review from amoeba March 31, 2025 13:42
jonkeane and others added 4 commits April 5, 2025 09:48
@jonkeane jonkeane force-pushed the 45949_nonapi_again branch from 310d9c2 to a4b457d April 5, 2025 14:49
@jonkeane
Member Author

jonkeane commented Apr 5, 2025

@github-actions crossbow submit -g r

@jonkeane jonkeane requested a review from paleolimbot April 5, 2025 14:49

github-actions bot commented Apr 5, 2025

Revision: a4b457d

Submitted crossbow builds: ursacomputing/crossbow @ actions-97a8d02cea

| Task | Status |
| --- | --- |
| r-binary-packages | GitHub Actions |
| r-recheck-most | GitHub Actions |
| test-r-arrow-backwards-compatibility | GitHub Actions |
| test-r-clang-sanitizer | GitHub Actions |
| test-r-depsource-bundled | Azure |
| test-r-depsource-system | GitHub Actions |
| test-r-dev-duckdb | GitHub Actions |
| test-r-devdocs | GitHub Actions |
| test-r-extra-packages | GitHub Actions |
| test-r-gcc-11 | GitHub Actions |
| test-r-gcc-12 | GitHub Actions |
| test-r-install-local | GitHub Actions |
| test-r-install-local-minsizerel | GitHub Actions |
| test-r-linux-as-cran | GitHub Actions |
| test-r-linux-rchk | GitHub Actions |
| test-r-linux-sanitizer | GitHub Actions |
| test-r-linux-valgrind | GitHub Actions |
| test-r-macos-as-cran | GitHub Actions |
| test-r-minimal-build | Azure |
| test-r-offline-maximal | GitHub Actions |
| test-r-offline-minimal | Azure |
| test-r-rhub-debian-gcc-devel-lto-latest | Azure |
| test-r-rhub-debian-gcc-release-custom-ccache | Azure |
| test-r-rhub-ubuntu-release-latest | Azure |
| test-r-rocker-r-ver-latest | Azure |
| test-r-rstudio-r-base-4.1-opensuse155 | Azure |
| test-r-rstudio-r-base-4.2-focal | Azure |
| test-r-ubuntu-22.04 | GitHub Actions |
| test-r-versions | GitHub Actions |

@jonkeane
Member Author

jonkeane commented Apr 5, 2025

Failures in `test-r-rstudio-r-base-4.1-opensuse155` and `test-r-offline-maximal` are unrelated and are being fixed elsewhere.

Member

@amoeba amoeba left a comment

+1

Member

@paleolimbot paleolimbot left a comment

Thanks!

r/src/altrep.cpp Outdated
```diff
@@ -531,7 +552,7 @@ struct AltrepFactor : public AltrepVectorBase<AltrepFactor> {
 SEXP copy = PROTECT(Rf_allocVector(INTSXP, size));
 
 // copy the data from the array, through Get_region
-Get_region(alt, 0, size, reinterpret_cast<int*>(DATAPTR(copy)));
+Get_region(alt, 0, size, reinterpret_cast<int*>(INTEGER(copy)));
```
Member

Suggested change
```diff
-Get_region(alt, 0, size, reinterpret_cast<int*>(INTEGER(copy)));
+Get_region(alt, 0, size, INTEGER(copy));
```

Member Author

Ah yes, thanks for the reminder; I found a few others too.

r/src/altrep.cpp Outdated
Comment on lines 1311 to 1312
```cpp
SEXP str_elt = reinterpret_cast<SEXP>(STRING_ELT(x, i));
out[i] = str_elt;
```
Member

Thanks for checking that! I don't recall any tests that checked whether something was materialized more than once, but it was quite a long time ago 🙂

```diff
-# DATAPTR() should always materialize for strings
+# DATAPTR() does not materialize
```
Member

Thanks!

@github-actions github-actions bot added the `awaiting merge`, `awaiting review`, and `awaiting changes` labels and removed the `awaiting change review`, `awaiting review`, and `awaiting merge` labels Apr 5, 2025
@github-actions github-actions bot added the `awaiting change review` label and removed the `awaiting changes` label Apr 5, 2025
@jonkeane jonkeane merged commit 34a984c into apache:main Apr 5, 2025
10 checks passed
@jonkeane jonkeane removed the `awaiting change review` label Apr 5, 2025
@jonkeane
Member Author

jonkeane commented Apr 5, 2025

@assignUser would it be possible to pull this into 20? We will need to patch it in our CRAN release regardless, but it would be nice to have it on the actual release.

@assignUser
Member

Of course!


After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 34a984c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

amoeba pushed a commit that referenced this pull request Apr 6, 2025
This gets rid of `OBJECT`, `DATAPTR` has been replaced with `INTEGER()`, `REAL()`, etc. though strings are more complicated. I will fully admit that this C++ is stretching my comfort zone, so might include obviously wrong things!

CI is currently failing, but I'm not totally sure yet if that means the code changes here are wrong or if maybe these allow us to have slightly different assumptions about materialization (see #45951 (comment))

I've also requested reviews broadly for folks I know have been around this code before, I appreciate any effort that y'all can spare 🙏 

* GitHub Issue: #45949

Lead-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
assignUser pushed a commit that referenced this pull request Apr 7, 2025
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Apr 15, 2025
…pache#45951)

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Apr 15, 2025
…pache#45951)

5 participants