An alternative to #6: Gabe Becker's proposed 2-repo solution #10
Comments
What would be a good GitHub owner name for this new downstream production-level R universe? r-prd? r-releases-prod? r-valid? |
I might be missing something, but whether a package is 'broken' or not depends on the cohort of packages the user actually has installed, doesn't it? With only 2 repos, a package can only be 'broken' or not, yet there may be many valid dependency chains with only one of them broken. Take A -> B -> C, where A is upstream. An update to A causes B's tests to fail, so B is put in the broken repo along with its reverse dependencies C and D. In actuality, however, C's tests all pass and only D's fail, because C and D use different subsets of functions from B. That means A -> B -> C is a valid dependency chain that would be broken by this 2-repo arrangement. Then just using a 'normal' |
Yeah, the whole revdep chain would need to go down too. It’s a little extra work up front, but then we could skip scraping those revdeps altogether. Not impossible for this way of doing things. |
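To make the "whole revdep chain goes down" rule concrete, here is a minimal sketch using the A/B/C/D example from this thread. The function name `broken_closure` and the `depends_on` mapping are hypothetical, and the sketch assumes the dependency graph is already available as a plain dictionary; it is not how R-universe itself computes anything.

```python
from collections import deque


def broken_closure(depends_on, failing):
    """Return the failing packages plus all of their transitive reverse dependencies.

    depends_on maps each package to the packages it depends on directly
    (hypothetical input format, assumed for this sketch).
    """
    # Invert the map: package -> packages that depend on it directly.
    revdeps = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            revdeps.setdefault(dep, set()).add(pkg)

    # Breadth-first walk down the reverse dependency chain.
    excluded = set(failing)
    queue = deque(failing)
    while queue:
        current = queue.popleft()
        for child in revdeps.get(current, ()):
            if child not in excluded:
                excluded.add(child)
                queue.append(child)
    return excluded


# Example from the thread: A is upstream, B depends on A, C and D depend on B.
depends_on = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
excluded = broken_closure(depends_on, ["B"])
production = sorted(set(depends_on) - excluded)
print(production)  # ['A']
```

With B failing, C and D are excluded as well, even though C's own tests pass, which is exactly the trade-off being discussed here.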
As you say, maybe that’s heavy-handed. However, I don’t see a generic way to find out which subset of a package is failing, just from information in logs. |
If the test suite is adequate, then a package only needs to pass its own tests, right? It doesn't need to know whether an upstream dependency passes all of its tests, or, even further removed, whether that package's 100 revdeps pass theirs. So I'm quite in favour of the checks dashboard type of thing, or a function that returns this. You only need to know for the package you are installing. Then on an ongoing basis, the checker function can come in handy. It's the power of decentralisation. Let each individual community decide what it wants to use. |
sorry maybe this belongs in #6. Discussion continued at #6 (comment) |
From #6 (comment)
These points also support #10. With r-universe-org/help#369, it will only be necessary to scrape the existing check results (no need for revdep checks). |
For a downstream production-level repo, it would be ideal to leverage R-universe as much as possible. |
|
On second thought: to have the right user-side guarantees, I think we would need to remove reverse dependencies from "https://r-production.r-universe.dev" if something goes wrong with a package. If that is the case, then https://r-releases.r-universe.dev/ and "https://r-production.r-universe.dev" will have the exact same dependency graphs for every hosted package. Which means that any test failure in "https://r-production.r-universe.dev" is random and probably a false positive. So my current preference is to:
|
Yes, I think 3 is the logical conclusion, you'd be able to rely on the checks from R-releases. |
If B isn't passing its own tests, then B is broken, meaning it should only be offered in a "use at your own risk" capacity. That risk may sometimes be quite small, e.g., the notorious "1 test breaks on M1 Macs" case, but without an evolution of how tests are treated in R packages, similar to what @HenrikBengtsson brought up in the latest working group call, install.packages doesn't have the ability to differentiate or quantify risk. Given that there is some risk, my argument is that the risk should be opt-in rather than opt-out. Users can opt into that risk by adding the unsafe repo (or whatever we end up calling it if that is too pejorative) to their repos, either via option or via the argument to install.packages. If they did that, they would be able to get all of {A, B, C, D}. I think making risk like this opt-out would be detrimental to end users, particularly novice ones, since the tooling is insufficient to even tell them that the risk exists, much less to help them assess it. Furthermore, it would be antithetical to the concept of production: while you might need to do this, in my experience it would have to be a manual intervention by the admin, and it may (reasonably) not be allowed at all in a validated context, regardless of how unbroken we might expect C's functionality to be. The other thing to keep in mind is that just because someone does install.packages("C") does not mean that they won't also sometimes directly use functionality from B in their scripts, including parts of B that C does not use. B could still be broken for some of their intended purposes, even if C itself "works fine", which would mean that the repo is still serving the user a package that is broken for its intended purpose. |
Thank you @gmbecker, we are taking all of these considerations into account. For these and other reasons, i.e. prior expectations for novice users using |
We now have space to host the two repos:
I am about to start working on:
|
The two-repo strategy is well underway, and given #57, I think we can close the thread above. |
Suppose r-releases.r-universe.dev is a repo with all the releases, and there is a downstream universe with just the ones that pass R CMD check and revdep checks, just as @gmbecker originally proposed in r-universe-org/help#363. It should be simple to scrape the check results from https://github.com/r-universe/r-releases/actions, select a subset of https://github.com/r-releases/r-releases.r-universe.dev/blob/main/packages.json with non-broken packages, and then create a different universe downstream.
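As a rough sketch of the selection step, suppose the scraped check results have already been reduced to a mapping from package name to a status string. That mapping, the status value "success", the function name `select_production`, and the example.org URLs are all assumptions for illustration; the sketch also assumes each packages.json entry is an object with a "package" field, which should be verified against the real file.

```python
import json


def select_production(registry, check_status):
    """Keep only registry entries whose package has an explicit passing status.

    Packages with no scraped result are left out, so inclusion in the
    downstream universe is opt-in on a known-good check.
    """
    return [
        entry for entry in registry
        if check_status.get(entry["package"]) == "success"
    ]


# Hypothetical registry entries and scraped statuses.
registry = [
    {"package": "pkgA", "url": "https://example.org/pkgA"},
    {"package": "pkgB", "url": "https://example.org/pkgB"},
    {"package": "pkgC", "url": "https://example.org/pkgC"},
]
check_status = {"pkgA": "success", "pkgB": "failure"}  # pkgC was never checked

production_registry = select_production(registry, check_status)
print(json.dumps(production_registry, indent=2))
```

The output could then be written out as the packages.json of the downstream universe.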
As part of that selection process, maybe we could impose version number etiquette too. Suppose we get the version numbers and their commit hashes when we scrape https://github.com/r-universe/r-releases/actions. (@jeroen, this may rely on the nice titles you give the jobs, such as "r.releases.utils 0.0.5" and "sys 3.3".) If we detect that the commit hashes are different but the latest version is not strictly greater than the previous one, then we can omit the package from the production repo.
The advantages over #6 are:
install.packages() normally.
To me (2) is more important than (1).
The challenges relative to #6 are:
I was hoping to have all repos part of https://github.com/r-releases, but I think the creation of a new universe would mean the creation of a new special repo, e.g. https://github.com/r-prd/r-prd.r-universe.dev. I would be open to a better name than this.
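The version number etiquette check proposed earlier in this issue could look roughly like the sketch below. The function names and the (version, commit hash) pair format are hypothetical, and the version parser is a simplification that only handles purely numeric components separated by "." or "-"; it is not a reimplementation of R's own version comparison.

```python
import re


def parse_version(version):
    """Split a version such as '0.0.5' or '1.2-3' into a tuple of integers."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def violates_etiquette(previous, latest):
    """Return True if the code changed but the version did not strictly increase.

    previous and latest are (version, commit_hash) pairs
    (hypothetical format assumed for this sketch).
    """
    prev_version, prev_hash = previous
    new_version, new_hash = latest
    if new_hash == prev_hash:
        return False  # same commit, nothing was re-released
    return parse_version(new_version) <= parse_version(prev_version)


# Different commit, same version number: omit from the production repo.
print(violates_etiquette(("0.0.5", "abc123"), ("0.0.5", "def456")))  # True
# Different commit, strictly greater version: fine.
print(violates_etiquette(("0.0.5", "abc123"), ("0.0.6", "def456")))  # False
```

Comparing integer tuples rather than strings matters here: 3.10 is treated as newer than 3.3, which a plain string comparison would get wrong.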