An alternative to #6: Gabe Becker's proposed 2-repo solution #10

wlandau · 2024-03-01T18:23:12Z

Suppose r-releases.r-universe.dev is a repo with all the releases, and there is a downstream universe with just the ones that pass R CMD check and revdep checks, just as @gmbecker originally proposed in r-universe-org/help#363. It should be simple to scrape the check results from https://github.com/r-universe/r-releases/actions, select a subset of https://github.com/r-releases/r-releases.r-universe.dev/blob/main/packages.json with non-broken packages, and then create a different universe downstream.
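A minimal sketch of that selection step, assuming the check results have already been scraped into a mapping from package name to status. The `"package"` and `"url"` field names, the status strings, and the `select_passing` helper are assumptions for illustration, not the confirmed packages.json schema or an R-universe API.

```python
import json


def select_passing(packages, check_status):
    """Keep only the entries whose scraped check status is "success".

    packages: list of dicts, one per listing entry (hypothetical schema).
    check_status: dict mapping package name to a status string (hypothetical).
    Packages with no recorded status are treated as not passing.
    """
    return [p for p in packages if check_status.get(p["package"]) == "success"]


# Hypothetical upstream listing and scraped check results.
qa_listing = json.loads("""
[
  {"package": "pkgA", "url": "https://github.com/example/pkgA"},
  {"package": "pkgB", "url": "https://github.com/example/pkgB"},
  {"package": "pkgC", "url": "https://github.com/example/pkgC"}
]
""")
scraped = {"pkgA": "success", "pkgB": "failure"}

# pkgB failed and pkgC has no result yet, so neither is carried downstream.
production_listing = select_passing(qa_listing, scraped)
print(json.dumps(production_listing, indent=2))
```

Treating a missing status as a failure keeps the downstream listing conservative: a package only appears once a passing result has actually been observed.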

As part of that selection process, maybe we could impose version number etiquette too. Suppose we get the version numbers and their commit hashes when we scrape https://github.com/r-universe/r-releases/actions. (@jeroen, this may rely on the nice titles you give the jobs, such as r.releases.utils 0.0.5 and sys 3.3.) If we detect that the commit hashes are different but the latest version is not strictly greater than the previous one, then we can omit the package from the production repo.
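The version etiquette rule could look roughly like the sketch below. It assumes each release is available as a `(version, commit_hash)` pair, which is a hypothetical shape, and it compares versions as tuples of integers, which is a simplification and may not match R's own version comparison in every edge case.

```python
import re


def parse_version(version):
    """Split a version string such as "0.0.5" or "1.2-3" into a tuple of ints."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def violates_etiquette(previous, latest):
    """Return True if the commit changed but the version did not strictly increase.

    previous and latest are (version, commit_hash) pairs (hypothetical shape).
    """
    prev_version, prev_hash = previous
    new_version, new_hash = latest
    if prev_hash == new_hash:
        return False  # same commit, so nothing new was released
    return parse_version(new_version) <= parse_version(prev_version)


# New commit with a bumped version: acceptable.
print(violates_etiquette(("0.0.4", "abc123"), ("0.0.5", "def456")))
# New commit that reuses the old version: omit from the production repo.
print(violates_etiquette(("0.0.5", "abc123"), ("0.0.5", "def456")))
```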

The advantages over #6 are:

1. The ability to use install.packages() normally.
2. Faster installation and no risk of hitting rate limits because there would be no GitHub API calls at installation time.

To me (2) is more important than (1).

The challenges relative to #6 are:

1. Learning how to scrape https://github.com/r-universe/r-releases/actions.
2. Figuring out where to put that downstream universe.

I was hoping to have all repos part of https://github.com/r-releases, but I think the creation of a new universe would mean the creation of a new special repo, e.g. https://github.com/r-prd/r-prd.r-universe.dev. I would be open to a better name than this.


wlandau · 2024-03-01T18:38:19Z

What would be a good GitHub owner name for this new downstream production-level R universe? r-prd? r-releases-prod? r-valid?

shikokuchuo · 2024-03-01T19:28:38Z

I might be missing something, but whether a package is 'broken' or not depends on the cohort of packages the user actually has installed, doesn't it? If only 2 repos, then a package can only be 'broken' or not. There may be many valid dependency chains, with only one broken.

A -> B -> C
....... B -> D

Where A is upstream. An update to A causes B's tests to fail. It is put in the broken repo along with C and D.

However, in actuality C's tests all pass, only D's fail. This is as C and D use different subsets of functions from B. That means that A -> B -> C is a valid dependency chain that would be broken by this 2 repo arrangement.

Then just using a 'normal' install.packages() won't find any of B, C or D any more.
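To make the example concrete, here is a small sketch of which packages would drop out of the production repo when B fails. The graph literal mirrors the A, B, C, D chain above; the `affected_by` helper is hypothetical and simply walks reverse dependencies breadth-first.

```python
from collections import deque

# Edges point from a package to the packages that depend on it directly.
reverse_deps = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}


def affected_by(failing, reverse_deps):
    """Return the failing package plus every package downstream of it."""
    seen = {failing}
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for dependent in reverse_deps.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# B fails after A's update, so C is removed along with D even though
# C's own tests pass.
print(sorted(affected_by("B", reverse_deps)))  # ['B', 'C', 'D']
```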

wlandau · 2024-03-01T21:25:32Z

Yeah, the whole revdep chain would need to go down too. It’s a little extra work up front, but then we could skip scraping those revdeps altogether. Not impossible for this way of doing things.

wlandau · 2024-03-01T21:27:30Z

As you say, maybe that’s heavy-handed. However, I don’t see a generic way to find out which subset of a package is failing, just from information in logs.

shikokuchuo · 2024-03-01T21:45:22Z

If the test suite is adequate, then a package only needs to pass its own tests, right? It doesn't need to know if an upstream dependency passes all of its tests, or even further removed whether that package's 100 revdeps pass theirs.

So I'm quite in favour of the checks dashboard type thing, or a function that returns this. You only need to know for the package you are installing. Then on an ongoing basis, the checker function can come in handy.

It's the power of decentralisation. Let each individual community decide what it wants to use.

shikokuchuo · 2024-03-01T21:47:33Z

Sorry, maybe this belongs in #6. Discussion continued at #6 (comment).

wlandau · 2024-03-02T20:31:02Z

From #6 (comment)

After "Efficiently get the check results of a small list of packages" (r-universe-org/help#370), implementation can begin.

After "Rerun a package's checks whenever a strong dependency updates" (r-universe-org/help#369), user-side package correctness/compatibility guarantees will exceed those of CRAN.

These points also support #10. With r-universe-org/help#369, it will only be necessary to scrape the existing check results (no need for revdep checks).

wlandau · 2024-03-08T19:09:31Z

For a downstream production-level repo, it would be ideal to leverage R-universe as much as possible. ~~My only concern is that we may get a duplicated (and possibly conflicting) set of health checks.~~

wlandau · 2024-03-08T19:11:58Z

Actually, it could be important to pass health checks in both production and QA. So we would want to pull from both https://r-releases.r-universe.dev and "https://r-production.r-universe.dev" to decide whether to keep a package on "https://r-production.r-universe.dev".

wlandau · 2024-03-08T19:27:40Z

On second thought: to have the right user-side guarantees, I think we would need to remove reverse dependencies from "https://r-production.r-universe.dev" if something goes wrong with a package. If that is the case, then https://r-releases.r-universe.dev/ and "https://r-production.r-universe.dev" will have the exact same dependency graphs for every hosted package. Which means that any test failure in "https://r-production.r-universe.dev" is random and probably a false positive.

So my current preference is to:

1. If a package's checks fail in https://r-releases.r-universe.dev, remove both the package and all its strong reverse dependencies from "https://r-production.r-universe.dev".
2. Ignore checks from "https://r-production.r-universe.dev" when deciding (1).
3. In fact, consider suppressing R CMD check in "https://r-production.r-universe.dev" to avoid confusion and duplication.
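The preference above can be sketched as a fixed-point computation over QA check results alone. This is an illustrative sketch, not the project's implementation: the input shapes and the `production_set` helper are hypothetical, and "strong" is taken to mean the Depends, Imports, and LinkingTo fields.

```python
STRONG_TYPES = {"Depends", "Imports", "LinkingTo"}


def production_set(qa_checks, dependencies):
    """Select packages for production using QA check results only.

    qa_checks: dict mapping package name to True (checks pass) or False.
    dependencies: dict mapping package name to a list of
        (dependency_name, dependency_type) pairs (hypothetical shape).
    """
    excluded = {pkg for pkg, ok in qa_checks.items() if not ok}
    changed = True
    while changed:  # repeat until no further reverse dependency is excluded
        changed = False
        for pkg, deps in dependencies.items():
            if pkg in excluded:
                continue
            if any(dep in excluded and kind in STRONG_TYPES for dep, kind in deps):
                excluded.add(pkg)
                changed = True
    return sorted(set(qa_checks) - excluded)


qa_checks = {"A": True, "B": False, "C": True, "D": True, "E": True}
dependencies = {
    "B": [("A", "Imports")],
    "C": [("B", "Imports")],
    "D": [("C", "Depends")],
    "E": [("B", "Suggests")],  # weak dependency, so E is kept
}
print(production_set(qa_checks, dependencies))  # ['A', 'E']
```

Production check results never enter the function, which matches the idea of relying on the QA universe as the single source of health information.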

shikokuchuo · 2024-03-08T19:45:35Z

Yes, I think 3 is the logical conclusion, you'd be able to rely on the checks from R-releases.

wlandau · 2024-03-11T15:19:44Z

To recap recent discussions: we decided to put #6 on hold as we pursue #10. If the dual-repo option works well, then we will close #6 as "not planned".

gmbecker · 2024-03-11T19:11:53Z

> I might be missing something, but whether a package is 'broken' or not depends on the cohort of packages the user actually has installed, doesn't it? If only 2 repos, then a package can only be 'broken' or not. There may be many valid dependency chains, with only one broken.
>
> A -> B -> C
> ....... B -> D
>
> Where A is upstream. An update to A causes B's tests to fail. It is put in the broken repo along with C and D.
>
> However, in actuality C's tests all pass, only D's fail. This is as C and D use different subsets of functions from B. That means that A -> B -> C is a valid dependency chain that would be broken by this 2 repo arrangement.
>
> Then just using a 'normal' install.packages() won't find any of B, C or D any more.

If B isn't passing its own tests, then B is broken, meaning it should only be offered in a "use at your own risk" capacity. That risk may sometimes be quite small, e.g., the notorious "1 test breaks on M1 Macs" case, but without an evolution of how tests are treated in R packages, similar to what @HenrikBengtsson brought up in the latest working group call, install.packages doesn't have the ability to differentiate or quantify risk.

Given then that there is some risk, my argument is that that risk should be opt-in rather than opt-out. Users can opt into that risk by adding the unsafe repo (or whatever we end up calling it if that is too pejorative) to their repos, either via option or via the argument to install.packages. If they did that, they would be able to get all of {A, B, C, D}.

I think making risk like this opt-out would be detrimental to end users, particularly novice ones, since the tooling is insufficient to even tell them that the risk exists, much less to help them assess it. Furthermore, it would be antithetical to the concept of production: while you might sometimes need to do this, in my experience it would need to be a manual intervention by the admin, and it may (reasonably) not be allowed at all in a validated context, regardless of how unbroken we might expect C's functionality to be.

The other thing to keep in mind is that just because someone does install.packages("C") does not mean that they won't also sometimes directly use functionality from B in their scripts, including parts of B that aren't the bits that C uses. B could still be broken for some of their intended purposes, even if C itself "works fine", which would mean that the repo is still serving the user a package that is broken for its intended purpose.

shikokuchuo · 2024-03-11T19:30:00Z

Thank you @gmbecker, we are taking all of these considerations into account. For these and other reasons, i.e. prior expectations for novice users using install.packages(), we are actually looking at your 2-repo proposal as a priority. The 'production' repo could then be the default as you describe above, with the choice of opting out to the wider 'community' or 'QA' repo or whatever you want to call it.

wlandau · 2024-05-21T13:04:22Z

We now have space to host the two repos:

| repo | QA | production |
|---|---|---|
| `install.packages(repos = "...")` | https://multiverse.r-multiverse.org | https://production.r-multiverse.org |
| packages.json | https://github.com/r-multiverse/multiverse | https://github.com/r-multiverse/production |
| R-universe | https://github.com/r-universe/r-multiverse | https://github.com/r-universe/r-production |

I am about to start working on:

1. Migrating existing infrastructure to the new location for the QA universe.
2. Building the production packages.json based on the results of automated checks.

wlandau · 2024-06-21T17:54:27Z

The two-repo strategy is well underway, and given #57, I think we can close the thread above.

wlandau changed the title ~~An alternative approach: Gabe Becker's proposed 2-repo solution~~ An alternative to #6: Gabe Becker's proposed 2-repo solution Mar 1, 2024

This was referenced Mar 1, 2024

r-universe for production r-universe-org/help#363

Closed

New package idea: install "safe" packages from https://r-releases.r-universe.dev #6

Closed

wlandau mentioned this issue Mar 2, 2024

Efficiently get the check results of a small list of packages r-universe-org/help#370

Open

wlandau mentioned this issue Mar 4, 2024

Package version best practices #21

Closed

This was referenced Mar 15, 2024

Move the body of releases::check_versions() to r.releases.internals #43

Closed

Multiple universes with the same user r-universe-org/help#88

Open

wlandau mentioned this issue Apr 18, 2024

Dependent dev version remote packages #48

Closed

wlandau mentioned this issue May 17, 2024

URL hosting #50

Closed

wlandau closed this as completed Jun 21, 2024
