A snapshot-based model for production #78
It gets a bit tricky if we consider reverse depends. If a strong dependency of a passing package is failing checks, I assume we still allow the passing package to enter Production? The justification would be that we allow CRAN dependencies (i.e. we are not replicating the entire dependency tree in Multiverse), and those would not be checked anyway. The failing package would also still be in Community.
I was actually thinking the opposite: require all dependencies to pass. (NB Some sort of compromise could be helpful, but it's not obvious what would work.) If we allowed packages whose dependencies fail, then we would have to get those packages from community, staging, or the previous snapshot, which could cause new bugs or incompatibilities. Maybe instead we could approach this problem by making it as easy as possible to pass checks in staging. After all, the ideal case is to have as few failures as possible. This could mean retrying failed checks during the freeze, making it easy to reproduce checks in development, etc.
CRAN is a different model, but I think it's arguably still production.
This would also be part of the compromise if we go with the strict route and remove packages whose dependencies fail.
I'm seeing 2 potential directions:

1. Complete distribution. All dependencies are pulled. This can be automated from the CRAN / Bioc mirror. A contribution PR would override any existing packages pulling from the mirror. The snapshot image would then be self-contained and no external repositories would be set when installing. This is probably what it takes to make it truly useful as a 'validated' snapshot.

2. Collection of packages. Packages that pass checks. Installing packages would depend on CRAN / BioC [and Community]. The following on revdeps is optional - we could take a strict view instead - but it seems most consistent with the 'listing of packages' approach:

On revdeps: checks will inevitably differ between Multiverse and CRAN, sometimes just because of the grace periods available on CRAN, or because certain packages are 'too big to fail' on CRAN. A package can fail checks in staging and remain on CRAN. There is then a fundamental inconsistency if we remove such packages, just because they exist on Multiverse. Instead it is possible not to remove revdeps as long as they pass their own checks. Installing these packages will then require installing the failing dependency from Community. The fact that such a dependency was pulled from Community can be tracked via the installed package metadata. We could include something in our

On the other hand, if taking a strict approach, the key risk here is availability - packages moving in and out of Production too frequently can provide a poor experience for both users and maintainers. We'd have to spend time thinking on policies, and this seems likely to involve more manual intervention to create each release.

We should spend some time thinking on these (or variants). (1) seems to be the natural conclusion of where to take things, but I wonder if @jeroen was also thinking down these lines. This would automatically lead to the repo getting bigger again and R-universe would need to be supportive.
We can assume in most cases that users will just get non-R-multiverse dependencies from CRAN. I think this is fine because it agrees with how testing already works in Staging. In other words, even though they really have a hybrid of CRAN and R-multiverse packages, users should get the same experience we tested for. I don't think snapshots need to be complete, and I don't think they should try to be self-contained. It is normal to have an environment with multiple package repositories, even in a validated/qualified environment in a highly regulated industry.
https://www.pharmar.org is worth a look here, particularly https://www.pharmar.org/regulations/ and https://www.pharmar.org/white-paper/. FDA's Glossary of Computer System Software Development Terminology apparently says:
And the ICH E9 guidance on Statistical Principles for Clinical Trials says this about "Integrity of Data and Computer Software Validity":
For us, this sounds much more like being able to trust the packages we do include, rather than trying to include everything.
As you have noted before, if a revdep succeeds and a dependency fails, we should assume that the revdep is avoiding all the broken features of that dependency. In an ideal world, we should let the revdep use the dependency, but block the user from doing so. Maybe your proposed alternative to
To respond to this first point, we definitely can make use of multiple package repositories and we shouldn't need to physically bundle everything in a Multiverse distribution. But in that case, we need to make it very transparent how to retrieve the actual versions of packages used from the CRAN / Bioc mirror for the checks of any particular Multiverse package. Not needed for Multiverse packages, as they will simply be part of the release. This would then seem to satisfy the 'reproducibility' criterion of validation.
I am rethinking what I said in #78 (comment). I still don't think it makes sense to include all dependencies in the snapshot, e.g. ones from CRAN. As we said early on, it's not realistic to capture the full dependency tree of every package all the way down to core packages like
On the other hand, maybe omitting failing dependencies from Staging is different from conditioning on CRAN: sometimes Staging packages won't be on CRAN. Need more time to consider.
I need more time to come to a conclusion about revdeps. But in the Validation Hub paper:
I think this is a similar concept to what I was talking about. The assumption is that those dependencies pulled from the Community universe are only imports and not directly loaded by a user. But I want to be 100% clear that the
On the revdeps, I think it's really a matter of interpretation and both are valid from some perspectives.
If we go with the latter option, will the current (or envisaged) issues file be sufficient for us to say just exclude all of these from the release? If I understand correctly, this is created when any dependency is failing. If we can avoid shenanigans such as actually needing to remove packages from staging to re-run checks etc., that would make things much easier.
Yes, this consistency will increase how much users can trust production snapshots, and I think it will be extremely valuable. If the recommendation is to prefer R-multiverse when a package is also on CRAN, it will be much simpler for users to navigate to something that "just works" (TM). Like Gabe has said, most users need this, and they will not have the time or expertise to understand compatibility among packages.
Yes, if a package fails in either universe, then currently this generates an issue in all of its downstream revdeps. An example is https://github.com/r-multiverse/community/blob/main/issues/tidypolars, which has
One of the best parts about all this is that we don't need to remove packages from staging. In fact, the only checks we currently impose from community to staging are ad hoc checks on the
Things will "just work" TM whichever way we approach it - either packages are not in the R-multiverse production release in the first place, or they are, and dependencies will be drawn from R-multiverse Production first and then CRAN. There is no ambiguity about how this works / could work - or let me know if I've missed something here.

This then raises the question: R-Multiverse production is now a point release, with specific versions guaranteed to work together (validated). This also applies to CRAN dependencies. If however we operate a CRAN-like repo for the purposes of

So I think we shouldn't dismiss any of the points raised too quickly. Am open for discussion on this.
When we include a package in production, we are claiming that tests pass using the dependency chain in Staging + CRAN. If we include a revdep but omit a dependency, users will need to go find that dependency somewhere. If they find a different version than we tested with in Staging, whether from Community or CRAN, this could break the revdep and the guarantee is lost.
Exactly what I am worried about if we include revdeps of failing dependencies in production.
Right, so there is only ambiguity if we include Community. If we don't do that, as it is not a production repository by definition, then we have Multiverse-production, CRAN, Bioc in that order, and there is only one way for things to resolve.
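To make that resolution order concrete, here is a minimal sketch of the repository configuration it implies (the production URL is the one discussed in this thread; `tidypolars` is just an illustrative package):

```r
# Sketch: resolution order Multiverse-production > CRAN. With multiple repos,
# available.packages() keeps the first repository that lists a package, so
# Production versions win over CRAN versions of the same package.
# Bioconductor repos would follow, e.g. via BiocManager::repositories().
options(repos = c(
  production = "https://production.r-multiverse.org",
  CRAN = "https://cloud.r-project.org"
))
install.packages("tidypolars")  # illustrative package from this thread
```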
And not just for revdeps, but for all packages on Multiverse-production. Stepping back, revdeps are just a detail here. If we go back to the idea of Multiverse-production being a "listing of packages", then:
We need ideas for (2) above. Because if we take the minimalist route where we only store R-multiverse packages, dependency resolution will use CRAN. Take the case where a CRAN dependency is archived the day after a Multiverse release. That means packages from Multiverse-production will just fail to install.
You could add the cranhaven.org repo. The R repos would end up being production.r-multiverse.org, CRAN, Bioconductor, cranhaven; package installation will depend on

CRANhaven protects against the "sudden" missing-dependency problem and gives some cushion until packages are back on CRAN (~50% of them go back, 50% of those in the first 30 days). This shouldn't happen with Bioconductor, which only removes packages every 6 months (around March and November), but it might happen too.
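A sketch of that fallback chain (the cranhaven URL is an assumption based on its R-universe hosting):

```r
# Sketch: Production, then CRAN, then CRANhaven as a cushion for recently
# archived CRAN packages; Bioconductor repos would be added via BiocManager.
options(repos = c(
  production = "https://production.r-multiverse.org",
  CRAN = "https://cloud.r-project.org",
  cranhaven = "https://cranhaven.r-universe.dev"
))
```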
In discussing the possibility of keeping revdeps of failed dependencies, my underlying concern is how the testing environment in Staging may differ from the user's package environment locally. So I guess if we are only bundling R-multiverse Staging packages in the snapshot (which I still think is the pragmatic approach here), then we run that risk for many dependencies regardless of what we do with revdeps.

I wonder if transparency is an achievable middle ground. If we can't snapshot all dependencies, can we at least list them? Maybe accompanying each snapshot could be a metadata list of packages that failed in Staging, along with their versions and URLs. We could even consider a completely separate "Failures" snapshot (need to think of a friendlier name) to make these higher-risk packages available with the exact versions from Staging at the time of the Production snapshot. We could also include metadata lists of all the packages and versions that were on CRAN and Bioconductor at the time of the Production snapshot.

I think this would allow us to include revdeps of failing dependencies in Production. Production would not be perfect, but it would be fully up front and clear about the known risks. Users in highly-regulated, high-stakes environments would then have the power to be extremely careful about where they get their packages and which packages are approved for users to call directly.
Yes, so I think we're all agreed that bundling all dependencies is not the preferred option if it can be avoided. In terms of capturing the metadata of all package dependencies, I think this would be worthwhile.

But in terms of providing a production repository, a viable option would seem to be using Posit Public Package Manager (p3m) snapshots rather than CRAN mirrors. The tests from staging would use the latest daily snapshot, and this would cut off at our release date. Then a user can perfectly reproduce by using the Multiverse snapshot + p3m snapshot. See https://posit.co/blog/migrating-from-mran-to-posit-package-manager/
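For example, a user reproducing a given release might pin both repos like this (a sketch; the snapshot date is illustrative):

```r
# Sketch: pair the quarterly Multiverse snapshot with the p3m CRAN snapshot
# from the release cut-off date (the date below is illustrative).
options(repos = c(
  production = "https://production.r-multiverse.org",
  CRAN = "https://packagemanager.posit.co/cran/2025-01-15"
))
```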
Oh nice! Yeah, I guess P3M already gives us snapshots so we don't have to create any ourselves other than Production (and possibly a separate one with just packages that failed in staging, if you agree this would be useful).
Or, in the "Failures" snapshot, maybe we only include packages with revdeps in production?
@llrs thanks. As I understand it, cranhaven just copies from the R-universe CRAN mirror. As cranhaven only keeps packages around for [5] weeks, versus forever in R-universe, I don't see any advantage in pointing to cranhaven. In any case, we've moved on to p3m, so it's off-topic for this thread now, but feel free to open a discussion if you want to talk about cranhaven - I'm always open to ideas.
I'm super glad to have found this. I have some vague recollection of coming across this a long time ago... For Production, I think we should be consistent and only include metadata for packages that actually make the cut. We could produce similar metadata for Community, but I think everything should be made consistent with one of the two. Otherwise it's confusing even for us...
For revdeps I see the possibilities as either of:
For both, checks for revdeps should be triggered when any package updates in Staging. The second option may be a bit trickier to orchestrate, but has the advantage of avoiding swathes of packages moving in and out of Production across consecutive releases. Both work in the same way for users (offer the same 'user guarantees').
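As a sketch of how the set of revdeps to re-check could be computed from Staging metadata using base tooling (`polars` stands in for whichever package just updated):

```r
# Sketch: list Staging packages whose checks should be triggered when one
# package updates, using only base R tooling.
db <- available.packages(repos = "https://staging.r-multiverse.org")
revdeps <- tools::package_dependencies(
  packages = "polars",  # illustrative: the package that just updated
  db = db,
  which = c("Depends", "Imports", "LinkingTo"),
  recursive = TRUE,     # include indirect reverse dependencies
  reverse = TRUE
)
unlist(revdeps)         # the packages whose checks should be re-run
```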
I guess too much metadata could be confusing for novice users, even if it improves transparency for advanced users.
Even worse, if a package were to vanish from Staging as soon as an error happens, it would be extremely difficult for maintainers to fix issues because the evidence keeps disappearing. I would much prefer (1): simple to implement, simple to orchestrate, simple for everyone to understand, and with the strongest guarantees on the quality of the package cohort in Production.
Just so I understand: for p3m, would we have
Ok let's go with that. We can see how well it works in practice in any case - I think that this level of operational decision won't prevent us from making changes down the road if it becomes necessary.
Awesome! As I have said before, I think the best way to prevent entire cohorts of packages from getting taken down is to prevent package failures from happening in the first place. The next phase of R-multiverse could focus on this.
Maybe that part isn't necessary since R-universe already pulls from CRAN.
Only difference I can see is that p3m has a slightly delayed version of CRAN - it's a daily cron job I think. Also possibly on rare occasions it may fail to catch up as quickly. So to be 100% sure that what we test is consistent with the snapshot we recommend to users, it would be safer to use Posit. Or we can approximate by using a p3m snapshot of the day after the Multiverse cut-off.
So now that we've decided how to treat revdeps, I think that could allow us to simplify things even further. See if the below works:
As our release is snapshot/archived, this protects against GitHub repos being removed. This is no different to CRAN, which also allows archival at the maintainer's request.

This has the advantage of not needing Netlify or another hosting provider for the production CRANlike repo. I've looked into this and I think it requires them to recognise us as a legitimate open source initiative to avoid the paid offerings (have to fill in a form). I have a concern about using services like this, as a change in management / policy on their end can lead to the service becoming unavailable at short notice.

This way we use 'staging' as a test repo to avoid potential disruption of 'production'. Otherwise I think 'staging' is only really needed for the first model, i.e. physically moving packages to/from Community to Staging in order to test dependencies.

EDIT: of course we don't have to use 'staging', we can also deploy straight to production.
My main concern is the lack of a "freeze" period leading up to the snapshot. For a snapshot-based model, the freeze on non-failing packages is important to make sure we have a soft landing every quarter, and it is especially important now that we have decided to remove revdeps of failing packages. I don't think we would ever want to freeze Community.
Maybe this is where we could ask Posit for help? Or as a last resort, maybe a third universe? I know Jeroen was originally resistant to this, but maybe it wouldn't be such an extra burden in a snapshot model that only updates every quarter.
I see, so if I have this right, the plan would be: [1] month prior to the release, freeze Staging - only allow packages with issues to update from Community? That would avoid a random update of some package knocking out a swathe of revdeps just prior to release. I think that makes sense.
If we're freezing Staging prior to release, then that necessarily means we're already using the fixed SHAs for that universe. So we could just copy the packages without issues across to Production on the release date. That would be once a quarter to build the binaries and no need to re-run checks on it or anything like that. To look at it another way - we only need to create Staging at the point of freeze, and after release we have Production. So Staging can be a time-bound universe if that helps conserve resources.
Exactly. During that month, packages are still free to update in Community, but those in Staging can only update if they have an issue file. As we discussed with Jeroen in our meeting, this prevents surprises prior to release while allowing existing problems to get fixed.
I agree.
Yeah, that could really help. To begin the freeze (which I guess we could call the "staging phase" if we operate this way), we could do a massive update on Staging (bring in all package versions currently in Community). Then spend a month only allowing updates to Staging for failing packages. Then after the Production snapshot is created, either leave Staging alone or take down the Staging universe entirely (whatever Jeroen thinks is most helpful).
Great! This seems operationally feasible at least. I agree that we can try to optimize as much as possible for R-universe. As for the actual snapshots we create as an archive, I wonder if it will be enough to include the source files, in the way CRAN does. Of course with the changes in tooling, this doesn't guarantee they'll build at some future point in time. But just for what we currently have in Community - the source files are 90MB whilst the binaries come in at a hefty 1.6GB.
I think it makes sense to include binaries for mac and windows for
Right that makes sense - and would bring the equivalent binaries size down to 520MB for the above case. @jeroen I also hope you're broadly ok with our suggestion above - which is that we would only need Staging for a month four times a year for the "staging phase" - and then we have a Production universe, which only updates 4 times a year (and no need to run checks between those times).
I agree to limit binaries to r-release versions, but why limit binaries to just what CRAN provides? It could provide the Ubuntu binaries for r-release too (I think it would help a lot of Docker and scientific people, as r2u has shown).
We could consider it, but Ubuntu builds are only available for Noble on R-universe. I don't think we can guarantee that these will work on older versions. Also the default is for source builds on Linux. I think it's safe to say that users on Linux will be comfortable with compiling things from source, whereas there is not necessarily the same expectation on Windows, where
One disadvantage of having a universe for production is the lack of guaranteed availability if a package is removed from GitHub/GitLab. I don't see this as an issue in the initial rollout, but later on we may want to switch to something else. Maybe as the effort gains traction and popularity, it will be easier to find a sustainable minimal-cost way to host Production as a snapshot.
I actually consider this an advantage. Taking down a repo from GitHub / GitLab is a pretty clear decision that the author no longer wants the content published. Just as we track releases once a contribution has been made, we should also 'track' deletions. Of course we'd have to remove it if a request were made directly to us, according to our policy, but I consider this implicit in taking down the repo. Also in case it wasn't clear, the intention is still to take a snapshot of each release for archival purposes so we can have versioned releases. We can find the best way to make it available for download by corporates etc. but this will be an easier task than web hosting a CRANlike repo.
Deletion can happen for careless reasons as well. And what if a repo is moved or renamed, e.g. accepted into https://github.com/ropensci, then replaced with different content? Maintainers may find it surprising that deleting or moving/replacing a development repo would have immediate downstream consequences for Production. Maintainers might not even understand what Production means, and they might not understand that packages in Production should stay available, e.g. for serious regulated environments, except for special exceptions. I agree that a universe may be our best option for Production in our initial rollout, but I would also like to keep thinking about switching to something more enduring later on. Doesn't have to be solved immediately.
In the long run, this seems like the right way to handle all deletions from Production.
It may be a bit harder to retrieve packages from an archive than from the current production, e.g. if an entire container needs to be downloaded.
Another long-term concern about Production as a universe: if base R is updated during the year and some packages no longer build, Production would no longer be able to host them. And even if they do build, there may be failing R-universe checks on packages we insist on not updating.
We should do the same as Bioconductor: decide, for each version of our production snapshot, which version of base R it will target, and make sure the r-multiverse-staging universe tests with that version of R.
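A minimal sketch of what that guard could look like in the check automation (the target version is illustrative):

```r
# Sketch: stop early if checks are not running under the base R version
# targeted by this production cycle ("4.4" is an illustrative target).
target <- "4.4"
running <- paste(R.version$major, sub("\\..*$", "", R.version$minor), sep = ".")
if (!identical(running, target)) {
  stop(sprintf("checks must run under R %s.x, found %s", target, getRversion()))
}
```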
I'm onboard with both suggestions. Agree that Production probably needs to be handled a bit differently to Community.
We've implemented this.
In a meeting with @shikokuchuo and @jeroen today, @jeroen had a breakthrough model for production in R-multiverse. Here's how it goes:

Community

We keep the existing community universe at https://community.r-multiverse.org, where the latest releases of packages are guaranteed to be available. Uses `"branch": "*release"` in `packages.json`.
Staging

We have a second universe at https://staging.r-multiverse.org where packages are staged for production but not necessarily in production yet. Staging uses `"branch": SPECIFIC_REMOTE_SHA` in `packages.json` to select specific versions of packages. Packages are automatically promoted from community to staging based on lightweight automated checks on the package metadata, e.g. the `DESCRIPTION` file has no `Remotes:` packages and the version numbers of releases increase monotonically.
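For illustration, entries in the two forms might look like this (a sketch: the field layout is assumed from the R-universe registry format, the repos are real but used only as examples, and the SHA is a placeholder). The first entry shows the Community form, the second the Staging pin:

```json
[
  {
    "package": "tidypolars",
    "url": "https://github.com/etiennebacher/tidypolars",
    "branch": "*release"
  },
  {
    "package": "polars",
    "url": "https://github.com/pola-rs/r-polars",
    "branch": "0123456789abcdef0123456789abcdef01234567"
  }
]
```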
Production

Production is not its own universe. Rather, it is a snapshot of the Staging universe. Quarterly snapshots of Staging will be downloaded from the R-universe snapshot API at https://staging.r-multiverse.org/apis and hosted from Netlify. These snapshots will include the sources and binaries of all the packages which pass `R CMD check` on R-release and R-oldrelease in the Staging universe. Users will be able to download packages from the latest snapshot using `install.packages(repos = "https://production.r-multiverse.org")`. Past snapshots will be archived in GitHub Container Registry.
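As a usage sketch (with `tidypolars` as an illustrative package and CRAN kept as a fallback for dependencies not in the snapshot):

```r
# Sketch: install from the Production snapshot, falling back to CRAN for
# dependencies the snapshot does not carry.
install.packages(
  "tidypolars",
  repos = c("https://production.r-multiverse.org", "https://cloud.r-project.org")
)
```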
Phases

Since production will only update quarterly, maintainers will get few chances to get packages into production. To increase the chance of success, we will institute a "freeze" phase a month before each snapshot gets published. During the freeze, R-multiverse automation will avoid promoting new package versions to Staging except for packages which are failing checks. In other words, packages that are already healthy are frozen. Maintainers who really need to update them anyway will be able to manually submit pull requests to https://github.com/r-multiverse/staging to update the `"branch"` field of `packages.json`.

Remarks

Implementation

`multiverse.internals`: `snapshot_include()` and `snapshot_exclude()` to list packages that should be included/excluded in the production snapshot.