Package version best practices #21

Closed
wlandau opened this issue Mar 4, 2024 · 17 comments

@wlandau
Member

wlandau commented Mar 4, 2024

In r-universe-org/help#363, @gmbecker mentioned it is important that users be able to trust the version number of a package. If a new release of a package is published, then its version number should always increment.

It is straightforward to list all the version numbers and MD5 hashes of all the packages hosted at https://r-releases.r-universe.dev.

suppressPackageStartupMessages(library(dplyr))
library(tibble)
available.packages(repos = "https://r-releases.r-universe.dev") %>%
  as_tibble() %>%
  rename(hash = MD5sum) %>%
  rename_with(tolower) %>%
  select(all_of(c("package", "version", "hash")))
#> # A tibble: 113 × 3
#>    package   version hash                            
#>    <chr>     <chr>   <chr>                           
#>  1 BaseSet   0.9.0   027bcf49db2279a0f13170f9149a2a1c
#>  2 BioCor    1.7.0   a17637da3164d60c1df3563386cc8d6c
#>  3 Matrix    1.6-5   ab11095f57536c212af541bb81321105
#>  4 R6        2.5.0   d03c26e0f56c0406976a9b0d4744b11a
#>  5 Rcpp      1.0.12  31c18d8690068e80f36738a59b6f3a38
#>  6 askpass   1.2.0   2a75324b61f52f7dabd6ce939ce22af7
#>  7 base64enc 0.1-3   c5ff8d23d40bcd542cd5dc2548d5d756
#>  8 bit       4.0.5   ef05f46d4c2a1edfa35ef3d78c6eab74
#>  9 bit64     4.0.5   e48a9678044d06337be9d10534ef273a
#> 10 brew      1.0-10  c2282974f7d74ce3100494b619cd023f
#> # ℹ 103 more rows

Created on 2024-03-04 with reprex v2.1.0

As part of https://github.com/r-releases/r-releases.r-universe.dev/blob/main/.github/workflows/build_universe.yaml, we could pull this information and cache it as a new file in https://github.com/r-releases/r-releases.r-universe.dev. During the caching process, we could compare the current versions/hashes to the previous versions/hashes and make a judgement about version compliance. Then in #6 or #10, we could use this information to recommend which packages are safe to install.
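A minimal sketch of that comparison, assuming the cached and fresh listings are data frames with `package`, `version`, and `hash` columns as in the output above. The function name `changed_without_bump()` and the toy data are hypothetical, and `hash` stands for whichever content identifier ends up being cached.

```r
# Hypothetical snapshots: `previous` is the cached listing, `current` is the fresh one.
previous <- data.frame(
  package = c("a", "b", "c"),
  version = c("1.0.0", "2.1.0", "0.3.0"),
  hash = c("h1", "h2", "h3"),
  stringsAsFactors = FALSE
)
current <- data.frame(
  package = c("a", "b", "c"),
  version = c("1.0.1", "2.1.0", "0.3.0"),
  hash = c("h9", "h8", "h3"),
  stringsAsFactors = FALSE
)

# Flag packages whose contents changed while the version string stayed the same.
changed_without_bump <- function(previous, current) {
  both <- merge(previous, current, by = "package", suffixes = c("_old", "_new"))
  same_version <- both$version_old == both$version_new
  new_hash <- both$hash_old != both$hash_new
  both$package[same_version & new_hash]
}

changed_without_bump(previous, current)
```

In this toy data, only package `b` has a new hash under an unchanged version, so it is the only one flagged.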

@wlandau wlandau self-assigned this Mar 4, 2024
@wlandau
Member Author

wlandau commented Mar 4, 2024

I will implement this, hopefully this week or next week, as I think the r.releases.utils package is the place for it.

@wlandau
Member Author

wlandau commented Mar 4, 2024

Actually, maybe this exists on its own as another function in the same package as #6, but not as part of install_safe().

@wlandau
Member Author

wlandau commented Mar 4, 2024

The manifest should include:

  1. The current release version.
  2. The highest version ever released.
  3. The current release MD5 sum.
  4. The MD5 sum of the highest version ever released.

Then a separate function in the package from #6 could pull the manifest file and decide everything it needs to know from there. For good versioning practices, (1) and (2) should agree, and (3) and (4) should agree.
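One way such a manifest entry could be maintained, as a sketch only: the field names (`version_current`, `version_highest`, `hash_current`, `hash_highest`) and both functions are hypothetical, not the format that was eventually implemented. The "highest" fields are only overwritten on a strict version increase, so a re-release under the same version leaves (3) and (4) in disagreement.

```r
# Update the manifest entry of one package.
# `old` is the previous entry (NULL on the first run);
# `version` and `hash` describe the current release.
update_entry <- function(old, version, hash) {
  if (is.null(old) ||
      package_version(version) > package_version(old$version_highest)) {
    highest <- list(version = version, hash = hash)
  } else {
    highest <- list(version = old$version_highest, hash = old$hash_highest)
  }
  list(
    version_current = version,
    version_highest = highest$version,
    hash_current = hash,
    hash_highest = highest$hash
  )
}

# Good practice: (1) agrees with (2), and (3) agrees with (4).
is_compliant <- function(entry) {
  entry$version_current == entry$version_highest &&
    entry$hash_current == entry$hash_highest
}

entry <- update_entry(NULL, "1.0.0", "aaa")
entry <- update_entry(entry, "1.1.0", "bbb")
is_compliant(entry)                                 # normal increment
is_compliant(update_entry(entry, "1.0.5", "ccc"))   # version went down
is_compliant(update_entry(entry, "1.1.0", "ddd"))   # re-release, same version
```

The first check passes and the last two fail, which matches the two kinds of non-compliance discussed in this thread.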

@wlandau
Member Author

wlandau commented Mar 4, 2024

(3) and (4) could be obtained from the previous manifest on each iteration.

@wlandau
Member Author

wlandau commented Mar 5, 2024

To make #6 easier, I will write a separate JSON with just the package listings with version issues. This should be a small enough list for releases::check_releases() (or whatever we call that function) to download the whole thing quickly.

@shikokuchuo
Member

By the way, @wlandau, just a note that you might need to use the remote SHA of the GitHub commit (also returned by the R-universe API). If R-universe rebuilds packages on a periodic basis, then the MD5 sum of each package will presumably differ between builds because of the date metadata contained in the package.

@wlandau
Member Author

wlandau commented Mar 5, 2024

Do you know how to get those GitHub SHAs from the R-universe API? Unfortunately, available.packages() always returns NA for the RemoteSha field, which is the only reason I used MD5s in the first place.

@shikokuchuo
Member

Oh I see, yes that would be the MD5 sum of the built package as I understand it. So each build could return a different one.

The R-universe API is the one for each package like: https://r-releases.r-universe.dev/api/packages/mirai
It returns one combined json payload, which would have to be parsed for the 'RemoteSha'.

@wlandau
Member Author

wlandau commented Mar 5, 2024

I was afraid these would have to be pulled one-by-one. nanonext::ncurl("https://cran.r-universe.dev/api/packages") gets multiple packages, but only a small subset of 315. I wonder if some kind of pagination or other workaround is possible here.

@wlandau
Member Author

wlandau commented Mar 5, 2024

I tried hitting the API for each package, but I stopped it at 20 packages because it was clear the speed would not scale for our purposes.

So for the development of r-releases, I propose that for now, we only flag versions that decrement, as opposed to ones that release without incrementing. We don't need the hash for that part. This way, we can at least provide something that #6 can build on.
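A hash-free check of this kind might look like the sketch below. The function name `decremented()` and the input layout (named character vectors of versions) are assumptions for illustration; `utils::compareVersion()` is the base R comparison and treats both `.` and `-` as separators.

```r
# Return the packages whose current version is lower than the previously
# recorded one. Both arguments are named character vectors:
# names are packages, values are version strings.
decremented <- function(previous, current) {
  shared <- intersect(names(previous), names(current))
  lower <- vapply(
    shared,
    function(pkg) utils::compareVersion(current[[pkg]], previous[[pkg]]) < 0L,
    logical(1)
  )
  shared[lower]
}

previous <- c(Matrix = "1.6-5", R6 = "2.5.0", Rcpp = "1.0.12")
current <- c(Matrix = "1.6-5", R6 = "2.4.9", Rcpp = "1.0.13")
decremented(previous, current)
```

Here only `R6` is flagged (the version numbers are altered for the example). A package that re-releases under an unchanged version, like `Matrix` above, passes this check, which is exactly the gap the hash or RemoteSha would close.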

I will also see if RemoteSha can be added to the DESCRIPTION of built packages in R-universe.

@wlandau
Member Author

wlandau commented Mar 5, 2024

So for the development of r-releases, I propose that for now, we only flag versions that decrement, as opposed to ones that release without incrementing. We don't need the hash for that part. This way, we can at least provide something that #6 can build on.

And with that, r-multiverse/multiverse.internals#9 and r-multiverse/community#6 are now ready for review.

@shikokuchuo
Member

shikokuchuo commented Mar 5, 2024

ncurl_aio() might be a better choice here whereby you could try getting say 100 concurrently. I'm not sure if that would help - the individual json payloads would still be quite large. But sequential downloads would be too slow due to the network latency - this overrides any other factor.

@wlandau
Member Author

wlandau commented Mar 5, 2024

We might get results faster, but I worry this may overburden the API. I will ask Jeroen.

@wlandau
Member Author

wlandau commented Mar 5, 2024

Just submitted r-universe-org/help#377. I would prefer to decide on r-multiverse/multiverse.internals#9 and r-multiverse/community#6 based on the discussion in that thread.

@wlandau
Member Author

wlandau commented Mar 5, 2024

I think RemoteSha would have to be included in https://r-releases.r-universe.dev/src/contrib/PACKAGES and/or https://r-releases.r-universe.dev/src/contrib, and then available.packages() should work.
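To illustrate why the index matters: available.packages() builds its result from the repository's PACKAGES index, which is a DCF file, so a field can only come back non-NA if the index lists it. The sketch below parses a made-up two-entry index with `read.dcf()`; the package names and the RemoteSha value are placeholders, not real data.

```r
# A made-up PACKAGES index: pkgA lists RemoteSha, pkgB does not.
index <- c(
  "Package: pkgA",
  "Version: 1.0.0",
  "RemoteSha: 0123abc",
  "",
  "Package: pkgB",
  "Version: 0.2.1"
)

con <- textConnection(index)
fields <- read.dcf(con, fields = c("Package", "Version", "RemoteSha"))
close(con)

# Entries that lack the field come back as NA.
as.data.frame(fields, stringsAsFactors = FALSE)
```

Against a real repository, the analogous request would presumably be `available.packages(repos = "https://r-releases.r-universe.dev", fields = "RemoteSha")`, assuming the index carries that field.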

@wlandau
Member Author

wlandau commented Mar 6, 2024

Thanks to @jeroen's work on r-universe-org/help#377, we are now able to reliably get the RemoteSha field. I opened a pull request at r-multiverse/multiverse.internals#11.

@wlandau
Member Author

wlandau commented Mar 6, 2024

Solved by r-multiverse/multiverse.internals#11 and especially r-universe-org/help#377.

@wlandau wlandau closed this as completed Mar 6, 2024