Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency filters for production #26

Merged
merged 17 commits into from
Jun 11, 2024
Merged

Conversation

wlandau
Copy link
Member

@wlandau wlandau commented Jun 10, 2024

@shikokuchuo, this PR flags packages whose upstream dependencies have issues. There is a new issues_dependencies() function called from record_versions(), and it works like this. As an example, let's take our mirai/crew ecosystem and add a package x with dependencies nanonext and mirai and reverse dependency crew.aws.batch:

Screenshot 2024-06-10 at 3 05 29 PM

If mirai has a check error, then issues_dependencies() flags everything that depends on it. Specifically, it returns a nested list:

list(
  crew = list(mirai = character(0L)),
  x = list(mirai = character(0L)),
  crew.aws.batch = list(mirai = c("crew", "x")),
  crew.cluster = list(mirai = "crew")
)

The list says:

  1. Packages crew, x, crew.aws.batch, and crew.cluster cannot stay in production because of the issue with mirai.
  2. The upstream package at the root of the problem is mirai.
  3. mirai impacts crew.aws.batch via packages crew and x.
  4. mirai impacts crew.cluster via package crew.

A downstream package can be impacted by more than one upstream one. For example, if both mirai and crew directly cause issues in some way, then issues_dependencies(packages = c("mirai", "crew"), ...) returns:

list(
  crew = list(mirai = character(0)),
  x = list(mirai = character(0)), 
  crew.aws.batch = list(
    mirai = c("crew", "x"),
    crew = character(0)
  ), 
  crew.cluster = list(
    mirai = "crew",
    crew = character(0)
  )
)

This choice of output format may seem complicated, but we can have downstream code interpret it (e.g. the RSS feed) and explain to the user that e.g. "there is an issue in package A which impacts your package B through strong direct dependencies C and D".

Also in this PR: wherever an issue is flagged in a package for any reason, the JSON file includes the version and remote SHA of the problematic release.

Lastly, an FYI: issues_dependencies() computes on a large dependency graph, and igraph is the best tool for the job. Unfortunately, it is quite a heavy dependency for multiverse.internals.

@wlandau wlandau requested a review from shikokuchuo June 10, 2024 19:14
@wlandau wlandau self-assigned this Jun 10, 2024
@shikokuchuo
Copy link
Member

I have a couple of updates to the tests, but it seems I can't push to the branch on your fork. I think we can probably branch off the repo directly in future.

@wlandau
Copy link
Member Author

wlandau commented Jun 10, 2024

Hmm... looks like your fork is at version 0.1.4, which is from 2 weeks ago: shikokuchuo@1c58a2a. I only see the main branch. What updates do you have for the tests? Would it be easy to re-implement?

@wlandau
Copy link
Member Author

wlandau commented Jun 10, 2024

Oh, you meant on my fork. Yeah, branching off the main repo would let us edit each other's PRs.

@shikokuchuo
Copy link
Member

what i mean is I've pulled this PR using the GH client. in the past it's let me push edits directly to the PR, but I think as this is on your fork it doesn't let me. Unless I'm not doing it correctly...

@shikokuchuo
Copy link
Member

Nothing major I was really just re-verifying the mock data vs. the real pulled data

Copy link
Member

@shikokuchuo shikokuchuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one suggestion. I can add my minor test updates later.

R/issues_dependencies.R Outdated Show resolved Hide resolved
@shikokuchuo
Copy link
Member

Just so I can understand it fully, if we add later to this dependency graph as upstream of nanonext, and it has a check error, what would issues_dependencies() return? Can we mock that in your test repo?

Copy link
Member

@shikokuchuo shikokuchuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just pending my question on what it looks like when the dependency graph is deeper than 3 nodes.
EDIT: I'm just adding crew to R-Multiverse to check . Hope you don't mind!

@wlandau
Copy link
Member Author

wlandau commented Jun 11, 2024

FYI I just did some profiling and pushed optimizations. Now, issues_dependencies(meta$package, meta) only takes 111.3 seconds on my machine, where meta is the output of meta_packages(repo = "https://cran.r-universe.dev"). In other words, it takes under 2 minutes to check all of CRAN in the case when every CRAN package has an issue. issues_dependencies("crew.aws.batch", meta) takes only 0.266 seconds (creating the dependency graph is now fast).

@shikokuchuo
Copy link
Member

Thanks. I now have the mental model for this. Looks good.

Can I suggest that you set the 'verbose' default to FALSE? Once the repo gets larger I think it will be too verbose to have this stuff in the log files.

@wlandau
Copy link
Member Author

wlandau commented Jun 11, 2024

Just so I can understand it fully, if we add later to this dependency graph as upstream of nanonext, and it has a check error, what would issues_dependencies() return? Can we mock that in your test repo?

Here's what the graph looks like:

Screenshot 2024-06-11 at 9 54 52 AM

And supposing there is an issue in later, issues_dependencies("later", meta) returns:

list(
  nanonext = list(later = character(0)),
  mirai = list(later = "nanonext"),
  crew = list(later = c("mirai", "nanonext")),
  crew.aws.batch = list(later = "crew"), 
  crew.cluster = list(later = "crew"),
)

Interpretation:

  • later causes an issue that cascades downstream to packages nanonext, mirai, crew, crew.aws.batch, and crew.cluster through the dependency graph.
  • nanonext directly depends on later through its DESCRIPTION file.
  • mirai indirectly depends on later through nanonext, and nanonext is a strong direct dependency.
  • crew indirectly depends on later through strong direct dependencies mirai and nanonext.
  • crew.aws.batch indirectly depends on later through strong direct dependency crew.
  • crew.cluster indirectly depends on later through strong direct dependency crew.

@wlandau
Copy link
Member Author

wlandau commented Jun 11, 2024

Can I suggest that you set the 'verbose' default to FALSE? Once the repo gets larger I think it will be too verbose to have this stuff in the log files.

Done in e41b66d, though we may need it for log files if we notice too much slowness for some reason.

@wlandau wlandau merged commit 1cd24e5 into r-multiverse:main Jun 11, 2024
6 checks passed
@wlandau wlandau deleted the revdep branch June 11, 2024 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants