Very slow on repos with large numbers of PRs #48

Closed
aroon-color opened this issue Dec 17, 2024 · 5 comments

Comments

@aroon-color

aroon-color commented Dec 17, 2024

We use a (mostly) mono-repo setup with many developers creating branches and submitting PRs to main. At the moment we have about 120 PRs and 2400 branches (a small number are "perennial" in the git-sync parlance but most are not). Historically though we've had on the order of 100,000 PRs opened - and this seems to be the issue. It looks like the action is enumerating all PRs ever made into a repository. It takes the action ~15 minutes to update the PR description, which is longer than our other CI tasks take.

I think adding a lookback limit with a sane default (last 100?) on the PR enumeration would help a lot (for example, passing direction: desc and page: 1 to getPullRequests instead of paginating).
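
A minimal sketch of the lookback idea, for illustration only. The helper `planLookbackRequests` and the `ListPullsParams` type are hypothetical names, not the action's code; the parameter names (`state`, `sort`, `direction`, `per_page`, `page`) are those of GitHub's REST endpoint for listing pull requests, where newest-first ordering is expressed as `sort: created` plus `direction: desc`.

```typescript
// Subset of the query parameters accepted by GitHub's
// "list pull requests" REST endpoint.
interface ListPullsParams {
  state: "open" | "closed" | "all";
  sort: "created" | "updated";
  direction: "asc" | "desc";
  per_page: number;
  page: number;
}

const MAX_PER_PAGE = 100; // GitHub caps per_page at 100

// Hypothetical helper: plan the requests needed to fetch only the `limit`
// most recently created PRs instead of paginating through all of them.
// When `limit` is not a multiple of the page size, the caller is expected
// to truncate the combined results to `limit` items.
function planLookbackRequests(limit: number): ListPullsParams[] {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const perPage = Math.min(limit, MAX_PER_PAGE);
  const pageCount = Math.ceil(limit / perPage);
  const requests: ListPullsParams[] = [];
  for (let page = 1; page <= pageCount; page++) {
    requests.push({
      state: "all",
      sort: "created",
      direction: "desc",
      per_page: perPage,
      page,
    });
  }
  return requests;
}
```

With a limit of 100 this plans a single request, compared with roughly a thousand for the full PR history described above.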

@aroon-color aroon-color changed the title Very slow on repos with large numbers of branches or commits Very slow on repos with large numbers of PRs Dec 17, 2024
@tranhl
Collaborator

tranhl commented Dec 17, 2024

@aroon-color Thanks for reporting this! I was curious how the action would perform for large repos so this is perfect feedback.

Yeah, enumerating all PRs/branches is certainly not optimal! The reason is that if we don't, the stack graph may not render properly, because PRs/branches required by the stack could be missing from the fetched repo data.

15 minutes sounds like an incredibly long time though. Each page is 100 items, so at ~1s per request it should only take about 24 seconds to fetch 2400 branches. Could we be getting rate limited by the GitHub API? Is the action consistently taking 15 minutes, or was that happening when there were 100k PRs open?
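
The page arithmetic above can be worked through as a back-of-envelope estimate. This sketch assumes strictly sequential requests at a flat ~1s each, which is an assumption taken from the comment and not a measurement; `estimateFetchSeconds` is a hypothetical helper name.

```typescript
const PAGE_SIZE = 100; // items per page, as stated above

// Estimate the wall-clock time of fetching `itemCount` items one page at a time.
function estimateFetchSeconds(itemCount: number, secondsPerRequest = 1): number {
  const requests = Math.ceil(itemCount / PAGE_SIZE);
  return requests * secondsPerRequest;
}

console.log(estimateFetchSeconds(2400));   // 24 requests, prints 24
console.log(estimateFetchSeconds(100000)); // 1000 requests, prints 1000
```

Under that assumption, the roughly 100,000 historical PRs from the original report would take about 1,000 seconds (around 17 minutes) to enumerate, which is in the same ballpark as the reported ~15 minutes even without rate limiting.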

Either way, I'll look into options for reducing the dependence on the GitHub API. In the meantime, if you come across any more information I'd love to know about it! Thanks again for reporting this! 🙌

@kevgo
Contributor

kevgo commented Dec 17, 2024

That's a great problem to have! What do you think about these ideas to reduce the number of PRs that are looked at? I don't have too much background on how this action works internally, so I might be missing something obvious :)

  • Limit PR history to a configurable timeframe: For example, only consider PRs that were closed within the last 6 months. This threshold could be user-configurable. While some stacks may require a longer window, I suspect most users would be fine if the extension didn't handle such edge cases. If an older PR is encountered, the tool could print a notification to inform the user.

  • Early exit when enough data is collected: Could the action stop traversing PRs once it has all the necessary information? For instance, once it determines all ancestor branches up to the main branch (or a perennial branch), it should have everything it needs, no? AFAIK all relevant descendants should have open PRs.
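
The timeframe idea in the first bullet could look roughly like the sketch below. All names here (`ClosedPullRequest`, `filterByClosedWindow`, `windowDays`) are hypothetical and only illustrate the cutoff logic; nothing in this thread shows how the action represents PRs internally.

```typescript
// Hypothetical, simplified PR shape for illustration.
interface ClosedPullRequest {
  number: number;
  closedAt: Date;
}

interface WindowResult {
  kept: ClosedPullRequest[];    // closed within the window
  skipped: ClosedPullRequest[]; // older; could be reported to the user
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Split closed PRs into those closed within the last `windowDays` days
// and those that fall outside the window.
function filterByClosedWindow(
  prs: ClosedPullRequest[],
  windowDays: number,
  now: Date = new Date(),
): WindowResult {
  const cutoff = now.getTime() - windowDays * MS_PER_DAY;
  const result: WindowResult = { kept: [], skipped: [] };
  for (const pr of prs) {
    if (pr.closedAt.getTime() >= cutoff) {
      result.kept.push(pr);
    } else {
      result.skipped.push(pr);
    }
  }
  return result;
}
```

On its own this filter does not reduce API traffic, since the PRs have already been fetched; it would have to be combined with stopping pagination early to save requests.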

@tranhl
Collaborator

tranhl commented Dec 18, 2024

Ah my bad I misread the problem! There are 120 open PRs, but 100k historical ones.

@kevgo The action needs to visualize both ancestor & child PRs, so early exiting wouldn't be possible without prior knowledge of what the nodes in the stack graph are. This is a GitHub limitation unfortunately. In this case, the only way to guarantee the graph is accurate is to load all PRs (open or closed). I'd prefer not to add another configuration option if possible, but I can't think of a better solution at the moment!
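
To make the ancestor/child asymmetry concrete, here is a sketch under the assumption that each PR is reduced to its head and base branch names. The types and function names are hypothetical and this is not the action's actual graph code. Ancestors can be found by following `base` links one step at a time, while children are only discovered by scanning the fetched PR set for entries whose `base` matches, so any PR missing from that set is missing from the graph.

```typescript
// Hypothetical, simplified PR shape for illustration.
interface StackPullRequest {
  number: number;
  head: string; // branch the PR was opened from
  base: string; // branch the PR targets
}

// Walk up from `branch` towards `trunk`, collecting the PRs of parent branches.
function findAncestors(
  prs: StackPullRequest[],
  branch: string,
  trunk: string,
): StackPullRequest[] {
  const byHead = new Map<string, StackPullRequest>();
  for (const pr of prs) byHead.set(pr.head, pr);

  const ancestors: StackPullRequest[] = [];
  const seen = new Set<string>(); // guards against cyclic data
  let current: string | undefined = byHead.get(branch)?.base;
  while (current !== undefined && current !== trunk && !seen.has(current)) {
    seen.add(current);
    const pr = byHead.get(current);
    if (pr === undefined) break; // parent PR was not fetched
    ancestors.push(pr);
    current = pr.base;
  }
  return ancestors;
}

// Walk down from `branch`, collecting every PR stacked on top of it.
function findDescendants(prs: StackPullRequest[], branch: string): StackPullRequest[] {
  const descendants: StackPullRequest[] = [];
  const seen = new Set<string>([branch]);
  const queue: string[] = [branch];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    // Children are only visible by scanning the whole fetched set.
    for (const pr of prs) {
      if (pr.base === current && !seen.has(pr.head)) {
        seen.add(pr.head);
        descendants.push(pr);
        queue.push(pr.head);
      }
    }
  }
  return descendants;
}
```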

@aroon-color I've added a history-limit (#49) input that will allow you to limit the number of closed pull requests fetched from the GitHub API. You can preview this change by creating a pull request that pins the action to the commit hash (1072a102986de399e9482cb794dc67d3531a2d59):

    steps:
      - name: Git Town
-       uses: git-town/action@v1
+       uses: git-town/action@1072a102986de399e9482cb794dc67d3531a2d59
+       with:
+         history-limit: '500' # Tweak as necessary

Let me know if this helps with your use case! If it does I'll ship out the change ASAP.
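
For readers wondering what such a limit means in practice, here is a rough sketch of the described behavior: all open PRs are kept and closed PRs are capped. It is an illustration only and not the implementation from #49; `parseHistoryLimit` and `applyHistoryLimit` are hypothetical names, and treating an empty input as "no limit" is an assumption. Action inputs always arrive as strings, hence the parsing step.

```typescript
// Hypothetical, simplified PR shape for illustration.
interface PullRequest {
  number: number;
  state: "open" | "closed";
}

// Parse the raw string input. An empty value is assumed to mean "no limit".
function parseHistoryLimit(raw: string): number | undefined {
  const trimmed = raw.trim();
  if (trimmed === "") return undefined;
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`history-limit must be a non-negative integer, got "${raw}"`);
  }
  return Number(trimmed);
}

// Keep every open PR, but at most `limit` closed PRs.
// Assumes `prs` is already ordered newest first.
function applyHistoryLimit(prs: PullRequest[], limit: number | undefined): PullRequest[] {
  if (limit === undefined) return prs;
  const kept: PullRequest[] = [];
  let closedKept = 0;
  for (const pr of prs) {
    if (pr.state === "open") {
      kept.push(pr);
    } else if (closedKept < limit) {
      kept.push(pr);
      closedKept++;
    }
  }
  return kept;
}
```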

@aroon-color
Author

That makes it incredibly fast: down from 15 minutes to 10-15 seconds, depending on the stack depth (2-10)! Thanks for being so responsive 🥳

@tranhl
Collaborator

tranhl commented Dec 18, 2024

Awesome!! This has been shipped in v1.0.8. This doesn't require any changes from your end as long as you're using the v1 version tag. Thanks again for reporting this issue! 🚀

@tranhl tranhl closed this as completed Dec 18, 2024
3 participants