
analytically (Contributor)

When checking for new versions in large repositories, the git-resource currently uses --filter=blob:none unconditionally. However, when path filtering is not required, we can use the more aggressive --filter=tree:0 for significant performance improvements.

Changes:

  • Use --filter=tree:0 when no path filtering is configured (paths="." and no ignore_paths)
  • Continue using --filter=blob:none when path filtering is active, as git rev-list needs tree objects to evaluate path specifications
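The selection rule above is small enough to sketch directly. This is a minimal Python sketch, not the git-resource's actual check script (which is shell). It assumes `paths` arrives as a list that defaults to `["."]` and `ignore_paths` as a possibly empty list. `choose_filter` and `clone_args` are hypothetical helper names; `--filter`, `--single-branch` and `--branch` are standard `git clone` options.

```python
from typing import List, Optional, Sequence


def choose_filter(paths: Optional[Sequence[str]],
                  ignore_paths: Optional[Sequence[str]]) -> str:
    """Pick the partial-clone filter for a check (hypothetical helper)."""
    # Treat a missing or empty paths list the same as the default ["."].
    effective_paths = list(paths) if paths else ["."]
    no_path_filtering = effective_paths == ["."] and not ignore_paths
    # tree:0 omits trees as well as blobs; blob:none keeps trees so that a
    # path-limited `git rev-list` can evaluate pathspecs without file contents.
    return "tree:0" if no_path_filtering else "blob:none"


def clone_args(uri: str, branch: str, dest: str,
               paths: Optional[Sequence[str]] = None,
               ignore_paths: Optional[Sequence[str]] = None) -> List[str]:
    """Build an illustrative `git clone` argument list (not the resource's exact command)."""
    return [
        "git", "clone",
        f"--filter={choose_filter(paths, ignore_paths)}",
        "--single-branch", "--branch", branch,
        uri, dest,
    ]


if __name__ == "__main__":
    print(clone_args("https://github.com/concourse/concourse.git", "master", "/tmp/concourse"))
    print(choose_filter(["web/"], None))
```

With the default source config the clone gets `--filter=tree:0`. Configuring any `paths` entry other than `.`, or any `ignore_paths` entry, falls back to `--filter=blob:none`.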

Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com>
analytically requested a review from a team as a code owner on September 19, 2025 16:05
analytically (Contributor, Author)

The performance improvement is significant:

Concourse repo:
Clone time: 42% shorter (1.466s → 0.848s)
Disk usage: 54% smaller (21MB → 9.6MB)

Kubernetes repo:
Clone time: 77% shorter (26.953s → 6.277s), an over 4x speedup!
Disk usage: 72% smaller (204MB → 57MB)
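The quoted figures can be checked with a little arithmetic. Each percentage is (before - after) / before, and the speedup is before / after. This sketch assumes, as the comment implies, that "before" is the current `blob:none` clone and "after" is the `tree:0` clone.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which a measurement shrank: (before - after) / before * 100."""
    return (before - after) / before * 100


def speedup(before: float, after: float) -> float:
    """Ratio of old to new measurement: before / after."""
    return before / after


# Figures quoted above (seconds for clone time, MB for disk usage).
measurements = {
    "Concourse clone time": (1.466, 0.848),
    "Concourse disk usage": (21.0, 9.6),
    "Kubernetes clone time": (26.953, 6.277),
    "Kubernetes disk usage": (204.0, 57.0),
}

for name, (before, after) in measurements.items():
    print(f"{name}: {reduction_pct(before, after):.0f}% less, "
          f"ratio {speedup(before, after):.2f}x")
```

This reproduces 42%, 54%, 77% and 72%. The Kubernetes clone ratio is about 4.29x, and the Concourse clone-time drop corresponds to roughly a 1.73x speedup.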

analytically (Contributor, Author) commented Sep 19, 2025

This optimization automatically helps subsequent fetches too! Once a repo is cloned with --filter=tree:0, all future fetches during checks will also skip trees.
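Later fetches keep the filter because a partial clone records it in the repository config as `remote.<name>.promisor=true` and `remote.<name>.partialCloneFilter`. `git fetch` applies `partialCloneFilter` to later fetches from that remote by default. The sketch below checks this from `git config --list` output. The helper name `partial_clone_filter` is hypothetical, and the sample text is illustrative rather than captured from a real check container.

```python
from typing import Optional


def partial_clone_filter(config_list_output: str, remote: str = "origin") -> Optional[str]:
    """Return the filter later fetches from `remote` will reuse, or None.

    Expects the text printed by `git config --list`, where section and
    variable names are lowercased (e.g. remote.origin.partialclonefilter).
    """
    settings = {}
    for line in config_list_output.splitlines():
        # Split on the first "=" only; values may themselves contain "=".
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    prefix = f"remote.{remote}."
    # A clone made with --filter marks the remote as a promisor; require that
    # before trusting the recorded filter value.
    if settings.get(prefix + "promisor") != "true":
        return None
    return settings.get(prefix + "partialclonefilter")


sample = """\
remote.origin.url=https://github.com/concourse/concourse.git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
remote.origin.promisor=true
remote.origin.partialclonefilter=tree:0
"""
print(partial_clone_filter(sample))
```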

aliculPix4D commented Sep 23, 2025

Just curious: what are the benefits over the git ls-remote approach implemented in:
#425 (comment)
(I assume you are aware of it.)

At first glance, I would say your approach still works with tag_filter/tag_regexp, unlike the git ls-remote approach?

Over the last few weeks, we have added disable_ci_skip: true to all git resources in our Concourse deployment to benefit from the super-fast git ls-remote checks.
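For comparison, an ls-remote check only sees ref tips. It never downloads commits, trees or commit messages. This minimal sketch shows what such a check can compute, assuming the output format of `git ls-remote <uri> refs/heads/<branch>`, which is one `<sha><TAB><ref>` pair per line. The helper names are hypothetical and this is not the code from #425. Because no commit objects are fetched, this alone cannot look for [ci skip] in commit messages or evaluate paths. That is presumably why that approach is tied to disable_ci_skip: true.

```python
from typing import Dict, Optional


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map ref name -> commit SHA from `git ls-remote` output (<sha><TAB><ref> per line)."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, _, ref = line.partition("\t")
        refs[ref] = sha
    return refs


def new_version(output: str, branch: str, last_seen: Optional[str]) -> Optional[str]:
    """Return the branch tip if it differs from the last version seen, else None."""
    tip = parse_ls_remote(output).get(f"refs/heads/{branch}")
    if tip is None or tip == last_seen:
        return None
    return tip


# Made-up SHA for illustration only.
sample_output = "3f2a9c1e0b7d4a6f8e5c2b1a0d9e8f7a6b5c4d3e\trefs/heads/master\n"
print(new_version(sample_output, "master", last_seen=None))
```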

> This optimization automatically helps subsequent fetches too

But isn't this true only for 1h? If I am not mistaken, git check containers have a maximum lifetime of 1h.

analytically (Contributor, Author)

I wasn't aware of it. The benefit is that this isn't a breaking change, since it doesn't require setting disable_ci_skip.

Labels
None yet
Projects
Status: Todo

2 participants