Collaborator

@dmartinol dmartinol commented Oct 20, 2025

Fixes #2250

Root cause

  • /tmp is not writable, so the clone fails
  • Even after mounting an emptyDir into the deployment, the clone needs a large amount of memory that may exceed the limits, causing the controller to restart
  • The clone was performed twice because FetchRegistry also called CurrentHash, which runs another clone
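
The double clone in the last bullet can be avoided by deriving the hash from bytes that were already fetched. A minimal sketch, assuming the hash is a SHA-256 over the registry file content (the PR does not say how CurrentHash computes it); `FetchResult` and `fetchOnce` are hypothetical names, not the operator's API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// FetchResult is a hypothetical container for one fetch: the file content
// and a hash derived from that same content.
type FetchResult struct {
	Data []byte
	Hash string
}

// fetchOnce calls the (expensive) fetch function exactly once and computes
// the hash from the returned bytes, so no second clone is needed.
func fetchOnce(fetch func() ([]byte, error)) (*FetchResult, error) {
	data, err := fetch()
	if err != nil {
		return nil, fmt.Errorf("fetch failed: %w", err)
	}
	sum := sha256.Sum256(data)
	return &FetchResult{Data: data, Hash: hex.EncodeToString(sum[:])}, nil
}

func main() {
	calls := 0
	// Stand-in for the real clone; it only counts how often it runs.
	fakeClone := func() ([]byte, error) {
		calls++
		return []byte(`{"servers":{}}`), nil
	}
	res, err := fetchOnce(fakeClone)
	if err != nil {
		panic(err)
	}
	fmt.Printf("clones=%d hash=%s\n", calls, res.Hash)
}
```

Callers that only need the hash can read it from the same result instead of triggering another fetch.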

Proposed solution

- Using sparse checkout to reduce the local repo size:
~ - Note: this also copies the other files and the subfolders from the registry data folder~

  • Use an in-memory FS (thanks @JAORMX and @jhrozek for the suggestion)
  • Do not clone the repo again to compute the hash
  • Added GC options to the deployment to make the Go garbage collector run more eagerly
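
The description does not name the GC options. In Go the usual knobs are the `GOGC` and `GOMEMLIMIT` environment variables, so the sketch below shows their programmatic equivalents. The helper name `tuneGC` and the values are illustrative assumptions, not the ones used in the chart:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// tuneGC applies a GC percent and a soft memory limit and returns the
// previous settings so a caller can restore them.
// gcPercent mirrors GOGC; limitBytes mirrors GOMEMLIMIT.
func tuneGC(gcPercent int, limitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(gcPercent)
	prevLimit = debug.SetMemoryLimit(limitBytes)
	return prevPercent, prevLimit
}

func main() {
	// Illustrative values only: collect more eagerly and keep the soft
	// limit below a hypothetical 128 MiB container memory limit.
	prevPercent, prevLimit := tuneGC(50, 100<<20)
	fmt.Printf("previous GOGC=%d, previous memory limit=%d bytes\n", prevPercent, prevLimit)
}
```

Setting the environment variables in the pod spec has the same effect without a code change.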

Alternatives

  • If the memory usage remains too high, we can use a direct HTTP raw fetch of the file with no storage and no git protocol
    • The raw URL depends on the Git provider and requires custom implementation. Some providers or versions may not be supported.
  • Connect a PVC with some storage area
    • Caching and reuse of existing local repos would improve performance (git fetch instead of full clone)
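
The provider dependence of the raw-fetch alternative can be illustrated with a small URL builder. This is a sketch only: `rawURL` is a hypothetical helper that covers just repositories hosted on github.com and gitlab.com, and self-hosted instances or other providers would need their own cases.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rawURL maps a repository URL, ref, and file path to a provider-specific
// raw-content URL. Only github.com and gitlab.com are handled here.
func rawURL(repoURL, ref, filePath string) (string, error) {
	u, err := url.Parse(repoURL)
	if err != nil {
		return "", fmt.Errorf("invalid repository URL: %w", err)
	}
	// Normalize "/owner/repo.git" to "owner/repo".
	repo := strings.TrimSuffix(strings.Trim(u.Path, "/"), ".git")
	file := strings.TrimPrefix(filePath, "/")
	switch u.Host {
	case "github.com":
		return fmt.Sprintf("https://raw.githubusercontent.com/%s/%s/%s", repo, ref, file), nil
	case "gitlab.com":
		return fmt.Sprintf("https://gitlab.com/%s/-/raw/%s/%s", repo, ref, file), nil
	default:
		return "", fmt.Errorf("unsupported git provider %q", u.Host)
	}
}

func main() {
	got, err := rawURL("https://github.com/example/registry.git", "main", "data/registry.json")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

The `default` branch makes the limitation explicit: an unknown host fails fast instead of producing a URL that may not exist.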

codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 73.49398% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.40%. Comparing base (977eabf) to head (2da6e58).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/thv-operator/pkg/git/fs.go | 53.84% | 33 Missing and 9 partials ⚠️ |
| cmd/thv-operator/pkg/sources/git.go | 95.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2263      +/-   ##
==========================================
+ Coverage   54.30%   54.40%   +0.10%     
==========================================
  Files         240      241       +1     
  Lines       23594    23695     +101     
==========================================
+ Hits        12812    12891      +79     
- Misses       9567     9587      +20     
- Partials     1215     1217       +2     

Contributor

Copilot AI left a comment

Pull Request Overview

This PR optimizes Git repository operations by implementing sparse checkout to reduce memory usage and improve performance. The root cause was that /tmp was not writable in Kubernetes deployments and clones required excessive memory, causing controller restarts. Additionally, the repository was being cloned twice unnecessarily.

Key changes:

  • Replaced multi-step clone operations with a single FetchFileSparse method using sparse checkout
  • Added configurable workspace directory via WORKSPACE_DIR environment variable
  • Mounted emptyDir volume at /workspace in Kubernetes deployments

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
|---|---|
| deploy/charts/operator/templates/deployment.yaml | Added WORKSPACE_DIR environment variable and mounted workspace volume |
| cmd/thv-operator/pkg/sources/git.go | Refactored to use sparse checkout and eliminated duplicate cloning |
| cmd/thv-operator/pkg/sources/git_test.go | Updated tests to use new FetchFileSparse API |
| cmd/thv-operator/pkg/git/client.go | Implemented sparse checkout with path traversal protection |
| cmd/thv-operator/pkg/git/integration_test.go | Updated integration tests for new API with improved test coverage |
| cmd/thv-operator/pkg/git/client_test.go | Added comprehensive unit tests including security tests |
| cmd/thv-operator/pkg/git/e2e_test.go | Added E2E tests for real repository operations |
| cmd/thv-operator/pkg/git/doc.go | Updated documentation to reflect sparse checkout implementation |
| cmd/thv-operator/pkg/git/commit_test.go | Removed obsolete commit-specific tests |

Collaborator

JAORMX commented Oct 21, 2025

@claude can you review this with a focus on security?

Contributor

claude bot commented Oct 21, 2025

Claude encountered an error

Failed with exit code 128

I'll analyze this and get back to you.

@dmartinol dmartinol changed the title from "fix:" to "fix: Cannot deploy MCPRegistry from git in OpenShift" Oct 21, 2025
@dmartinol dmartinol marked this pull request as draft October 22, 2025 06:26
@dmartinol dmartinol marked this pull request as ready for review October 22, 2025 09:20
Contributor

@jhrozek jhrozek left a comment

Much better! A couple of findings inline

storerCache := cache.NewObjectLRUDefault()
storer := filesystem.NewStorage(storerFs, storerCache)

repo, err := git.Clone(storer, memFS, cloneOptions)
Contributor

We went from git.PlainCloneContext to git.Clone (no context). Could we use git.CloneContext?

repoInfo.Repository = nil

// 5. Force GC to reclaim memory
runtime.GC()
Contributor

I still don't think nudging the GC is necessary.

Collaborator Author

Testing it

Collaborator Author

Without this call, the Pod still crashes with an OOMKilled error.
Looks like it's really needed.
I would postpone finalizing this until the logic moves to the thv-registry-api, since the same issue will be there. WDYT?
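
The release-then-collect pattern under discussion can be observed in isolation. The sketch below uses a hypothetical `repoInfo` stand-in whose field is a plain byte slice rather than a real repository, and it only shows that Go's heap accounting drops once the reference is cleared and a collection is forced. Whether that alone prevents the OOMKilled in the operator depends on factors this sketch does not model:

```go
package main

import (
	"fmt"
	"runtime"
)

// repoInfo is a hypothetical stand-in for the struct that holds the clone.
type repoInfo struct {
	Repository []byte // stands in for the in-memory repository
}

// heapAlloc reports the bytes of live heap objects as seen by the runtime.
func heapAlloc() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// release drops the reference to the repository and forces a collection,
// mirroring the two steps in the reviewed snippet.
func release(info *repoInfo) {
	info.Repository = nil
	runtime.GC()
}

func main() {
	info := &repoInfo{Repository: make([]byte, 64<<20)}
	before := heapAlloc()
	release(info)
	after := heapAlloc()
	fmt.Printf("heap before=%d bytes, after=%d bytes\n", before, after)
}
```

Without the explicit `runtime.GC()` the memory is still reclaimed eventually, but only when the runtime decides to start its next cycle.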

…client

Using configurable workspacedir to checkout the repo

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
- more logging
- increased memory size

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
…y FS

- added env variables to expedite the GC and avoid out of memory errors

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
chart version bump
run task operator-manifests

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Signed-off-by: Daniele Martinoli <dmartino@redhat.com>

Development

Successfully merging this pull request may close these issues.

Cannot deploy MCPRegistry from git in OpenShift

3 participants