Fix #634: Clean up pygit2 Repository objects to prevent symlink accumulation in /proc #885
Closed
tboy1337 wants to merge 2 commits into permitio:master from
- Added error handling for fetch and clone operations to log failures and clean up cached repositories.
- Introduced a new method `_cleanup_repo_from_cache` to release file descriptors held by pygit2 Repository objects during failures.
- Ensured that partial clones are cleaned up to prevent resource leaks.
- Changed early exit to raise an exception in the GitPolicyFetcher to ensure proper error propagation.
- Updated test to assert that the correct exception is raised during fetch operations, improving test coverage for error scenarios.
✅ Deploy Preview for opal-docs canceled.
/claim #634
Fixes Issue
Closes #634
Problem
When OPAL Server has GitHub policy sources configured and GitHub becomes unavailable (returns 500 errors), `pygit2.Repository` objects remain in the class-level `GitPolicyFetcher.repos` cache. These objects hold native C-level file descriptors that appear as symbolic links in `/proc/{pid}/fd/`, making the server look like it has spawned zombie processes. Over time, this leads to file descriptor exhaustion.
Root Cause
The existing code never explicitly releases `pygit2.Repository` objects when Git operations fail. Simply deleting references from the cache dictionary is insufficient because that leaves the release of native C resources to garbage collection.
Solution
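Added a `_cleanup_repo_from_cache()` method that:
- Removes the repository from the `GitPolicyFetcher.repos` cache
- Calls `repo.free()` to immediately release native C resources
Applied cleanup at 4 critical failure points:
- On `pygit2.GitError`
- On failed clones, where the partial clone is removed with `shutil.rmtree(ignore_errors=True)`
- In `_get_valid_repo()`, for corrupted repositories
- Around `shutil.rmtree()` when deleting invalid repos
Key Changes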
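The cleanup helper described under Solution can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `RepoCache`, `SupportsFree`, and the use of a path string as the cache key are assumptions made here so the example runs without pygit2. Only the cache-plus-`free()` pattern comes from the description.

```python
import logging
from typing import Dict, Protocol

logger = logging.getLogger(__name__)


class SupportsFree(Protocol):
    """Anything exposing free(), e.g. a pygit2.Repository."""

    def free(self) -> None: ...


class RepoCache:
    """Hypothetical stand-in for the class-level GitPolicyFetcher.repos cache."""

    # Assumption: entries are keyed by the local repository path.
    repos: Dict[str, SupportsFree] = {}

    @classmethod
    def cleanup_repo_from_cache(cls, path: str) -> bool:
        """Evict the cached repo for `path` and release its native handles.

        Returns True if an entry was removed, False if nothing was cached.
        """
        repo = cls.repos.pop(path, None)
        if repo is None:
            return False
        try:
            # Release native resources now instead of waiting for
            # garbage collection.
            repo.free()
        except Exception as exc:
            # Cleanup should not mask the Git error that triggered it.
            logger.warning("Failed to free repo at %s: %s", path, exc)
        return True
```

In real use the cached values would be `pygit2.Repository` objects. Popping the entry before calling `free()` means a later fetch cannot pick up a repository whose handles have already been released.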
File: `packages/opal-server/opal_server/git_fetcher.py`
- `_cleanup_repo_from_cache()` method with explicit `repo.free()` call
- Cleanup in `_get_valid_repo()` for corrupted repositories
File: `packages/opal-server/opal_server/tests/test_git_fetcher_cleanup.py`
- Tests with `free()` call verification
Demo Video
[TO BE RECORDED: The video will demonstrate:]
- `/proc/{pid}/fd` showing stable symlink count (no accumulation)
Why This Approach
- `repo.free()` immediately releases native C resources instead of relying on garbage collection
Testing
All 7 tests verify:
- `repo.free()` is called to release file descriptors
Running tests:
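As a rough illustration of the behavior these tests check, the sketch below asserts that `free()` is called and the cache entry is evicted when a fetch fails. It is not taken from `test_git_fetcher_cleanup.py`: `fetch_with_cleanup` is a hypothetical wrapper, the repository is a `MagicMock`, and `RuntimeError` stands in for `pygit2.GitError`. It can be collected by pytest or called directly.

```python
from unittest.mock import MagicMock


def fetch_with_cleanup(cache: dict, path: str) -> None:
    """Hypothetical fetch wrapper: on failure, evict and free the repo, then re-raise."""
    repo = cache[path]
    try:
        repo.remotes["origin"].fetch()
    except Exception:
        cache.pop(path, None)
        repo.free()  # release native handles immediately
        raise


def test_free_called_on_fetch_failure() -> None:
    repo = MagicMock()
    # Simulate the remote returning an error during fetch.
    repo.remotes.__getitem__.return_value.fetch.side_effect = RuntimeError("HTTP 500")
    cache = {"/tmp/policy-repo": repo}

    try:
        fetch_with_cleanup(cache, "/tmp/policy-repo")
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected the fetch error to propagate")

    repo.free.assert_called_once()
    assert cache == {}
```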
Verification
To verify the fix in production:
- `len(GitPolicyFetcher.repos)` should not grow after failures
- `/proc` symlinks (Linux): `ls -la /proc/{pid}/fd | wc -l` should remain stable
Check List
Related PRs
This PR builds on insights from previous attempts (#850, #855, #864, #881) and combines the best aspects: explicit resource cleanup (`repo.free()`), comprehensive error handling, and thorough test coverage.
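The `/proc` check from the Verification section can also be done from Python, which makes it easier to sample the count before and after simulated failures. The helper below is a sketch for Linux only. `count_open_fds` is a name introduced here for illustration and is not part of OPAL.

```python
import os
from typing import Optional


def count_open_fds(pid: Optional[int] = None) -> int:
    """Count entries in /proc/<pid>/fd (Linux only).

    Defaults to the current process. Listing the directory briefly uses
    a descriptor itself, so compare counts from repeated calls rather
    than reading any single value as exact.
    """
    target = os.getpid() if pid is None else pid
    return len(os.listdir(f"/proc/{target}/fd"))
```

Sampling this value periodically while the Git remote is unreachable should give a flat series if cleanup works, and a steadily rising one if descriptors are leaking.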