-
Notifications
You must be signed in to change notification settings - Fork 263
Fix #634: Robust cleanup of symlinks on fetch failure + Regression Test #855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for opal-docs canceled.
|
git_fetcher.py.-.opal.-.Visual.Studio.Code.2025-12-23.23-16-58.mp4Here is a short demo video verifying the fix. I ran a simulation script ( Note: Recorded on Windows, verifying that the fix works robustly even in environments with symlink restrictions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments:
- You except a very broad exception, no matter which step failed you still do the entire checks for existence, deletion, etc. a better approach is to extract the lower level part of the code that needs to be "atomic" to a function and try except in the function for the specific exceptions and do a more dedicated "rollback" behavior.
- I'm not sure that I agree with entirely deleting the repo directory, when scale increases and repo grows, cloning the repo from zero can be a very expensive operation. I'm not familiar with the corrupted state that is left, but I'd say a better approach would be to retry ( wouldn't it just work on the next fetch? ) or try to fix the corrupted state instead of deleting the repo directory which the process worked so hard to clone
Perhaps if you'll explain more on the corrupted state the git repo stays in I'll understand better the issue and can help with thinking about a solution
|
Thanks for the thoughtful review, @omer9564! I see your point about broad exceptions and the high cost of deleting large repos. I’m going to refactor the logic to:
|
|
Hi @omer9564, I have refactored the implementation to address your feedback: Atomic Extraction: The core sync logic is now isolated in _attempt_atomic_sync. This cleanly separates the primary 'work' from the recovery logic. Soft Recovery: Instead of a full shutil.rmtree, the fetcher now performs a 'soft cleanup' by targeting stale Git lock files (index.lock, shallow.lock, etc.). Scalability: This approach fixes the corrupted state without forcing an expensive full re-clone, preserving data and performance for large repositories. Narrowed Exceptions: I narrowed the exception handling to catch specific pygit2 and subprocess errors to trigger this dedicated recovery path. Demo: Attached is a video demonstrating the soft recovery. I manually created an index.lock in the .git folder; the fetcher successfully identified and removed the lock while keeping the repository directory intact. git_fetcher.py.-.opal.-.Visual.Studio.Code.2025-12-29.10-23-19.mp4 |
02f64dd to
d6729be
Compare
|
Hey @Kesavaraja67, thanks for the quickly done fixes for my review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses Issue #634 by implementing robust cleanup of repository paths when fetch_and_notify_on_changes fails due to network or Git errors. The solution prevents corrupted symlinks and directories from blocking subsequent fetch operations.
Key changes:
- Added comprehensive error handling with targeted cleanup strategies for different failure scenarios
- Implemented soft cleanup for recoverable errors (stale locks, broken symlinks) and full cleanup as a last resort
- Added regression test to verify cleanup behavior on fetch failures
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/opal-server/opal_server/git_fetcher.py | Refactored fetch logic into atomic sync function with multi-tier error handling and cleanup strategies |
| packages/opal-server/opal_server/tests/test_git_fetcher_cleanup.py | Added regression test verifying repository cleanup on simulated network failures |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from pathlib import Path | ||
|
|
||
| # Ensure we can import the server package relative to this test file | ||
| current_dir = Path(__file__).parent | ||
| server_package_path = current_dir.parent.parent | ||
| sys.path.insert(0, str(server_package_path)) | ||
|
|
||
| from opal_server.git_fetcher import GitPolicyFetcher | ||
|
|
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manual sys.path manipulation in tests suggests the package structure may not follow standard Python conventions. Tests should be able to import the package without path modifications. Consider restructuring the package to use proper relative imports or installing it in editable mode for testing.
| from pathlib import Path | |
| # Ensure we can import the server package relative to this test file | |
| current_dir = Path(__file__).parent | |
| server_package_path = current_dir.parent.parent | |
| sys.path.insert(0, str(server_package_path)) | |
| from opal_server.git_fetcher import GitPolicyFetcher | |
| from ..git_fetcher import GitPolicyFetcher |
| fetcher = MagicMock(spec=GitPolicyFetcher) | ||
| fetcher._repo_path = fake_repo_path | ||
|
|
||
| # Mock the internal cache to ensure we test the dictionary cleanup too | ||
| GitPolicyFetcher.repos = {str(fake_repo_path): "stale_object"} |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't actually exercise the code changes in git_fetcher.py. It manually replicates the cleanup logic instead of testing the actual fetch_and_notify_on_changes method. The test should mock the git operations to fail and verify that the actual error handling code performs the cleanup correctly.
| await self._notify_on_changes(repo) | ||
| return | ||
| else: | ||
| raise pygit2.GitError("Invalid repository metadata") |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message "Invalid repository metadata" is vague. Include the repository path in the error message to aid debugging, such as f"Invalid repository metadata at {repo_path}".
| raise pygit2.GitError("Invalid repository metadata") | |
| raise pygit2.GitError(f"Invalid repository metadata at {repo_path}") |
| if self._repo_path.exists() or os.path.islink(self._repo_path): | ||
| if self._repo_path.is_symlink(): | ||
| self._repo_path.unlink() | ||
| else: | ||
| # repo dir exists but invalid -> we must delete the directory | ||
| logger.warning( | ||
| "Deleting invalid repo: {path}", path=self._repo_path | ||
| ) | ||
| shutil.rmtree(self._repo_path) | ||
| else: | ||
| logger.info("Repo not found at {path}", path=self._repo_path) | ||
| shutil.rmtree(self._repo_path, ignore_errors=True) |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition self._repo_path.exists() or os.path.islink(self._repo_path) is checked before calling is_symlink(), but is_symlink() should be checked first since broken symlinks return False for exists(). Restructure to check is_symlink() in the outer condition to properly handle broken symlinks.
| if self._repo_path.exists() or os.path.islink(self._repo_path): | |
| if self._repo_path.is_symlink(): | |
| self._repo_path.unlink() | |
| else: | |
| # repo dir exists but invalid -> we must delete the directory | |
| logger.warning( | |
| "Deleting invalid repo: {path}", path=self._repo_path | |
| ) | |
| shutil.rmtree(self._repo_path) | |
| else: | |
| logger.info("Repo not found at {path}", path=self._repo_path) | |
| shutil.rmtree(self._repo_path, ignore_errors=True) | |
| if self._repo_path.is_symlink(): | |
| self._repo_path.unlink() | |
| elif self._repo_path.exists() or os.path.islink(self._repo_path): | |
| shutil.rmtree(self._repo_path, ignore_errors=True) |
| if self._repo_path.is_symlink(): | ||
| self._repo_path.unlink() | ||
| else: |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cleanup logic duplicates the symlink handling already present in _perform_soft_cleanup. Consider calling _perform_soft_cleanup here followed by shutil.rmtree for any remaining directories, to avoid code duplication and ensure consistent cleanup behavior.
| if self._repo_path.is_symlink(): | |
| self._repo_path.unlink() | |
| else: | |
| # Reuse soft cleanup for consistent symlink and lock-file handling | |
| self._perform_soft_cleanup(self._repo_path) | |
| # Remove any remaining directory tree as a last resort | |
| if self._repo_path.exists(): |
Fixes #634
Problem
When
fetch_and_notify_on_changesfails (e.g., network timeout), the repository path (directory or symbolic link) is left in a corrupted state, causing subsequent fetches to fail indefinitely.Solution
try...exceptblock that explicitly handles bothos.unlink(for symlinks) andshutil.rmtree(for directories).GitPolicyFetcher.reposcache is cleared on failure.packages/opal-server/opal_server/tests/test_git_fetcher_cleanup.py) that mocks the failure and verifies filesystem cleanup. (Includes a Windows compatibility check for local dev).Verification
I have added a new test file. Run it with:
pytest packages/opal-server/opal_server/tests/test_git_fetcher_cleanup.py/claim #634