[wip] Troubleshoot tests #68727

Open
dwoz wants to merge 1 commit into saltstack:3007.x from dwoz:troubleshoot

dwoz commented Feb 12, 2026

What does this PR do?

What issues does this PR fix or reference?

Fixes

Previous Behavior

Remove this section if not relevant

New Behavior

Remove this section if not relevant

Merge requirements satisfied?

[NOTICE] Bug fixes or features added to Salt require tests.

Commits signed with GPG?

Yes/No

dwoz requested a review from a team as a code owner February 12, 2026 08:55
dwoz changed the base branch from master to 3007.x February 12, 2026 08:56
dwoz added the test:full (Run the full test suite) label Feb 12, 2026
Add defensive code to handle the case where state_queue.lock or
job_queue.lock exist as directories instead of files. This situation
has been observed in CI tests, though the root cause is unclear.

The fix:
- Changes the check from os.path.isfile() to os.path.exists()
- Handles three cases: file (expected), directory (unexpected), other
- Uses shutil.rmtree() to remove directories
- Adds warning logging when a directory is found to aid debugging
- Improves error handling with specific log messages

This allows the minion to recover from corrupted lock state and
continue operating, while logging information to help identify the
underlying cause.
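
To illustrate why the check moves from os.path.isfile() to os.path.exists(), here is a minimal standalone sketch, not code from this PR. The helper name `classify_lock_path` and the temporary directory layout are assumptions made for the example; only the `os.path` calls and the `state_queue.lock` file name come from the description above.

```python
import os
import tempfile


def classify_lock_path(path):
    """Return 'missing', 'file', 'directory', or 'other' for a lock path.

    Hypothetical helper for illustration; it mirrors the three cases the
    commit message lists (file, directory, other).
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    return "other"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock_fn = os.path.join(tmp, "state_queue.lock")
        os.mkdir(lock_fn)  # simulate the lock existing as a directory
        # isfile() is False for a directory, so a check that only uses
        # isfile() never notices the stale path; exists() does.
        print(os.path.isfile(lock_fn), os.path.exists(lock_fn))
        print(classify_lock_path(lock_fn))
```

Note that os.path.exists() follows symlinks, so a dangling symlink would be reported as missing in this sketch.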
Comment on lines +407 to +421
+# Defensive check: Clean up corrupted lock files (e.g., directories from OverlayFS races)
 if os.path.exists(lock_fn) and not os.path.isfile(lock_fn):
-    _raise_error(f"lock_fn {lock_fn} exists and is not a file")
+    log.warning(
+        "Lock %s exists but is not a file (%s), cleaning up. "
+        "This may indicate an OverlayFS race or filesystem issue.",
+        lock_fn,
+        "directory" if os.path.isdir(lock_fn) else "other",
+    )
+    try:
+        if os.path.isdir(lock_fn):
+            shutil.rmtree(lock_fn)
+        else:
+            os.remove(lock_fn)
+    except OSError as exc:
+        _raise_error(f"Failed to clean up corrupted lock {lock_fn}: {exc}")

If you leave this in after troubleshooting, I suggest refactoring it into a function, since it appears to be identical to the code added above at lines 325-339.
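
A rough sketch of what such a refactor could look like, written as a standalone module rather than against the actual Salt source. The function name `cleanup_corrupted_lock` and its `raise_error` parameter are hypothetical; the sketch assumes the existing `_raise_error` helper accepts a single message string, which is how it is called in the snippet above.

```python
import logging
import os
import shutil

log = logging.getLogger(__name__)


def cleanup_corrupted_lock(lock_fn, raise_error):
    """Remove lock_fn if it exists but is not a regular file.

    Hypothetical helper: `raise_error` is any callable that takes a message
    string (assumed to match how _raise_error is used in the diff).
    Returns True if a non-file entry was removed, False otherwise.
    """
    if not os.path.exists(lock_fn) or os.path.isfile(lock_fn):
        return False  # nothing there, or a normal lock file: leave it alone

    is_dir = os.path.isdir(lock_fn)
    log.warning(
        "Lock %s exists but is not a file (%s), cleaning up. "
        "This may indicate an OverlayFS race or filesystem issue.",
        lock_fn,
        "directory" if is_dir else "other",
    )
    try:
        if is_dir:
            shutil.rmtree(lock_fn)
        else:
            os.remove(lock_fn)
    except OSError as exc:
        raise_error(f"Failed to clean up corrupted lock {lock_fn}: {exc}")
        return False  # only reached if raise_error does not raise
    return True
```

Both call sites could then reduce to a single call such as `cleanup_corrupted_lock(lock_fn, _raise_error)`, which keeps the warning text and error handling in one place if the root cause is later found and the workaround needs to change.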

Labels

test:full Run the full test suite

3 participants