Skip to content

Fix parallel_run fork deadlock#2

Open
lolyu wants to merge 3 commits intomasterfrom
fix_dualtor_loganalyzer_deadlock2
Open

Fix parallel_run fork deadlock#2
lolyu wants to merge 3 commits intomasterfrom
fix_dualtor_loganalyzer_deadlock2

Conversation

@lolyu
Copy link
Owner

@lolyu lolyu commented Feb 2, 2026

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@greptile-apps
Copy link

greptile-apps bot commented Feb 2, 2026

Greptile Overview

Greptile Summary

This PR fixes a critical fork deadlock issue in parallel_run that occurs when child processes inherit locked logging handlers from the parent process during fork operations.

Key Changes:

  • Implements os.register_at_fork() handlers to acquire/release logging locks before and after fork operations, ensuring child processes start with unlocked handlers
  • Adds _fix_logging_handler_fork_lock() function that dynamically registers fork handlers for all existing logging handlers, preventing deadlock scenarios
  • Patches Ansible WorkerProcess.start() method to properly manage stdin file descriptors and avoid additional fork-related issues
  • Applies the Ansible patch at module initialization in conftest.py to ensure all test processes benefit from the fix

Implementation Details:
The fix follows the pattern from python-atfork library, registering callbacks that:

  1. Acquire all logging locks before fork (in parent)
  2. Release locks in parent after fork
  3. Release locks in child after fork

This ensures child processes never inherit locked mutexes, which would cause deadlock when the child attempts to log (since the lock owner thread doesn't exist in the child process).

Testing Recommendations:

  • Verify no deadlocks occur in parallel test execution with heavy logging
  • Check that logging functionality works correctly in both parent and child processes
  • Ensure Ansible operations complete successfully without stdin-related hangs

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk - addresses a critical concurrency bug using established patterns
  • Score reflects a well-implemented fix for a known Python multiprocessing issue. The implementation follows established patterns from python-atfork library. Minor style improvement suggested but doesn't affect correctness. The fix targets a real deadlock scenario that can occur in production
  • No files require special attention - both changes are straightforward fixes

Important Files Changed

Filename Overview
tests/common/helpers/parallel.py Implements comprehensive fork-safe logging by registering at_fork handlers to acquire/release locks, preventing deadlocks when child processes inherit locked handlers from parent during fork
tests/conftest.py Applies Ansible WorkerProcess patch at module initialization to prevent stdin-related deadlocks in worker processes

Sequence Diagram

sequenceDiagram
    participant Main as Main Thread
    participant Logger as Logging Module
    participant Handler as Log Handler
    participant Fork as Fork Operation
    participant Child as Child Process
    
    Note over Main,Child: Before Fix (Deadlock Scenario)
    Main->>Logger: Write log (acquires handler lock)
    activate Handler
    Main->>Fork: fork() called
    Note over Fork: Lock state copied to child
    Fork->>Child: Child process created
    Note over Child: Child inherits locked handler
    Child->>Handler: Attempt to log
    Note over Child: DEADLOCK: Lock already held
    deactivate Handler
    
    Note over Main,Child: After Fix (Safe Fork)
    Main->>Logger: _fix_logging_handler_fork_lock()
    Logger->>Handler: Register at_fork handlers
    Note over Handler: before=lock.acquire<br/>after_in_parent=lock.release<br/>after_in_child=lock.release
    Main->>Fork: fork() called
    Fork->>Handler: Execute before fork (acquire lock)
    activate Handler
    Fork->>Child: Child process created
    Fork->>Handler: Execute after_in_parent (release lock)
    deactivate Handler
    Fork->>Child: Execute after_in_child (release lock)
    Child->>Handler: Attempt to log
    Note over Child: SUCCESS: Lock is released
    Handler-->>Child: Log written successfully
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

if hasattr(logger, 'handlers'):
handlers.update(logger.handlers)
for handler in handlers:
new_handlers = []
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_handlers list is created inside the lock but only used outside - move the new_handlers = [] declaration outside the for handler in handlers: loop to avoid unnecessary recreations

Suggested change
new_handlers = []
new_handlers = []
for handler in handlers:
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/common/helpers/parallel.py
Line: 53:53

Comment:
`new_handlers` list is created inside the lock but only used outside - move the `new_handlers = []` declaration outside the `for handler in handlers:` loop to avoid unnecessary recreations

```suggestion
    new_handlers = []
    for handler in handlers:
```

How can I resolve this? If you propose a fix, please make it concise.

lolyu added 2 commits February 2, 2026 12:08
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant