Add proper subprocess cleanup on application shutdown #46

Open
SerGem811 wants to merge 1 commit into chutesai:main from SerGem811:feat/subprocess-cleanup-on-shutdown

@SerGem811

Fix: Add Proper Subprocess Cleanup on Application Shutdown

Why This Is Important

When running in Docker containers, subprocesses (SGLang, vLLM, SSH server) spawned by the application were not being properly cleaned up on shutdown, leading to several critical issues:

  1. Orphaned and Zombie Processes: Subprocesses can outlive the main application during shutdown, or exit without being reaped, and keep consuming system resources
  2. Port Conflicts: Processes holding ports prevent clean container restarts
  3. Resource Leaks: Memory and file descriptors not released properly
  4. Unclean Shutdowns: Container orchestration systems (Kubernetes, Docker Swarm) may experience issues with containers that don't shut down cleanly

Docker sends SIGTERM to a container's main process when stopping the container. Without cleanup handlers, the subprocesses that process spawned are never told to stop, which causes resource leaks and potential conflicts when containers are restarted or scaled.

What Is Fixed

1. SGLang Subprocess Cleanup (chutes/chute/template/sglang.py)

  • Added @chute.on_event("shutdown") hook to gracefully terminate SGLang subprocess
  • Cancels the associated monitor task to prevent resource leaks
  • Implements graceful shutdown with 5-second timeout, falling back to force kill if needed
  • Handles edge cases (process already terminated, missing attributes)

2. vLLM Subprocess Cleanup (chutes/chute/template/vllm.py)

  • Added @chute.on_event("shutdown") hook to gracefully terminate vLLM subprocess
  • Cancels the associated monitor task
  • Same graceful shutdown pattern with timeout and force kill fallback

3. vLLM Embedding Subprocess Cleanup (chutes/chute/template/embedding.py)

  • Added cleanup hook for embedding template subprocesses
  • Ensures consistent cleanup behavior across all vLLM-based templates

4. SSH Process Reference Return (chutes/entrypoint/ssh.py)

  • Modified setup_ssh_access() function to return the SSH subprocess reference
  • Enables proper tracking and cleanup (cleanup already handled in job monitoring code)
  • Improves process lifecycle management
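
Why returning the process reference matters can be shown with a minimal sketch. The body of setup_ssh_access() is not part of this description, so `start_service()` and `is_running()` below are hypothetical stand-ins, and the command is a placeholder sleeper rather than a real sshd invocation:

```python
import subprocess
import sys
from typing import List, Optional


def start_service(command: List[str]) -> subprocess.Popen:
    """Hypothetical stand-in for setup_ssh_access(): start a service, return its handle."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Returning the handle (instead of discarding it) lets the caller
    # poll, signal, and reap the process later.
    return process


def is_running(process: Optional[subprocess.Popen]) -> bool:
    """True while the tracked process has not exited."""
    return process is not None and process.poll() is None


if __name__ == "__main__":
    # Placeholder command; a real caller would pass the actual service command line.
    service = start_service([sys.executable, "-c", "import time; time.sleep(60)"])
    print("running:", is_running(service))
    service.terminate()
    service.wait(timeout=5)
    print("running:", is_running(service))
```

With the handle in hand, whatever code owns the process lifecycle (here, presumably the job monitoring code mentioned above) can check liveness and terminate the process instead of relying on the container teardown.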

Technical Details

  • Shutdown Hook Registration: Uses FastAPI's @chute.on_event("shutdown") decorator to register cleanup handlers
  • Graceful Shutdown: Attempts terminate() first, waits up to 5 seconds for graceful shutdown
  • Force Kill Fallback: If process doesn't terminate within timeout, uses kill() to force termination
  • Task Cancellation: Properly cancels associated asyncio monitor tasks to prevent resource leaks
  • Error Handling: Comprehensive error handling for edge cases (ProcessLookupError, missing attributes)
  • Logging: Added informative logging for debugging shutdown behavior
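
The pattern described in the list above can be sketched roughly as follows. This is not the code from the PR: `stop_subprocess()` is a hypothetical helper, it assumes the subprocess is a `subprocess.Popen` object (suggested by the added `import subprocess`), and it needs Python 3.9+ for `asyncio.to_thread`. In the PR, logic of this kind would live inside the functions registered with @chute.on_event("shutdown").

```python
import asyncio
import logging
import subprocess
import sys
from typing import Optional

logger = logging.getLogger(__name__)


async def stop_subprocess(
    process: Optional[subprocess.Popen],
    monitor_task: Optional[asyncio.Task] = None,
    timeout: float = 5.0,
) -> Optional[int]:
    """Hypothetical helper: cancel the monitor, terminate, kill on timeout."""
    # Cancel the monitor first so it does not treat the exit as a crash.
    if monitor_task is not None and not monitor_task.done():
        monitor_task.cancel()
        try:
            await monitor_task
        except asyncio.CancelledError:
            pass

    if process is None:
        return None  # nothing was ever started
    if process.poll() is not None:
        return process.returncode  # already exited

    try:
        process.terminate()  # SIGTERM on POSIX
        # Popen.wait() blocks, so run it in a worker thread.
        return await asyncio.to_thread(process.wait, timeout)
    except subprocess.TimeoutExpired:
        logger.warning("Process %s ignored terminate(); killing it", process.pid)
        process.kill()  # SIGKILL on POSIX
        return await asyncio.to_thread(process.wait)
    except ProcessLookupError:
        # The process disappeared between poll() and terminate().
        return process.returncode


async def _demo() -> None:
    # Placeholder child and monitor; the real ones are the inference server
    # process and its monitor task.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    monitor = asyncio.create_task(asyncio.sleep(3600))
    code = await stop_subprocess(child, monitor)
    print("exit code:", code, "monitor cancelled:", monitor.cancelled())


if __name__ == "__main__":
    asyncio.run(_demo())
```

Waiting in a worker thread keeps the event loop responsive during the timeout; a handler that calls `process.wait(timeout=5)` directly would also work but blocks the loop for up to five seconds.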

Files Changed

chutes/chute/template/sglang.py       - Added subprocess import and cleanup hook
chutes/chute/template/vllm.py         - Added cleanup hook
chutes/chute/template/embedding.py    - Added cleanup hook  
chutes/entrypoint/ssh.py              - Modified to return process reference

Code Changes Summary

sglang.py

  • Added import subprocess at module level
  • Added cleanup_sglang() method with @chute.on_event("shutdown") decorator
  • Implements graceful termination with timeout handling

vllm.py

  • Added cleanup_vllm() method with @chute.on_event("shutdown") decorator
  • Same cleanup pattern as SGLang for consistency

embedding.py

  • Added cleanup_vllm() method with @chute.on_event("shutdown") decorator
  • Ensures embedding subprocesses are also cleaned up

ssh.py

  • Modified setup_ssh_access() to return sshd_process reference
  • Enables better process lifecycle management

Impact

  • ✅ Prevents zombie processes in Docker containers
  • ✅ Ensures clean container shutdowns for orchestration systems
  • ✅ Prevents port conflicts on container restart
  • ✅ Reduces resource leaks and improves system stability
  • ✅ Better process lifecycle management

@SerGem811
Author

Hi @jondurbin
Kindly check this PR. I tried to add a feature to properly shut down subprocesses.
