Cause: No task-level heartbeat or execution timeout policy.
Analysis: Long tasks can stall pools and mask failures.
Acceptance Criteria:
- Task execution times out or emits heartbeat signals.
- Stuck tasks are detected and retried or failed safely.
Solution Approach:
- Add per-task timeouts and heartbeat updates.
- Monitor for stale "running" tasks.