fix(clp-package): Forward signals for cleanup of native containers in sbin scripts (fixes #1701). #1716

junhaoliao wants to merge 14 commits into y-scope:main

Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your settings.
Walkthrough

Added a signal-forwarding subprocess helper and replaced direct subprocess invocations across the package to run child processes in new process groups; SIGINT/SIGTERM are forwarded to those groups, and callers now check and raise on non-zero exit codes where applicable.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Parent as Parent Process
    participant Helper as run_subprocess_with_signal_forward
    participant Child as Child Process Group
    Parent->>Helper: start subprocess in new process group
    Note right of Helper: install temporary SIGINT/SIGTERM handlers
    User->>Parent: send SIGINT / SIGTERM
    Parent->>Helper: handler receives signal
    Helper->>Child: forward signal to child process group
    Child->>Child: handle signal / terminate
    Child-->>Helper: exit with return code
    Helper-->>Parent: return CompletedProcess and restore handlers
    Parent->>Parent: check return code (raise CalledProcessError if non-zero)
```
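The flow in the diagram can be sketched as a minimal standalone demo (a hypothetical illustration, not the package's actual helper): start a child in its own session, then deliver a signal to its entire process group so grandchildren are cleaned up too.

```python
import os
import signal
import subprocess

# Start a child in its own session (and thus its own process group), as the
# helper in this PR does via start_new_session=True.
proc = subprocess.Popen(["sleep", "30"], start_new_session=True)

# Simulate a user interrupt: deliver SIGTERM to the child's whole process
# group so any grandchildren it spawned are terminated as well.
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)

proc.wait()
# A process killed by a signal reports a negative return code in Python.
print(proc.returncode)  # -15 (== -signal.SIGTERM)
```

This relies on POSIX semantics (sessions and process groups), so it applies on Linux/macOS but not Windows.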
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues
🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
- components/clp-package-utils/clp_package_utils/controller.py (4 hunks)
- components/clp-package-utils/clp_package_utils/general.py (3 hunks)
- components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (2 hunks)
- components/clp-package-utils/clp_package_utils/scripts/compress.py (2 hunks)
- components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (2 hunks)
- components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (2 hunks)
- components/clp-package-utils/clp_package_utils/scripts/decompress.py (3 hunks)
- components/clp-package-utils/clp_package_utils/scripts/search.py (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-07-28T08:33:57.487Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:133-137
Timestamp: 2025-07-28T08:33:57.487Z
Learning: In CLP integration tests, the `run_and_assert` utility function should only be used for commands that test CLP package functionality (like clp/clp-s compression/decompression operations). For processing or helper commands (like jq, sort, etc.), direct `subprocess.run` calls should be used instead, as they serve different purposes and don't need the same error handling patterns.
Applied to files:
- components/clp-package-utils/clp_package_utils/scripts/compress.py
- components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
- components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
- components/clp-package-utils/clp_package_utils/scripts/search.py
- components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
- components/clp-package-utils/clp_package_utils/scripts/decompress.py
- components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
- components/clp-package-utils/clp_package_utils/scripts/compress.py
- components/clp-package-utils/clp_package_utils/controller.py
📚 Learning: 2025-07-03T20:10:43.789Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Applied to files:
- components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
- components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
📚 Learning: 2025-08-13T14:48:49.020Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1144
File: components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:106-114
Timestamp: 2025-08-13T14:48:49.020Z
Learning: For the dataset manager scripts in components/clp-package-utils/clp_package_utils/scripts/, the native script (native/dataset_manager.py) is designed to only be called through the wrapper script (dataset_manager.py), so dataset validation is only performed at the wrapper level rather than duplicating it in the native script.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
🧬 Code graph analysis (7)

components/clp-package-utils/clp_package_utils/scripts/compress.py (1)
- components/clp-package-utils/clp_package_utils/general.py (1): run_subprocess_with_signal_forward (756-794)

components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
- components/clp-package-utils/clp_package_utils/general.py (1): run_subprocess_with_signal_forward (756-794)

components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (1)
- components/clp-package-utils/clp_package_utils/general.py (1): run_subprocess_with_signal_forward (756-794)

components/clp-package-utils/clp_package_utils/scripts/search.py (1)
- components/clp-package-utils/clp_package_utils/general.py (1): run_subprocess_with_signal_forward (756-794)

components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (1)
- components/clp-package-utils/clp_package_utils/general.py (1): run_subprocess_with_signal_forward (756-794)

components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)
- components/clp-package-utils/clp_package_utils/general.py (1): run_subprocess_with_signal_forward (756-794)

components/clp-package-utils/clp_package_utils/controller.py (1)
- components/clp-package-utils/clp_package_utils/general.py (1): run_subprocess_with_signal_forward (756-794)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (9)
components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py (1)
20-33: Signal-forwarding wrapper is correctly integrated for dataset manager.

The switch to run_subprocess_with_signal_forward(cmd), while preserving explicit return-code checks and cleanup, keeps prior behaviour and adds the desired signal propagation to the containerised dataset manager process.

Also applies to: 196-201
components/clp-package-utils/clp_package_utils/scripts/compress.py (1)
19-32: Compression command now correctly gains signal forwarding.

Using run_subprocess_with_signal_forward(cmd) and keeping the same return-code handling and cleanup logic maintains the existing user-facing behaviour while ensuring that interrupts terminate the native compression container as well.

Also applies to: 268-277
components/clp-package-utils/clp_package_utils/controller.py (2)
50-66: Docker Compose start/stop correctly wrapped with signal forwarding while preserving failure semantics.

Replacing direct subprocess.run with run_subprocess_with_signal_forward(..., cwd=self._clp_home) and manually raising CalledProcessError on a non-zero returncode keeps callers' error expectations intact and ensures SIGINT/SIGTERM hit the docker compose process group as intended.

Also applies to: 895-918, 920-951

992-1021: _chown_recursively correctly adopts the signal-forwarding helper.

Using run_subprocess_with_signal_forward(chown_cmd, stdout=subprocess.DEVNULL) and raising CalledProcessError on non-zero exit mirrors prior behaviour while allowing interrupts to terminate long-running chown operations cleanly.

components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py (1)
19-33: Signal forwarding correctly added to S3 compression path.

Adopting run_subprocess_with_signal_forward(cmd) with unchanged return-code checks and URL-list cleanup ensures the S3 compression container is terminated on interrupts without altering normal success/failure behaviour.

Also applies to: 322-331
components/clp-package-utils/clp_package_utils/scripts/search.py (1)
18-30: Search container execution now benefits from signal forwarding.

Running the composed Docker search command through run_subprocess_with_signal_forward(cmd) with the same return-code handling keeps CLI semantics stable and ensures search containers are torn down when the user interrupts the process.

Also applies to: 181-187
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
19-32: Archive manager command correctly routed through signal-forwarding helper.

The change to run_subprocess_with_signal_forward(cmd) retains existing logging and cleanup logic while ensuring that interrupts reach the archive-manager container, avoiding orphaned native processes.

Also applies to: 279-285
components/clp-package-utils/clp_package_utils/scripts/decompress.py (2)
19-37: File extraction now uses signal-forwarding subprocess wrapper.

Swapping to run_subprocess_with_signal_forward(cmd) in handle_extract_file_cmd keeps the existing error logging and config cleanup while ensuring interrupts terminate the extraction container as well.

Also applies to: 162-169

19-37: Stream extraction handler correctly updated to forward signals.

The handle_extract_stream_cmd path now runs its docker command via run_subprocess_with_signal_forward(cmd) with unchanged return-code handling, so stream extraction containers are cleaned up on SIGINT/SIGTERM without altering normal behaviour.

Also applies to: 271-277
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
- components/clp-package-utils/clp_package_utils/general.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: build (macos-15)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
components/clp-package-utils/clp_package_utils/general.py (1)
8-13: New imports and Docker run constants look consistent with added functionality.

signal and Any are appropriate for the new signal-forwarding helper, and the Docker run timeout/terminate-grace constants are coherently named and scoped in this module. No issues here from this file's perspective.

Also applies to: 54-55
```python
def run_subprocess_with_signal_forward(*args: Any, **kwargs: Any) -> subprocess.CompletedProcess:
    """
    Runs a subprocess in its own process group and forwards `SIGINT` and `SIGTERM` received on the
    calling process to the entire child process group.

    :return: The completed subprocess.
    """
    kwargs["start_new_session"] = True
    proc = subprocess.Popen(*args, **kwargs)

    # Save original handlers
    original_handlers = {
        signal.SIGINT: signal.signal(signal.SIGINT, signal.SIG_DFL),
        signal.SIGTERM: signal.signal(signal.SIGTERM, signal.SIG_DFL),
    }

    def _signal_handler(signum: int, frame: Any) -> None:
        """Run cleanup, then invoke the original handler."""
        try:
            # Terminate the entire process group to cascade termination to all children.
            pgid = os.getpgid(proc.pid)
            os.killpg(pgid, signum)
        except ProcessLookupError:
            # Child already terminated.
            pass

        # Restore and invoke original handler
        original = original_handlers[signal.Signals(signum)]
        signal.signal(signum, original)
        if original not in (signal.SIG_DFL, signal.SIG_IGN):
            original(signum, frame)

    signal.signal(signal.SIGINT, _signal_handler)
    signal.signal(signal.SIGTERM, _signal_handler)

    try:
        proc.wait()
    finally:
        # Restore original handlers
        for sig, handler in original_handlers.items():
            signal.signal(sig, handler)

    return subprocess.CompletedProcess(proc.args, proc.returncode)
```
🧹 Nitpick | 🔵 Trivial
Signal-forwarding helper is solid but tightly coupled to the main thread and global handlers

The overall behaviour of run_subprocess_with_signal_forward looks good: it creates a new process group, forwards SIGINT/SIGTERM to that group, tolerates already-exited children via ProcessLookupError, and returns a CompletedProcess with proc.args and returncode, which matches subprocess.run semantics for these fields.

There are a couple of important constraints worth making explicit or guarding against:

- Main-thread requirement for signal.signal. Python only allows signal.signal(...) from the main thread; calling this helper from a background thread will raise ValueError. Given this lives in a general utilities module, it would be safer either to:
  - document in the docstring that it must be called from the main thread of a single-threaded CLI, or
  - add a runtime guard (e.g., check threading.current_thread() is threading.main_thread()) and fail fast with a clear error message.

- Global SIGINT/SIGTERM handler overwrite while the child runs. Because the helper installs its own handlers for SIGINT/SIGTERM, any other code that sets handlers during the subprocess lifetime can interfere (and vice versa). This is probably acceptable for simple top-level scripts but makes the function hazardous in more complex or multi-threaded programs. At minimum, the docstring should call this out; if you expect broader reuse, consider refactoring into a context-manager-style API so callers can more explicitly scope and reason about handler changes.

If the intent is "CLI-only, main-thread-only" usage, tightening the documentation (and optionally adding a defensive main-thread assertion) should be enough.
🤖 Prompt for AI Agents
components/clp-package-utils/clp_package_utils/general.py lines 756-799: The
function sets global SIGINT/SIGTERM handlers which must only be done from the
main thread and will temporarily overwrite global handlers; update the docstring
to state that this helper must be called from the main thread and that it
replaces global SIGINT/SIGTERM handlers for the duration of the call, and add a
runtime guard at the start that checks threading.current_thread() is
threading.main_thread() and raises a clear ValueError if not, so callers
fail-fast; optionally consider exposing a context-manager variant for broader
use, but for now implement the docstring change and the defensive main-thread
assertion.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/clp-package-utils/clp_package_utils/general.py`:
- Around line 803-810: Replace the blocking proc.wait() call inside the
try/finally block in the function with proc.communicate() to avoid deadlocks
when callers pass stdout=PIPE or stderr=PIPE, capture the returned (stdout,
stderr) and then construct and return subprocess.CompletedProcess(proc.args,
proc.returncode, stdout=stdout, stderr=stderr); keep restoring original signal
handlers in the finally block as before and ensure you call proc.communicate()
before entering the finally so handlers are always restored even if communicate
raises.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
- components/clp-package-utils/clp_package_utils/controller.py
- components/clp-package-utils/clp_package_utils/general.py
- components/clp-package-utils/clp_package_utils/scripts/compress.py
- components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
- components/clp-package-utils/clp_package_utils/scripts/search.py
```python
    try:
        proc.wait()
    finally:
        # Restore original handlers
        for sig, handler in original_handlers.items():
            signal.signal(sig, handler)

    return subprocess.CompletedProcess(proc.args, proc.returncode)
```
🧩 Analysis chain
🏁 Script executed:

```shell
fd "general.py" --path "*/clp-package-utils/*" -x cat -n {}
```

Repository: y-scope/clp
Length of output: 270

🏁 Script executed:

```shell
fd "general.py" -p "*/clp-package-utils/*" -x cat -n {}
```

Repository: y-scope/clp
Length of output: 163

🏁 Script executed:

```shell
find . -name "general.py" -type f | head -20
```

Repository: y-scope/clp
Length of output: 114

🏁 Script executed:

```shell
cat -n ./components/clp-package-utils/clp_package_utils/general.py | sed -n '790,820p'
```

Repository: y-scope/clp
Length of output: 1317

🏁 Script executed:

```shell
cat -n ./components/clp-package-utils/clp_package_utils/general.py | sed -n '750,815p'
```

Repository: y-scope/clp
Length of output: 2939
🏁 Script executed:

```shell
python - <<'PY'
import ast
from pathlib import Path

risky_kwargs = {"stdout", "stderr", "capture_output", "timeout", "input", "check", "text", "encoding", "errors"}
for path in Path(".").rglob("*.py"):
    try:
        src = path.read_text(encoding="utf-8")
        tree = ast.parse(src)
    except Exception:
        continue
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            fn = node.func
            name = fn.id if isinstance(fn, ast.Name) else fn.attr if isinstance(fn, ast.Attribute) else None
            if name == "run_subprocess_with_signal_forward":
                used = {kw.arg for kw in node.keywords if kw.arg is not None}
                overlap = sorted(used & risky_kwargs)
                if overlap:
                    print(f"{path}:{node.lineno}: kwargs={overlap}")
PY
```

Repository: y-scope/clp
Length of output: 139
🏁 Script executed:

```shell
cat -n ./components/clp-package-utils/clp_package_utils/controller.py | sed -n '1145,1165p'
```

Repository: y-scope/clp
Length of output: 847

🏁 Script executed:

```shell
rg "run_subprocess_with_signal_forward" --type py -B 2 -A 2
```

Repository: y-scope/clp
Length of output: 9250
Replace proc.wait() with proc.communicate() to prevent deadlocks and preserve output.
The current implementation at line 804 uses proc.wait(), which can deadlock indefinitely if callers pass stdout=PIPE or stderr=PIPE (or similar output redirection). Additionally, the returned CompletedProcess at line 810 omits stdout and stderr, making the function incompatible with callers expecting subprocess-run style output capture.
Recommended fix:

```diff
     try:
-        proc.wait()
+        stdout, stderr = proc.communicate()
     finally:
         # Restore original handlers
         for sig, handler in original_handlers.items():
             signal.signal(sig, handler)

-    return subprocess.CompletedProcess(proc.args, proc.returncode)
+    return subprocess.CompletedProcess(proc.args, proc.returncode, stdout, stderr)
```

🤖 Prompt for AI Agents
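To illustrate why communicate() matters here, a standalone sketch (hypothetical helper, not the repo's code) that captures piped output without risking a pipe-buffer deadlock and returns it subprocess.run-style:

```python
import subprocess


def run_and_capture(cmd: list) -> subprocess.CompletedProcess:
    """Hypothetical demo: capture output safely via communicate()."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    # communicate() drains both pipes while waiting; a bare wait() with
    # stdout=PIPE can block forever once the child fills the pipe buffer.
    stdout, stderr = proc.communicate()
    return subprocess.CompletedProcess(proc.args, proc.returncode, stdout, stderr)


result = run_and_capture(["echo", "hello"])
print(result.stdout.strip())  # hello
```

The same pattern applied inside the helper keeps its CompletedProcess compatible with callers that pass stdout=PIPE or stderr=PIPE.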
+ return subprocess.CompletedProcess(proc.args, proc.returncode, stdout, stderr)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/clp-package-utils/clp_package_utils/general.py` around lines 803 -
810, Replace the blocking proc.wait() call inside the try/finally block in the
function with proc.communicate() to avoid deadlocks when callers pass
stdout=PIPE or stderr=PIPE, capture the returned (stdout, stderr) and then
construct and return subprocess.CompletedProcess(proc.args, proc.returncode,
stdout=stdout, stderr=stderr); keep restoring original signal handlers in the
finally block as before and ensure you call proc.communicate() before entering
the finally so handlers are always restored even if communicate raises.
Description
This PR improves the handling of subprocesses (used to launch the "native" scripts) in the sbin scripts by introducing signal forwarding to ensure proper cleanup of native containers. Specifically, it addresses situations where processes started by the sbin scripts may leave orphaned containers if interrupted.

Checklist

breaking change.

Validation performed

1. Ran task to build the project.
2. cd build/clp-package
3. ./sbin/start-clp.sh
4. Ran docker ps to find the compression-scheduler service container.
5. Ran docker kill <container_id> to kill the scheduler.
6. Ran ./sbin/compress.sh ~/samples/postgresql.jsonl (or any available file).
7. Verified that compress.sh exits immediately.
8. Verified that docker ps does NOT show any clp-compression-<uuid> containers running.

Summary by CodeRabbit