feat(integration-tests): Test package compression. #1801
quinntaylormitchell wants to merge 20 commits into y-scope:main from
Conversation
Walkthrough
Adds integration test tooling and datasets for package compression workflows: a pytest environment variable, new utilities to run/verify compression and manage paths, tests for JSON and text multifile packages, and sample log datasets for both formats.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Test as Compression Test
    participant Exec as Compression Script
    participant FS as File System
    participant Decomp as Decompression Script
    Test->>FS: Clear archives
    Test->>Exec: Run compression job (config, dataset path, flags)
    Exec->>FS: Read dataset files
    Exec->>FS: Write compressed archive
    Test->>FS: Locate compressed archive
    Test->>Decomp: Run decompression (temp config)
    Decomp->>FS: Extract archive -> decompressed dir
    Test->>FS: Compare decompressed dir vs original dataset
    Test->>FS: Assert equality and cleanup
```
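The diagram's compress → decompress → compare loop can be sketched as a self-contained round-trip; here `shutil.copytree` stands in for the real compression and decompression scripts, and all names are illustrative rather than the PR's actual helpers:

```python
import filecmp
import shutil
import tempfile
from pathlib import Path

def dir_trees_equal(a: Path, b: Path) -> bool:
    """Recursively compare two directory trees, as the last diagram steps do."""
    cmp = filecmp.dircmp(a, b)
    if cmp.left_only or cmp.right_only or cmp.diff_files:
        return False
    return all(dir_trees_equal(a / d, b / d) for d in cmp.common_dirs)

root = Path(tempfile.mkdtemp())
dataset = root / "dataset"
dataset.mkdir()
(dataset / "day01.jsonl").write_text('{"ts": 1, "msg": "launch"}\n')

archive = root / "archives"           # "compress": a copy stands in for the script
shutil.copytree(dataset, archive)
decompressed = root / "decompressed"  # "decompress": copy back out
shutil.copytree(archive, decompressed)

assert dir_trees_equal(dataset, decompressed)
shutil.rmtree(root)                   # cleanup, as in the final diagram step
```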
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed checks (3 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 Ruff (0.14.14) failed on:
- integration-tests/tests/fixtures/path_configs.py
- integration-tests/tests/package_tests/clp_json/test_clp_json.py
- integration-tests/tests/package_tests/clp_text/test_clp_text.py
Actionable comments posted: 7
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (15)
- integration-tests/tests/conftest.py (1 hunks)
- integration-tests/tests/data/__init__.py (1 hunks)
- integration-tests/tests/data/datasets/json-multifile/README.md (1 hunks)
- integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-08.jsonl (1 hunks)
- integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-09.jsonl (1 hunks)
- integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-11.jsonl (1 hunks)
- integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-19.jsonl (1 hunks)
- integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-21.jsonl (1 hunks)
- integration-tests/tests/data/jobs/compression/__init__.py (1 hunks)
- integration-tests/tests/data/jobs/compression/compression_jobs.py (1 hunks)
- integration-tests/tests/fixtures/package_config.py (2 hunks)
- integration-tests/tests/fixtures/package_instance.py (2 hunks)
- integration-tests/tests/test_package_start.py (3 hunks)
- integration-tests/tests/utils/clp_job_utils.py (1 hunks)
- integration-tests/tests/utils/config.py (3 hunks)
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:133-137
Timestamp: 2025-07-28T08:33:57.487Z
Learning: In CLP integration tests, the `run_and_assert` utility function should only be used for commands that test CLP package functionality (like clp/clp-s compression/decompression operations). For processing or helper commands (like jq, sort, etc.), direct `subprocess.run` calls should be used instead, as they serve different purposes and don't need the same error handling patterns.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/utils/config.py:121-123
Timestamp: 2025-08-20T22:07:04.953Z
Learning: For the CLP integration tests codebase, do not suggest deriving tarball filenames from URLs instead of hard-coding ".tar.gz" extension. The user has explicitly rejected this suggestion.
📚 Learning: 2025-08-20T08:36:38.391Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/__init__.py:1-1
Timestamp: 2025-08-20T08:36:38.391Z
Learning: In the CLP project, __init__.py files should include short module-level docstrings describing the package purpose, as required by their ruff linting configuration, rather than being left empty.
Applied to files:
- integration-tests/tests/data/jobs/compression/__init__.py
- integration-tests/tests/data/__init__.py
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
integration-tests/tests/data/jobs/compression/__init__.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
- integration-tests/tests/data/jobs/compression/__init__.py
- integration-tests/tests/utils/clp_job_utils.py
- integration-tests/tests/test_package_start.py
- integration-tests/tests/data/jobs/compression/compression_jobs.py
📚 Learning: 2025-09-15T22:20:40.750Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Applied to files:
integration-tests/tests/data/jobs/compression/__init__.py
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
integration-tests/tests/data/jobs/compression/__init__.py
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
- integration-tests/tests/data/jobs/compression/__init__.py
- integration-tests/tests/utils/clp_job_utils.py
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
- integration-tests/tests/data/jobs/compression/__init__.py
- integration-tests/tests/test_package_start.py
📚 Learning: 2025-07-28T08:33:57.487Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:133-137
Timestamp: 2025-07-28T08:33:57.487Z
Learning: In CLP integration tests, the `run_and_assert` utility function should only be used for commands that test CLP package functionality (like clp/clp-s compression/decompression operations). For processing or helper commands (like jq, sort, etc.), direct `subprocess.run` calls should be used instead, as they serve different purposes and don't need the same error handling patterns.
Applied to files:
- integration-tests/tests/data/jobs/compression/__init__.py
- integration-tests/tests/test_package_start.py
- integration-tests/tests/data/__init__.py
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
- integration-tests/tests/test_package_start.py
- integration-tests/tests/fixtures/package_instance.py
- integration-tests/tests/conftest.py
- integration-tests/tests/fixtures/package_config.py
- integration-tests/tests/data/__init__.py
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.
Applied to files:
- integration-tests/tests/test_package_start.py
- integration-tests/tests/data/__init__.py
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
Applied to files:
integration-tests/tests/data/__init__.py
🧬 Code graph analysis (4)
integration-tests/tests/utils/clp_job_utils.py (1)
integration-tests/tests/utils/config.py (2)
PackageCompressionJob (125-144)
PackageJobList (148-151)
integration-tests/tests/test_package_start.py (1)
integration-tests/tests/fixtures/package_instance.py (1)
fixt_package_instance (21-52)
integration-tests/tests/fixtures/package_config.py (4)
integration-tests/tests/utils/clp_job_utils.py (1)
build_package_job_list (32-48)
integration-tests/tests/utils/clp_mode_utils.py (2)
get_clp_config_from_mode (87-98)
get_required_component_list (101-116)
integration-tests/tests/utils/config.py (2)
PackageConfig (155-194)
PackagePathConfig (68-121)
integration-tests/tests/utils/port_utils.py (1)
assign_ports_from_base (39-61)
integration-tests/tests/data/jobs/compression/compression_jobs.py (1)
integration-tests/tests/utils/config.py (1)
PackageCompressionJob (125-144)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: check-generated
🔇 Additional comments (15)
integration-tests/tests/data/jobs/compression/__init__.py (1)
1-1: LGTM! The module docstring is clear and follows the project's ruff linting requirements for `__init__.py` files.
integration-tests/tests/data/__init__.py (1)
1-1: LGTM! The module docstring appropriately describes the package purpose and complies with the project's linting requirements.
integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-21.jsonl (1)
1-8: LGTM! The test dataset is well-structured with consistent JSON Lines formatting, sequential timestamps, and appropriate mission event data for integration testing.
integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-19.jsonl (1)
1-8: LGTM! The test dataset contains well-formed JSON Lines with sequential timestamps and appropriate re-entry sequence events for integration testing.
integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-08.jsonl (1)
1-8: LGTM! The test dataset is properly formatted with sequential timestamps and realistic launch sequence events suitable for integration testing.
integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-09.jsonl (1)
1-8: LGTM! The test dataset contains properly formatted JSON Lines with sequential timestamps and appropriate ISS docking sequence events for integration testing.
integration-tests/tests/utils/config.py (3)
8-8: LGTM! The `Any` import is appropriately used for the flexible typing of flag values in `PackageCompressionJob`.
147-152: LGTM! The `PackageJobList` dataclass is well-defined with appropriate typing for holding a list of compression jobs.
173-174: LGTM! The addition of the optional `package_job_list` field to `PackageConfig` appropriately extends the configuration to support test job management.
integration-tests/tests/data/datasets/json-multifile/README.md (1)
1-16: LGTM!The dataset metadata accurately reflects the five JSONL files, with correct counts, type, and timestamp ranges verified against the actual dataset files.
integration-tests/tests/data/datasets/json-multifile/logs/sts-135-2011-07-11.jsonl (1)
1-8: LGTM! Test data is well-structured. The JSON Lines test data is well-formed with consistent timestamps, sequential line indices, and appropriate field values for integration testing.
integration-tests/tests/fixtures/package_config.py (1)
48-60: LGTM! Job list building logic is correct. The fixture correctly builds the package job list based on the CLI options, passing the appropriate filter and conditionally skipping job creation when `--no-jobs` is specified.
integration-tests/tests/fixtures/package_instance.py (1)
32-38: LGTM! Skip logic correctly handles job filtering. The conditional skip logic appropriately handles the case where a mode has no jobs matching the filter, while still allowing package startup/shutdown testing when `--no-jobs` is explicitly specified.
integration-tests/tests/utils/clp_job_utils.py (1)
32-48: LGTM! Job list building logic is clear and correct. The function appropriately filters compression jobs and returns `None` when no jobs match, which integrates well with the fixture logic in `package_instance.py`.
integration-tests/tests/conftest.py (1)
31-36: LGTM! The `--no-jobs` option is well-defined. The boolean flag and help text clearly communicate the option's purpose for testing package startup and shutdown without running jobs.
Bill-hbrhbr
left a comment
These are some ideas we’ve touched on before but never really circled back to or explored in depth. I think I have a good understanding of your overall design, and it makes sense to me. That said, I wonder if we could lean toward a more modular approach rather than bundling all package-related tests together.
In particular, I’m a bit hesitant about adding more pytest parser options. They work, but I worry they might make the testing setup harder to reason about over time. I’d be interested in discussing whether we can achieve the same goals using pytest’s existing mechanisms while keeping things simpler and more modular.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
integration-tests/tests/fixtures/package_test_config.py (1)
37-41: Define `package_job_list` before use.
`package_job_list` is referenced but never set in this fixture, which will raise a `NameError` when the fixture runs. Consider deriving it from the new CLI options and `build_package_job_list`.
🛠️ Suggested fix

```diff
-from tests.utils.config import PackageModeConfig, PackagePathConfig, PackageTestConfig
+from tests.utils.config import PackageModeConfig, PackagePathConfig, PackageTestConfig
+from tests.utils.clp_job_utils import build_package_job_list
@@
     # Construct PackageTestConfig.
+    job_name_contains = request.config.getoption("--job-name-contains")
+    no_jobs = request.config.getoption("--no-jobs")
+    package_job_list = None if no_jobs else build_package_job_list(
+        mode_config.mode_name, job_name_contains
+    )
     package_test_config = PackageTestConfig(
         path_config=fixt_package_path_config,
         mode_config=mode_config,
         base_port=base_port,
         package_job_list=package_job_list,
     )
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
integration-tests/tests/utils/config.py (1)
101-117: Create `package_decompression_dir` before use.
It’s initialised but never created, so decompression can fail if the script expects the directory to exist.
🛠️ Proposed fix

```diff
     # Create directories if they do not already exist.
     self.temp_config_dir.mkdir(parents=True, exist_ok=True)
+    self.package_decompression_dir.mkdir(parents=True, exist_ok=True)
     self.clp_log_dir.mkdir(parents=True, exist_ok=True)
```

integration-tests/tests/package_tests/clp_json/test_clp_json.py (1)
59-117: Add cleanup to avoid test cross-contamination.
The TODOs indicate cleanup is pending; leaving `var/data`, `var/log`, and `var/tmp` can leak state across runs. Consider a `try/finally` cleanup in the test or a fixture finalizer.
🧹 Suggested cleanup pattern

```diff
-from tests.utils.utils import resolve_path_env_var
+from tests.utils.utils import resolve_path_env_var, unlink
 ...
     package_test_config = fixt_package_instance.package_test_config
-    run_package_compression_script(compression_job, package_test_config)
-
-    # Check the correctness of compression.
-    verify_compression(compression_job, package_test_config)
+    try:
+        run_package_compression_script(compression_job, package_test_config)
+        # Check the correctness of compression.
+        verify_compression(compression_job, package_test_config)
+    finally:
+        path_config = package_test_config.path_config
+        unlink(path_config.clp_package_dir / "var" / "data")
+        unlink(path_config.clp_package_dir / "var" / "log")
+        unlink(path_config.clp_package_dir / "var" / "tmp")
```

If you want, I can draft a reusable fixture/helper for cleanup.
🤖 Fix all issues with AI agents
In `@integration-tests/tests/utils/asserting_utils.py`:
- Around line 109-151: In verify_compression, the output_path is built as
decompression_dir / path_to_dataset which collapses to path_to_dataset when
path_to_dataset is absolute; change the construction so output_path points to
the extracted dataset directory under decompression_dir (e.g., use the dataset
basename via path_to_dataset.name or Path(path_to_dataset).name) before calling
is_dir_tree_content_equal; update references to decompression_dir,
path_to_dataset and output_path accordingly so the comparison checks the
decompressed copy, not the original.
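The pitfall flagged here is standard `pathlib` behaviour: joining onto an absolute path discards the left operand. A minimal demonstration on POSIX, with hypothetical paths:

```python
from pathlib import Path

decompression_dir = Path("/tmp/decompressed")   # hypothetical paths
path_to_dataset = Path("/data/json-multifile")

# Joining with an absolute right-hand operand drops decompression_dir entirely:
assert decompression_dir / path_to_dataset == Path("/data/json-multifile")

# Re-rooting via relative_to(anchor) keeps the result under decompression_dir,
# so the comparison targets the decompressed copy rather than the original:
output_path = decompression_dir / path_to_dataset.relative_to(path_to_dataset.anchor)
assert output_path == Path("/tmp/decompressed/data/json-multifile")
```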
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In
`@integration-tests/tests/package_tests/clp_text/sample_datasets/text-multifile/logs/apollo-17_day08.txt`:
- Around line 1-15: Apply the two suggested style refinements to the log
entries: change the phrase in the e006 entry ("flags and commemorative plaques
documented in high resolution imagery") to use the compound adjective
"high-resolution imagery", and change the phrase in the e013 entry ("crew rests
briefly prior to ascent timeline") to the more concise "crew rests briefly
before ascent timeline". Locate and edit the exact lines containing "e006" and
"e013" to apply these textual replacements.
In
`@integration-tests/tests/package_tests/clp_text/sample_datasets/text-multifile/README.md`:
- Line 4: The line containing "Unstructured text" has a trailing space causing
markdownlint MD009; edit the README.md and remove the trailing whitespace at the
end of that line (the line that exactly reads "Unstructured text") so it ends
with no extra space.
In `@integration-tests/tests/package_tests/clp_text/test_clp_text.py`:
- Around line 59-63: Update the docstring in test_clp_text.py to reference the
correct package and dataset: replace the incorrect "clp-json" with "clp-text"
and "json-multifile" with "text-multifile" so the docstring accurately describes
that the clp-text package compresses the text-multifile dataset; locate the
docstring at the top of the test function in this file and make the two string
edits.
- Around line 64-91: Call clear_package_archives() on the test package path
config before running compression to ensure no pre-existing archives affect
results; retrieve the config from fixt_package_instance.package_test_config (or
assign package_path_config =
fixt_package_instance.package_test_config.path_config) and invoke
package_path_config.clear_package_archives() prior to creating/running
PackageCompressionJob and run_package_compression_script so the test starts with
a clean archive state.
In `@integration-tests/tests/utils/config.py`:
- Around line 140-143: The clear_package_archives method should guard against a
missing archives directory before calling clean_directory: check the computed
archives_dir (from self.clp_package_dir / "var" / "data" / "archives") exists
and is a directory (e.g., archives_dir.exists() or archives_dir.is_dir()) and
only call clean_directory(archives_dir) when it does; if it doesn't exist,
simply return/do nothing to avoid raising FileNotFoundError from
clean_directory.
In `@integration-tests/tests/utils/utils.py`:
- Around line 13-23: The clean_directory function should handle a non-existent
directory gracefully to avoid FileNotFoundError in callers like
clear_package_archives: update clean_directory to check directory.exists() (or
directory.is_dir()) and simply return when the path is missing (or else
explicitly document that it raises FileNotFoundError in the docstring if you
want failure semantics); keep the rest of the logic
(iterdir/unlink/shutil.rmtree) unchanged and reference the function name
clean_directory and the caller clear_package_archives when making the change so
tests that expect absent archive dirs no longer fail unexpectedly.
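A sketch of the guard described above, written so the helper itself tolerates a missing path and callers such as `clear_package_archives` need no pre-check (the function body is an assumption based on the review's description, not the PR's actual code):

```python
import shutil
import tempfile
from pathlib import Path

def clean_directory(directory: Path) -> None:
    """Remove a directory's contents; silently no-op if it does not exist."""
    if not directory.is_dir():
        return  # absent archive dirs no longer raise FileNotFoundError
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

archives_dir = Path(tempfile.mkdtemp()) / "var" / "data" / "archives"
clean_directory(archives_dir)  # missing directory: no error
archives_dir.mkdir(parents=True)
(archives_dir / "archive-0").write_text("stub")
clean_directory(archives_dir)
assert archives_dir.exists() and not any(archives_dir.iterdir())
```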
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@integration-tests/tests/package_tests/clp_text/sample_datasets/text-multifile/logs/apollo-17_day01.txt`:
- Around line 1-10: The timestamp milliseconds in the log lines (e.g., the
sequence starting with "1972-12-07T05:33:00.000 apollo-17 d01 e000 ..." showing
.000, .999, .998, ...) form an artificial decreasing pattern; decide whether
this was intentional and if not, normalize the milliseconds across the entries
(for example set all timestamps to .000 or to realistic randomized ms) in the
apollo-17_day01 log block so the data generation artifact is removed; update the
log lines that include the pattern (.000/.999/.998/... shown in the sample
entries) to use the chosen consistent or randomized millisecond format.
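If the decision is to normalize rather than randomize, one hedged approach is a single regex pass that pins every entry's milliseconds to `.000`; the log messages below are invented stand-ins for the dataset's actual lines:

```python
import re

lines = [
    "1972-12-07T05:33:00.000 apollo-17 d01 e000 liftoff confirmed",
    "1972-12-07T05:34:12.999 apollo-17 d01 e001 tower cleared",
    "1972-12-07T05:35:47.998 apollo-17 d01 e002 roll program complete",
]
# Replace any HH:MM:SS.mmm milliseconds with .000 to remove the
# artificial decreasing pattern (.000/.999/.998/...).
normalized = [re.sub(r"(\d{2}:\d{2}:\d{2})\.\d{3}", r"\1.000", ln) for ln in lines]
assert normalized[1].startswith("1972-12-07T05:34:12.000")
```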
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@integration-tests/tests/utils/asserting_utils.py`:
- Around line 143-146: The second assignment to decompression_dir is
redundant—remove the repeated line "decompression_dir =
package_test_config.path_config.package_decompression_dir" so the function
reuses the earlier decompression_dir defined at line 129; then keep the
subsequent computations that use decompression_dir (the output_path =
decompression_dir / path_to_dataset.relative_to(path_to_dataset.anchor) line)
intact, ensuring compression_job.path_to_dataset and path_to_dataset variables
remain used as before.
- Around line 122-124: The branch handling mode in asserting_utils.py currently
does a no-op "assert True" for ("clp-json", "clp-presto"), so replace that
placeholder with an explicit test skip or warning: import pytest (or use the
existing test logger) and call pytest.skip("Waiting for PR 1299") inside the
mode in ("clp-json", "clp-presto") branch (or log a clear warning and return
from verify_compression) so the behavior is explicit; update the branch
referencing the mode variable and the placeholder assertion to perform the
skip/log instead of assert True.
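A minimal sketch of the explicit-skip replacement for the `assert True` placeholder (the function name and mode strings follow the review thread; the body is illustrative):

```python
import pytest

def verify_package_compression(mode: str) -> None:
    # Make the pending verification path explicit instead of silently passing.
    if mode in ("clp-json", "clp-presto"):
        pytest.skip("Waiting for PR 1299")
    # ... real verification for other modes would go here ...
```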
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@integration-tests/tests/utils/asserting_utils.py`:
- Around line 143-151: The test currently calls pytest.fail before calling
unlink(decompression_dir), so decompression_dir is only cleaned on success; wrap
the verification and cleanup in a try/finally so unlink(decompression_dir)
always runs: compute path_to_dataset and output_path, run the
is_dir_tree_content_equal check inside the try block and call pytest.fail on
mismatch, and perform unlink(decompression_dir) in the finally block to
guarantee cleanup; reference the existing names compression_job,
path_to_dataset, output_path, is_dir_tree_content_equal, pytest.fail, and unlink
to locate where to implement the try/finally.
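The always-cleanup shape the comment asks for can be sketched with stand-in helpers (the directory comparison here is a simplified name check, not the PR's `is_dir_tree_content_equal`, and `shutil.rmtree` stands in for `unlink`):

```python
import shutil
import tempfile
from pathlib import Path

def verify_and_cleanup(path_to_dataset: Path, output_path: Path, decompression_dir: Path) -> None:
    try:
        # Simplified stand-in for is_dir_tree_content_equal + pytest.fail.
        if sorted(p.name for p in path_to_dataset.iterdir()) != sorted(
            p.name for p in output_path.iterdir()
        ):
            raise AssertionError("Decompressed dataset does not match the original.")
    finally:
        # Runs whether verification passed or failed.
        shutil.rmtree(decompression_dir, ignore_errors=True)

decomp = Path(tempfile.mkdtemp())
src = Path(tempfile.mkdtemp())
(src / "a.log").write_text("x")
out = decomp / "out"
out.mkdir()
(out / "b.log").write_text("x")  # mismatched name forces a failure
try:
    verify_and_cleanup(src, out, decomp)
except AssertionError:
    pass
assert not decomp.exists()  # cleanup ran even though verification failed
```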
```
resolve_path_env_var("INTEGRATION_TESTS_DIR")
/ "tests"
```
Let's make this path a part of IntegrationTestPathConfig. We can name it test_scripts_dir or test_source_dir. In hindsight, should've named the other attribute test_build_dir. We can fix it in a later PR.
Considering that the package test code doesn't use `IntegrationTestPathConfig` anywhere at the moment, it seems a bit weird to introduce that class throughout the package test code just to store one path (because then we'd have to have `IntegrationTestPathConfig` as a parameter in every package test function). My next thought is "What if we stored it in `PackagePathConfig` instead?", but that doesn't really seem right either.
I think ultimately what we need is an ironed-out (standardized) way to store paths that are relevant to all integration tests (maybe that's what should get stored in pytest.ini?), and then a different way to store paths for integration tests of different types (maybe PackagePathConfig for package, BinaryPathConfig for binaries, something like that).
Let's talk about this offline; it doesn't seem like it's a blocker for this PR in particular though.
My approach to path definitions in the integration tests project is as follows:
- `pytest.ini` defines environment variable to path mappings with sensible defaults, while still allowing users to override locations as needed. For example, `CLP_CORE_BINS_DIR` typically points to `clp/build/core`, but users may choose a different build location such as `clp/components/core/build`. New additions to `pytest.ini` should not contain any paths that can be fully derived from existing ones (i.e., it's not possible for them to have custom locations).
- All paths defined in `pytest.ini` are resolved centrally in `path_configs.py` and exposed through session-scoped fixtures.
- Derived paths, such as the path to the `clp-s` binary or to the package compression script `compress.py`, are provided as `@property` attributes on these session-scoped fixture classes.
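The convention in the list above might look roughly like this sketch, where the env-var root is resolved once and derived paths are read-only properties (`CLP_CORE_BINS_DIR` and its default come from the comment; the class body is assumed):

```python
import os
from pathlib import Path

class IntegrationTestPathConfig:
    def __init__(self) -> None:
        # Root resolved once from the env var defined in pytest.ini,
        # with the documented default as a fallback.
        self._core_bins_dir = Path(
            os.environ.get("CLP_CORE_BINS_DIR", "clp/build/core")
        )

    @property
    def clp_s_binary(self) -> Path:
        # Fully derived from an existing root, so it never appears in pytest.ini.
        return self._core_bins_dir / "clp-s"

cfg = IntegrationTestPathConfig()
assert cfg.clp_s_binary == cfg._core_bins_dir / "clp-s"
```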
integration-tests/tests/package_tests/clp_json/test_clp_json.py
Outdated
Show resolved
Hide resolved
```python
    run_and_assert(stop_cmd, timeout=DEFAULT_CMD_TIMEOUT_SECONDS)


def run_package_compression_script(
```
This function doesn't feel like it's unique to running the compression job script. This function can be generalized to running any binary/script that takes option flags first and then positional arguments in the end.
Of course there are also cases where these two scenarios are interleaved, e.g. `clp-s x --ordered archive decomp`, where the binary is followed first by a positional arg, then by an option, and eventually by positional args again. Anyway, let's not account for this situation now.
What I suggest is moving the core of this function into a new helper called `build_command`, and calling `run_and_assert` externally. That means you should also generalize `PackageCompressionJob` to `CommandArgumentBuilder` or something that fits, and `compression_job.path_to_dataset` becomes the one and only positional argument for `CommandArgumentBuilder`.
`CommandArgumentBuilder` shouldn't be a dataclass since it will hold construction logic, and it can have an API called `build()` (instead of the `build_command()` I suggested before).
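A hypothetical sketch of the proposed `CommandArgumentBuilder`, options first and positionals last, exposing a `build()` API (all details beyond the names mentioned above are assumptions):

```python
from pathlib import Path

class CommandArgumentBuilder:
    """Builds a command as [script, *option flags, *positional args]."""

    def __init__(self, script: Path) -> None:
        self._script = script
        self._flags: list[str] = []
        self._positionals: list[str] = []

    def add_flag(self, name: str, value=None) -> "CommandArgumentBuilder":
        self._flags.append(name)
        if value is not None:
            self._flags.append(str(value))
        return self

    def add_positional(self, value) -> "CommandArgumentBuilder":
        self._positionals.append(str(value))
        return self

    def build(self) -> list[str]:
        return [str(self._script), *self._flags, *self._positionals]

# Hypothetical usage: the dataset path becomes the single positional argument.
cmd = (
    CommandArgumentBuilder(Path("sbin/compress.sh"))
    .add_flag("--timestamp-key", "ts")
    .add_positional("data/json-multifile")
    .build()
)
assert cmd == ["sbin/compress.sh", "--timestamp-key", "ts", "data/json-multifile"]
```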
In concept I agree with you. I don't see that there is any real reason to generalize it within the context of this PR though; how about I leave it as-is for now, and then when I put up the search PRs, I can generalize it to build_command then?
```python
    pytest.fail("Mode mismatch: running configuration does not match intended configuration.")


def verify_compression(
```
```diff
-def verify_compression(
+def verify_package_compression(
```
Also this seems more like part of the testing code instead of code that can be reused. I don't know how I feel about putting this in asserting_utils instead of directly in test_clp_text and test_clp_json
Agree with the name change.
I do see the value in having a version of this function specific to each mode, to avoid the multiplexing that's going on here, but that means each future mode module (e.g., `test_clp_presto`, `test_clp_json_s3`, etc.) will need one too. I think keeping it here is ultimately the better call, especially as the number of modes will grow faster than the number of ways we verify compression (e.g., the future `test_clp_presto` and `test_clp_json_s3` will both use the same verification workflow as clp-json).
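As a rough illustration of the multiplexing being discussed, the dispatch could look like the sketch below. The mode names match the thread, but the function signature and the per-mode bodies are assumptions, not the actual `asserting_utils` code:

```python
import pytest


def verify_package_compression(mode_name: str, decompressed_matches: bool) -> None:
    """Dispatch verification by package mode; fail fast on unknown modes."""
    if mode_name == "clp-json":
        # Correctness check pending #1299; only the job's success is asserted.
        pass
    elif mode_name == "clp-text":
        # clp-text round-trips losslessly, so compare decompressed output.
        if not decompressed_matches:
            pytest.fail("Decompressed output does not match the original dataset.")
    else:
        # Explicit else so misconfigured modes cannot silently pass.
        pytest.fail(f"Unsupported mode: {mode_name}")
```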
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@integration-tests/tests/package_tests/clp_json/test_clp_json.py`:
- Around line 111-117: This test file
(integration-tests/tests/package_tests/clp_json/test_clp_json.py) currently
contains only TODOs and a bare "assert True" which yields false CI positives;
update the placeholder test to be explicitly skipped or xfailed until
implemented by either adding pytest import and decorating the test function with
`@pytest.mark.skip(reason="TODO: implement compression/search assertions")` or
`@pytest.mark.xfail(strict=False, reason="TODO: implement")`, or call
pytest.skip("TODO: implement") at the top of the test body; ensure you reference
the existing test function name in your change so the test runner marks it
skipped/xfail instead of passing.
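For instance, the skip variant described in the prompt above could look like this minimal sketch (the test function name below is hypothetical):

```python
import pytest


@pytest.mark.skip(reason="TODO: implement compression/search assertions")
def test_clp_json_compression_placeholder() -> None:
    # Skipping prevents a bare `assert True` from reporting a false CI pass;
    # drop the marker above once real assertions are implemented.
    assert True
```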
In
`@integration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day10.txt`:
- Line 4: Replace the unhyphenated phrase "ground based" with the compound
adjective "ground-based" in the log line containing "navigation sightings
compare well with ground based tracking solutions" so the text reads "navigation
sightings compare well with ground-based tracking solutions"; update the entry
in apollo-17_day10.txt accordingly.
In `@integration-tests/tests/utils/asserting_utils.py`:
- Around line 110-156: The verify_package_compression function currently checks
mode for "clp-json" and "clp-text" but lacks an else branch, so unsupported or
misconfigured modes silently pass; add an explicit failure when
package_test_config.mode_config.mode_name is not recognized (i.e., add an else
after the existing if/elif) and call pytest.fail with a clear message that
includes the unexpected mode and relevant identifiers (e.g., package_test_config
or compression_job.path_to_dataset) to fail-fast on mode-related
misconfigurations.
In `@integration-tests/tests/utils/utils.py`:
- Around line 117-118: The early-return check uses path_to_remove.exists() which
skips broken symlinks; change the guard to allow symlinks (including broken
ones) to be removed by checking either os.path.lexists(path_to_remove) or by
replacing the condition with "if not (path_to_remove.exists() or
path_to_remove.is_symlink()): return" so the code will not return when
path_to_remove is a broken symlink; update the logic around path_to_remove in
utils.py accordingly.
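The fix described in the prompt above can be sketched as follows (the helper name is hypothetical; the key point is that `Path.exists()` follows symlinks and thus reports False for a dangling link):

```python
import shutil
import tempfile
from pathlib import Path


def remove_path(path_to_remove: Path) -> None:
    """Remove a file, directory, or symlink, including broken symlinks."""
    # exists() follows symlinks, so it alone would skip a dangling link;
    # is_symlink() checks the link entry itself without following it.
    if not (path_to_remove.exists() or path_to_remove.is_symlink()):
        return
    if path_to_remove.is_dir() and not path_to_remove.is_symlink():
        shutil.rmtree(path_to_remove)
    else:
        path_to_remove.unlink()


# Demonstration with a dangling symlink in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    link = Path(tmp) / "dangling-link"
    link.symlink_to(Path(tmp) / "no-such-target")
    assert not link.exists() and link.is_symlink()  # exists() misses the link
    remove_path(link)
    removed = not link.exists() and not link.is_symlink()
```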
integration-tests/tests/package_tests/clp_text/data/text-multifile/logs/apollo-17_day10.txt
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@integration-tests/tests/utils/asserting_utils.py`:
- Line 147: Add the missing timeout to the decompression invocation: change the
run_and_assert(decompress_cmd) call to pass timeout=DEFAULT_CMD_TIMEOUT_SECONDS;
if DEFAULT_CMD_TIMEOUT_SECONDS is not in scope, import it from where other tests
pull it (or define it locally using the same value) so run_and_assert and
decompress_cmd use the consistent timeout to prevent indefinite hangs.
In `@integration-tests/tests/utils/config.py`:
- Around line 161-172: Change PackageCompressionJob so options and
positional_args default to empty lists using
dataclasses.field(default_factory=list) instead of typing them as list[str] |
None; update any consuming code such as run_package_compression_script to treat
these attributes as lists (remove None checks/conditional extends) and rely on
them being empty by default, and adjust any callers that currently pass None to
omit those parameters.
    ]

    # Run decompression command and assert that it succeeds.
    run_and_assert(decompress_cmd)
Missing timeout on run_and_assert for the decompression command.
All other run_and_assert calls in package_utils.py pass timeout=DEFAULT_CMD_TIMEOUT_SECONDS. Without a timeout here, a stalling decompression process will hang the test indefinitely.
🛡️ Proposed fix
You'll need to import the constant or define a local one:

+from tests.utils.package_utils import DEFAULT_CMD_TIMEOUT_SECONDS

Then:

 # Run decompression command and assert that it succeeds.
-run_and_assert(decompress_cmd)
+run_and_assert(decompress_cmd, timeout=DEFAULT_CMD_TIMEOUT_SECONDS)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-run_and_assert(decompress_cmd)
+from tests.utils.package_utils import DEFAULT_CMD_TIMEOUT_SECONDS
+# ... (other imports and code)
+run_and_assert(decompress_cmd, timeout=DEFAULT_CMD_TIMEOUT_SECONDS)
🤖 Prompt for AI Agents
In `@integration-tests/tests/utils/asserting_utils.py` at line 147, Add the
missing timeout to the decompression invocation: change the
run_and_assert(decompress_cmd) call to pass timeout=DEFAULT_CMD_TIMEOUT_SECONDS;
if DEFAULT_CMD_TIMEOUT_SECONDS is not in scope, import it from where other tests
pull it (or define it locally using the same value) so run_and_assert and
decompress_cmd use the consistent timeout to prevent indefinite hangs.
@dataclass(frozen=True)
class PackageCompressionJob:
    """A compression job for a package test."""

    #: The absolute path to the dataset (either a file or directory).
    path_to_original_dataset: Path

    #: Options to specify in the compression command.
    options: list[str] | None

    #: Positional arguments to specify in the compression command (do not put paths to compress).
    positional_args: list[str] | None
🧹 Nitpick | 🔵 Trivial
Consider using empty lists as defaults instead of None.
Using None for options and positional_args forces every caller to pass None explicitly (since frozen=True requires all fields), and the consuming code in run_package_compression_script must null-check before extending. Using field(default_factory=list) would simplify both construction and consumption.
♻️ Proposed refactor
In config.py:

+from dataclasses import dataclass, field
+
 @dataclass(frozen=True)
 class PackageCompressionJob:
     """A compression job for a package test."""
     #: The absolute path to the dataset (either a file or directory).
     path_to_original_dataset: Path
     #: Options to specify in the compression command.
-    options: list[str] | None
+    options: list[str] = field(default_factory=list)
     #: Positional arguments to specify in the compression command (do not put paths to compress)
-    positional_args: list[str] | None
+    positional_args: list[str] = field(default_factory=list)

In package_utils.py:

-    if compression_job.options is not None:
-        compress_cmd.extend(compression_job.options)
-
-    if compression_job.positional_args is not None:
-        compress_cmd.extend(compression_job.positional_args)
+    compress_cmd.extend(compression_job.options)
+    compress_cmd.extend(compression_job.positional_args)

Callers would then omit options and positional_args when not needed:
PackageCompressionJob(path_to_original_dataset=some_path)🤖 Prompt for AI Agents
In `@integration-tests/tests/utils/config.py` around lines 161 - 172, Change
PackageCompressionJob so options and positional_args default to empty lists
using dataclasses.field(default_factory=list) instead of typing them as
list[str] | None; update any consuming code such as
run_package_compression_script to treat these attributes as lists (remove None
checks/conditional extends) and rely on them being empty by default, and adjust
any callers that currently pass None to omit those parameters.
Description
This PR implements integration tests for package compression. The following new features are added:

- `test_clp_json_compression_json_multifile` and `test_clp_text_compression_text_multifile`.
- `run_package_compression_script` and `verify_compression`.
- `PackageCompressionJob`.
- `json-multifile` and `text-multifile`.

Future development of compression package testing can be accomplished by adding new tests to the `test_clp_*` file, using the same format as the `test_clp_*_compression_*` tests.

NOTE: `verify_compression` does not check for the correctness of compression when testing `clp-json`; to accomplish this check, we need #1299 merged.

Checklist
breaking change.

Validation performed
Ran the following command from `clp/integration-tests`:

uv run pytest -m 'package'

All `clp-json` and `clp-text` tests passed.

Summary by CodeRabbit
Tests
Documentation
Chores