-
Notifications
You must be signed in to change notification settings - Fork 203
feat: nonpareil allow for empty FASTQ input files #4731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds many Nonpareil test fixtures, updates test orchestration and expectations, enhances Nonpareil wrappers (input format detection, bz2 handling, empty-input substitution), changes plot wrapper JSON handling, and refreshes pinned environment and environment.yaml pins. Changes
Sequence Diagram(s)sequenceDiagram
participant SM as Snakemake
participant InferW as infer/wrapper.py
participant Tools as pbzip2/gzip/fs
participant Nonpareil as nonpareil
participant PlotW as plot/wrapper.py
SM->>InferW: invoke infer wrapper with input
InferW->>InferW: Path(input) & in_format = get_format(input)
alt input suffix == .bz2
InferW->>Tools: pbzip2 -d (stream) -> write in_uncomp
else input compressed (.gz) or plain
InferW->>Tools: decompress or copy -> in_uncomp
end
InferW->>InferW: probe in_uncomp size / gzip header
alt empty input detected
InferW->>InferW: create minimal dummy FASTA/FASTQ as in_uncomp
end
InferW->>Nonpareil: run nonpareil with constructed options (redundancy, mate_distr, log, ...)
Nonpareil-->>InferW: produce .npa/.npo/.npc outputs
InferW-->>SM: return outputs
SM->>PlotW: invoke plot wrapper for outputs
PlotW->>PlotW: create temp JSON file (NamedTemporaryFile)
PlotW->>PlotW: run plotting command writing to temp JSON
alt json post-processing required
PlotW->>PlotW: load temp JSON, convert "NaN" strings to real NaN (numpy), write final json
end
PlotW-->>SM: final outputs (pdf/tsv/json)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
bio/nonpareil/plot/wrapper.py (1)
26-38: Consider renaming unused loop variable.The JSON post-processing logic correctly addresses the NaN issue. However, the loop variable
keyat line 35 is not used within the loop body.Apply this diff to indicate the variable is intentionally unused:
- for key, val in json_data.items(): + for _key, val in json_data.items(): val["x.adj"] = [np.nan if x == "NaN" else x for x in val["x.adj"]]Or, if the key is truly not needed, consider using
values():- for key, val in json_data.items(): + for val in json_data.values(): val["x.adj"] = [np.nan if x == "NaN" else x for x in val["x.adj"]]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
bio/nonpareil/infer/test/expected/empty.fq.bz2.npo(1 hunks)bio/nonpareil/infer/test/expected/empty.fq.gz.npo(1 hunks)bio/nonpareil/infer/test/expected/empty.fq.npo(1 hunks)bio/nonpareil/infer/wrapper.py(3 hunks)bio/nonpareil/plot/test/Snakefile(1 hunks)bio/nonpareil/plot/test/d.npo(1 hunks)bio/nonpareil/plot/wrapper.py(1 hunks)test_wrappers.py(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- bio/nonpareil/infer/test/expected/empty.fq.npo
- test_wrappers.py
- bio/nonpareil/infer/test/expected/empty.fq.bz2.npo
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
bio/nonpareil/plot/wrapper.pybio/nonpareil/infer/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/nonpareil/plot/wrapper.pybio/nonpareil/infer/wrapper.py
🧠 Learnings (19)
📓 Common learnings
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-24
Timestamp: 2024-11-26T09:08:06.041Z
Learning: In Snakemake wrappers, input file validation is managed by Snakemake, so manual validation in the wrapper code is unnecessary.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:14-22
Timestamp: 2024-11-26T09:16:39.570Z
Learning: In the `snakemake-wrappers` repository, when writing test `Snakefile`s (e.g., `bio/ngscheckmate/makesnvpattern/test/Snakefile`), hardcoded input/output paths are acceptable because these are examples and the IO can be chosen freely.
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3115
File: CHANGELOG.md:5-5
Timestamp: 2024-08-14T15:21:37.230Z
Learning: Do not review release-please commits in the Snakemake wrappers repository as they are auto-formatted.
Learnt from: dlaehnemann
Repo: snakemake/snakemake-wrappers PR: 3115
File: CHANGELOG.md:5-5
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review release-please commits in the Snakemake wrappers repository as they are auto-formatted.
📚 Learning: 2024-11-26T09:16:24.981Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:1-13
Timestamp: 2024-11-26T09:16:24.981Z
Learning: In test `Snakefile`s (e.g., `test/Snakefile`), it's acceptable to use fixed input and output file names instead of wildcards.
Applied to files:
bio/nonpareil/plot/test/Snakefilebio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-26T15:01:13.202Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:1-23
Timestamp: 2024-11-26T15:01:13.202Z
Learning: The NGS-bits SampleAncestry wrapper in `bio/ngsbits/sampleancestry/` includes a test Snakefile, sample VCF files, and tests available in the `test/` directory.
Applied to files:
bio/nonpareil/plot/test/Snakefile
📚 Learning: 2024-11-26T09:16:39.570Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/test/Snakefile:14-22
Timestamp: 2024-11-26T09:16:39.570Z
Learning: In the `snakemake-wrappers` repository, when writing test `Snakefile`s (e.g., `bio/ngscheckmate/makesnvpattern/test/Snakefile`), hardcoded input/output paths are acceptable because these are examples and the IO can be chosen freely.
Applied to files:
bio/nonpareil/plot/test/Snakefilebio/nonpareil/plot/wrapper.pybio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-21T10:50:09.006Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/call-variants/test/Snakefile:29-47
Timestamp: 2024-11-21T10:50:09.006Z
Learning: In example Snakefiles, using hardcoded sample names is acceptable.
Applied to files:
bio/nonpareil/plot/test/Snakefile
📚 Learning: 2024-11-15T18:31:15.447Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/test/Snakefile:7-10
Timestamp: 2024-11-15T18:31:15.447Z
Learning: In the Snakemake wrappers repository, avoid suggesting refactoring that involves using `tempfile.gettempdir()` or changing output paths to temporary directories.
Applied to files:
bio/nonpareil/plot/wrapper.pybio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/nonpareil/plot/wrapper.pybio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-26T14:59:03.678Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3502
File: bio/ngsbits/sampleancestry/wrapper.py:18-23
Timestamp: 2024-11-26T14:59:03.678Z
Learning: In Snakemake wrapper scripts, Snakemake validates input and output paths, so explicit shell quoting is not necessary.
Applied to files:
bio/nonpareil/plot/wrapper.pybio/nonpareil/infer/wrapper.py
📚 Learning: 2024-08-21T08:30:42.757Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.
Applied to files:
bio/nonpareil/plot/wrapper.py
📚 Learning: 2024-08-21T08:33:50.878Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:33:50.878Z
Learning: The `snakemake` variable is inserted via a preamble during execution in `wrapper.py` scripts, so it doesn't need to be explicitly defined.
Applied to files:
bio/nonpareil/plot/wrapper.py
📚 Learning: 2024-11-15T13:48:33.759Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/preprocess-variants/wrapper.py:0-0
Timestamp: 2024-11-15T13:48:33.759Z
Learning: In Snakemake wrappers, security considerations like input sanitization are unnecessary, as the wrappers are under full control of the user.
Applied to files:
bio/nonpareil/plot/wrapper.pybio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-26T10:49:04.406Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:1-10
Timestamp: 2024-11-26T10:49:04.406Z
Learning: Using generic output filenames like "genome.fasta" is acceptable in the `snakemake-wrappers` project.
Applied to files:
bio/nonpareil/plot/wrapper.pybio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-26T08:31:00.099Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.
Applied to files:
bio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-26T09:08:06.041Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3499
File: bio/ngscheckmate/makesnvpattern/wrapper.py:18-24
Timestamp: 2024-11-26T09:08:06.041Z
Learning: In Snakemake wrappers, input file validation is managed by Snakemake, so manual validation in the wrapper code is unnecessary.
Applied to files:
bio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-26T10:49:54.765Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3501
File: meta/bio/varscan2_snpeff/test/Snakefile:58-71
Timestamp: 2024-11-26T10:49:54.765Z
Learning: In test Snakefiles within the snakemake-wrappers repository, it is acceptable to use simplified paths and logging configurations that may differ from real-life pipelines.
Applied to files:
bio/nonpareil/infer/wrapper.py
📚 Learning: 2024-12-06T14:25:43.922Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3498
File: bio/ngscheckmate/ncm/wrapper.py:68-68
Timestamp: 2024-12-06T14:25:43.922Z
Learning: In the `bio/ngscheckmate/ncm/wrapper.py` file for the NGSCheckMate wrapper, do not suggest adding file existence checks for the list file input, as the files are already required by the wrapper.
Applied to files:
bio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-15T18:36:04.660Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/call-variants/wrapper.py:15-23
Timestamp: 2024-11-15T18:36:04.660Z
Learning: In the Snakemake wrappers repository, using `shell=True` and redirecting within shell commands is acceptable.
Applied to files:
bio/nonpareil/infer/wrapper.py
📚 Learning: 2024-11-15T13:44:18.810Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-15T13:44:18.810Z
Learning: In the snakemake-wrappers repository, it is acceptable to use f-strings with `shell=True` in wrapper scripts (e.g., `wrapper.py` files), as these wrappers are under full control of the user. Potential command injection vulnerabilities are not considered an issue in this context.
Applied to files:
bio/nonpareil/infer/wrapper.py
📚 Learning: 2025-06-15T07:43:03.263Z
Learnt from: tedil
Repo: snakemake/snakemake-wrappers PR: 4274
File: bio/mehari/annotate-seqvars/wrapper.py:11-19
Timestamp: 2025-06-15T07:43:03.263Z
Learning: In Snakemake wrappers, a common pattern is to have flag variables that are either empty strings "" or the actual flag strings (e.g., "--keep-intergenic") for direct interpolation in shell commands. This avoids conditionals in the shell expression and keeps the command construction clean.
Applied to files:
bio/nonpareil/infer/wrapper.py
🪛 Ruff (0.14.5)
bio/nonpareil/plot/wrapper.py
10-10: Undefined name snakemake
(F821)
11-11: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
35-35: Loop control variable key not used within loop body
Rename unused key to _key
(B007)
bio/nonpareil/infer/wrapper.py
11-11: Undefined name snakemake
(F821)
12-12: Undefined name snakemake
(F821)
13-13: Undefined name snakemake
(F821)
16-16: Undefined name snakemake
(F821)
41-41: Undefined name snakemake
(F821)
47-47: Undefined name snakemake
(F821)
67-67: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
🔇 Additional comments (10)
bio/nonpareil/plot/wrapper.py (2)
6-6: LGTM!The
tempfileimport is appropriate for the temporary file handling introduced to work around the Nonpareil JSON output issue.
14-24: LGTM!The temporary file approach for JSON output is well-structured, allowing post-processing to fix the NaN issue while writing other formats directly to their final paths.
bio/nonpareil/infer/wrapper.py (5)
6-16: LGTM!Using
get_formatfrom snakemake_wrapper_utils is a robust approach for input format detection, superior to manual extension parsing.
19-36: LGTM!The additional output options follow consistent patterns and properly extend the wrapper's functionality to support more Nonpareil features.
40-47: LGTM!The bz2 decompression logic is clean and properly utilizes parallel decompression with
pbzip2.
81-83: LGTM!Appending a zero-line to the redundancy summary for empty inputs ensures consistent output file presence, preventing downstream pipeline failures.
49-68: The 50-byte threshold for gzip empty-file detection is appropriate.An empty gzipped file is approximately 20 bytes, and larger sizes (21–31 bytes) appear when filename/timestamp/extra fields are stored. The code's 50-byte threshold provides a safe buffer that allows truly empty gzipped files to be properly detected through decompression, while any actual FASTA or FASTQ content would exceed this threshold. The empty input detection and dummy sequence generation logic is sound.
bio/nonpareil/plot/test/d.npo (1)
1-10: LGTM!The test data file follows the correct Nonpareil output format with appropriate metadata headers and zero-valued data matrix for testing empty input scenarios.
bio/nonpareil/plot/test/Snakefile (1)
19-27: LGTM!The test configuration correctly extends the multiple-plot test case to include the new d.npo test data with matching labels and colors.
bio/nonpareil/infer/test/expected/empty.fq.gz.npo (1)
1-10: Expected output is validated by the test framework and confirmed correct.The file
bio/nonpareil/infer/test/expected/empty.fq.gz.npois part of the active test suite (line 411 intest_wrappers.py). The test framework runs the wrapper, generates actual output, and validates byte-for-byte against this expected file usingfilecmp.cmp. All three empty input variants (.fq,.fq.gz,.fq.bz2) consistently produce identical output, confirming correctness. The all-zero redundancy values are appropriate for the single minimal dummy sequence that the wrapper creates to handle empty inputs (GitHub issue #71 workaround).
jsonoutput file (until x.adj is NaN lmrodriguezr/nonpareil#70 is fixed)QC
snakemake-wrappers.While the contributions guidelines are more extensive, please particularly ensure that:
test.pywas updated to call any added or updated example rules in aSnakefileinput:andoutput:file paths in the rules can be chosen arbitrarilyinput:oroutput:)tempfile.gettempdir()points tometa.yamlcontains a link to the documentation of the respective tool or command underurl:Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.