Skip to content

Conversation

@harumiWeb
Copy link
Owner

@harumiWeb harumiWeb commented Jan 12, 2026

#38 の対策を実施

Summary by CodeRabbit

  • New Features

    • Per-sheet PDF export and full multi-page image rendering (page suffixes for 2+ pages)
    • Subprocess-based PDF→PNG rendering enabled by default, toggleable via environment variable
  • Bug Fixes

    • Improved memory isolation and crash resilience during image export
    • Better resource cleanup and error propagation across render paths
  • Documentation

    • Added release notes and implementation guidance for image output and multi-process rendering
  • Tests

    • New tests covering multi-sheet/multi-page image exports and subprocess rendering paths
  • Chores

    • Patch version bumped to v0.3.6

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 12, 2026

📝 Walkthrough

Walkthrough

Adds subprocess-capable PDF-to-PNG rendering for Excel sheet images, changes export_sheet_images/export_pdf flows to per-sheet PDF export and subprocess or in-process rasterization, updates docs and release notes, and extends tests for both rendering paths and naming/aggregation behavior.

Changes

Cohort / File(s) Summary
Docs
docs/agents/FEATURE_SPEC.md, docs/agents/TASKS.md, docs/release-notes/v0.3.6.md, mkdocs.yml
Removes merged_cells spec text; adds image-output and multiprocess rendering specs, release notes describing per-sheet multi-page rendering and EXSTRUCT_RENDER_SUBPROCESS toggle; updates mkdocs nav.
Rendering implementation
src/exstruct/render/__init__.py
Implements subprocess-rendering support (env var EXSTRUCT_RENDER_SUBPROCESS), adds helpers _use_render_subprocess, _render_pdf_pages_subprocess, _render_pdf_pages_in_process, _render_pdf_pages_worker, IPC/result plumbing, per-sheet PDF export flow, and updated export_pdf behavior (temporary workbook SaveAs).
Tests — integration/smoke
tests/com/test_render_smoke.py
Adds smoke test verifying multiple print-range sheets produce distinct image outputs and page-suffix handling.
Tests — unit/mocks
tests/render/test_render_init.py
Expands xlwings/pdf test doubles (FakeSheet.api ExportAsFixedFormat, FakeBookApi.SaveAs, FakeApp.display_alerts, PdfDocument length), adds FakeQueue/FakeProcess/FakeContext for subprocess path, tests for subprocess toggle, naming (_pNN suffix), and error propagation.
Packaging
pyproject.toml
Version bump to 0.3.6.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller (process_excel / export_sheet_images)
    participant Excel as Excel COM / xlwings
    participant Main as Main Process (renderer)
    participant Worker as Subprocess Worker
    participant PDF as pypdfium2 / PDF document
    participant Disk as Disk (PNG output)

    Caller->>Excel: Export per-sheet PDF(s)
    Excel-->>Main: PDF path(s)
    Main->>Main: Check EXSTRUCT_RENDER_SUBPROCESS
    alt Subprocess enabled
        Main->>Worker: Send job (PDF path, pages, DPI, outdir)
        Worker->>PDF: Load PDF
        Worker->>PDF: Render pages -> bitmaps
        Worker->>Disk: Write PNG files
        Worker-->>Main: Return PNG paths or error
        Main->>Caller: Aggregate results/errors
    else In-process
        Main->>PDF: Load PDF
        Main->>PDF: Render pages -> bitmaps
        Main->>Disk: Write PNG files
        Main->>Caller: Return PNG paths
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I hopped through sheets and pages wide,
Offloaded work to a friend beside,
One process hums, another paints,
PNG crumbs without memory complaints,
A tiny rabbit cheers: safe renders abide! 🎨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Bug fix/output img' is vague and uses abbreviations that don't clearly convey the main change, making it difficult for teammates to understand the changeset from the title alone. Consider using a more descriptive title like 'Add multiprocess PDF-to-image rendering with subprocess isolation' to clearly communicate the primary change.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link

codecov-commenter commented Jan 12, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @tests/com/test_render_smoke.py:
- Around line 39-58: The test test_render_multiple_print_ranges_images asserts
exactly 4 image files but is brittle because multi-page sheets produce extra
files with _p02/_p03 suffixes; update the assertion in
test_render_multiple_print_ranges_images to allow multi-page output (e.g.,
change assert len(images) == 4 to assert len(images) >= 4) or replace the check
by counting unique sheet image prefixes (strip trailing _pNN suffixes from each
filename and assert the number of unique prefixes is 4) so the test passes
regardless of per-sheet page splits.
🧹 Nitpick comments (3)
src/exstruct/render/__init__.py (3)

86-91: Consider clarifying the pdfium validation intent.

In subprocess mode (lines 89-90), _require_pdfium() is called solely to validate the dependency exists before spawning subprocesses, but the result is discarded. While functionally correct, adding a brief comment would clarify this intent.

💡 Suggested clarification
     if not use_subprocess:
         pdfium = cast(Any, _require_pdfium())
     else:
-        _require_pdfium()
+        _require_pdfium()  # Validate dependency before spawning subprocesses

174-198: Consider adding a timeout to prevent indefinite hangs.

The subprocess join (line 189) has no timeout, which means the parent process could block indefinitely if the subprocess hangs or deadlocks (e.g., due to a corrupted PDF or resource exhaustion). This could degrade user experience or cause the application to become unresponsive.

💡 Suggested improvement
     process.start()
-    process.join()
+    process.join(timeout=300)  # 5-minute timeout
+    if process.is_alive():
+        process.terminate()
+        process.join(timeout=5)
+        raise RenderError("PDF rendering subprocess timed out")
     if not queue.empty():
         result = queue.get()

201-229: Code duplication with _render_pdf_pages_in_process.

The rendering logic (scale calculation, page iteration, bitmap rendering, PNG saving with page suffix) is duplicated between this worker and _render_pdf_pages_in_process. While the duplication is partly justified due to different contexts (subprocess vs in-process), consider extracting the core rendering loop into a shared helper to reduce maintenance burden.

Also note: Line 214 creates output_dir defensively, but the parent process already ensures this directory exists at line 85. This redundancy is harmless but could be removed.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3bd891a and d6c66ec.

⛔ Files ignored due to path filters (3)
  • tests/assets/multiple_print_ranges_4sheets.xlsx is excluded by !**/*.xlsx
  • tests/assets/sample.xls is excluded by !**/*.xls
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • docs/agents/FEATURE_SPEC.md
  • docs/agents/TASKS.md
  • src/exstruct/render/__init__.py
  • tests/com/test_render_smoke.py
  • tests/render/test_render_init.py
💤 Files with no reviewable changes (1)
  • docs/agents/FEATURE_SPEC.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Add type hints to all function and method arguments and return values following mypy strict mode
Use Any type only at external library boundaries (xlwings, pandas, numpy) and avoid type inference from external library internals
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Follow import ordering: standard library, third-party, exstruct internal modules
Add Google-style docstrings to all functions and classes
Keep cyclomatic complexity below 12 per function and split complex functions into smaller ones
Avoid writing God Functions (large monolithic functions) and God Objects (large classes with many responsibilities)
Avoid unnecessary nesting and deeply nested conditional branches
Ensure code passes mypy strict mode with zero errors
Ensure code passes Ruff checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid circular dependencies between modules
Normalize external library data (xlwings, pandas) to Pydantic models at the boundary layer
Use the ExStruct system as specified in core development; do not write code dependent on external library internal structures

Files:

  • tests/render/test_render_init.py
  • src/exstruct/render/__init__.py
  • tests/com/test_render_smoke.py
🧠 Learnings (1)
📚 Learning: 2026-01-06T13:53:35.863Z
Learnt from: CR
Repo: harumiWeb/exstruct PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-06T13:53:35.863Z
Learning: Reference architectural documentation in docs/architecture/pipeline.md, docs/contributors/architecture.md, docs/agents/CODING_GUIDELINES.md, docs/agents/DATA_MODEL.md, and docs/agents/TASKS.md as needed during development

Applied to files:

  • docs/agents/TASKS.md
🧬 Code graph analysis (2)
src/exstruct/render/__init__.py (2)
src/exstruct/io/__init__.py (1)
  • _sanitize_sheet_filename (96-99)
src/exstruct/errors.py (1)
  • RenderError (28-29)
tests/com/test_render_smoke.py (1)
src/exstruct/__init__.py (1)
  • process_excel (318-398)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (11)
docs/agents/TASKS.md (1)

5-8: LGTM!

The task documentation clearly captures the implementation strategy for subprocess-based rendering, memory isolation, and error handling patterns that align with the code changes in this PR.

src/exstruct/render/__init__.py (5)

4-5: LGTM!

New imports for multiprocessing and os are correctly placed with the standard library imports.


44-49: LGTM!

The addition of display_alerts = False and SaveAs to convert the workbook to xlsx before PDF export correctly addresses the xls file compatibility issue and prevents blocking dialogs during automation.


144-146: LGTM!

The environment variable toggle is implemented correctly with sensible defaults (subprocess enabled by default) and case-insensitive comparison for disabling.


149-171: LGTM!

The in-process rendering function is well-structured with proper type hints, context management for the PDF document, and correct page numbering logic.


140-141: These functions have different implementations and behavior—consolidation would break the render module.

The _sanitize_sheet_filename in src/exstruct/render/__init__.py differs from the io module's version. The render version calls .strip() on the result, ensuring that whitespace-only input returns "sheet" (as verified by the test at line 380 in test_render_init.py). The io version omits .strip(), which means it would return the whitespace string unchanged. Importing from io would break the expected behavior in the render module.

No action required.

tests/render/test_render_init.py (5)

16-126: LGTM!

The fake classes are well-extended to support the new rendering functionality:

  • FakeSheetApi properly simulates per-sheet PDF export
  • FakeBookApi.SaveAs supports the xls-to-xlsx conversion flow
  • FakePdfDocument.__len__ enables multi-page testing scenarios

158-230: LGTM!

The monkeypatch targets are correctly updated to patch xw.App directly (the imported module reference) rather than through the render module's namespace.


254-274: LGTM!

The test comprehensively validates the in-process rendering path including sheet name sanitization, whitespace fallback, and multi-page suffix generation.


335-366: LGTM!

The test effectively validates that the subprocess rendering path is correctly delegated when EXSTRUCT_RENDER_SUBPROCESS=1. The mock approach properly isolates the delegation logic from actual subprocess spawning.


369-374: LGTM!

The test validates the core toggle behavior. Consider adding edge case tests for "false" and "FALSE" values to verify case-insensitivity, though this is optional.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6c66ec7ea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
tests/render/test_render_init.py (1)

395-396: Consider removing unused empty() method.

The empty() method is defined but never used in the test suite. Per the commit messages, the implementation moved away from Queue.empty() to queue.get(timeout=...), making this method dead code in the test double.

♻️ Suggested removal
-    def empty(self) -> bool:
-        return self.payload is None
src/exstruct/render/__init__.py (3)

188-189: Add timeout to process.join() to prevent indefinite hangs.

If the worker process deadlocks or hangs (e.g., due to a pypdfium2 bug or large PDF), process.join() will block indefinitely. Consider adding a timeout and terminating the process if it exceeds the limit.

♻️ Suggested fix
     process.start()
-    process.join()
+    process.join(timeout=300)  # 5 minute timeout for large PDFs
+    if process.is_alive():
+        process.terminate()
+        process.join(timeout=5)
+        raise RenderError("PDF rendering subprocess timed out")
     result = _get_subprocess_result(queue)

208-236: Consider extracting shared rendering logic to reduce duplication.

The core rendering logic (lines 220-232) is nearly identical to _render_pdf_pages_in_process (lines 158-170). While extraction is complicated by the subprocess boundary and import semantics, you could extract a helper that accepts the pdfium module as a parameter.

That said, the current duplication is acceptable given the subprocess isolation requirements and the relatively small code surface.


191-195: Type narrowing could improve safety.

The type dict[str, list[str] | str] doesn't distinguish between the "paths" and "error" keys. If paths were accidentally a string, the iteration would yield characters. Consider adding runtime validation or using TypedDict for stricter typing.

♻️ Example with TypedDict
from typing import TypedDict

class WorkerResult(TypedDict, total=False):
    paths: list[str]
    error: str

Or add a runtime check:

     paths = result.get("paths", [])
+    if not isinstance(paths, list):
+        raise RenderError(f"Invalid worker result: paths is not a list")
     return [Path(path) for path in paths]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6c66ec and 014e99c.

📒 Files selected for processing (6)
  • docs/release-notes/v0.3.6.md
  • mkdocs.yml
  • pyproject.toml
  • src/exstruct/render/__init__.py
  • tests/com/test_render_smoke.py
  • tests/render/test_render_init.py
✅ Files skipped from review due to trivial changes (2)
  • docs/release-notes/v0.3.6.md
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/com/test_render_smoke.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Add type hints to all function and method arguments and return values following mypy strict mode
Use Any type only at external library boundaries (xlwings, pandas, numpy) and avoid type inference from external library internals
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Follow import ordering: standard library, third-party, exstruct internal modules
Add Google-style docstrings to all functions and classes
Keep cyclomatic complexity below 12 per function and split complex functions into smaller ones
Avoid writing God Functions (large monolithic functions) and God Objects (large classes with many responsibilities)
Avoid unnecessary nesting and deeply nested conditional branches
Ensure code passes mypy strict mode with zero errors
Ensure code passes Ruff checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid circular dependencies between modules
Normalize external library data (xlwings, pandas) to Pydantic models at the boundary layer
Use the ExStruct system as specified in core development; do not write code dependent on external library internal structures

Files:

  • src/exstruct/render/__init__.py
  • tests/render/test_render_init.py
🧬 Code graph analysis (2)
src/exstruct/render/__init__.py (3)
tests/render/test_render_init.py (10)
  • open (65-70)
  • SaveAs (41-42)
  • ExportAsFixedFormat (29-31)
  • ExportAsFixedFormat (37-39)
  • close (53-55)
  • quit (81-83)
  • render (97-99)
  • to_pil (89-91)
  • save (155-165)
  • put (386-387)
src/exstruct/io/__init__.py (1)
  • _sanitize_sheet_filename (96-99)
src/exstruct/errors.py (1)
  • RenderError (28-29)
tests/render/test_render_init.py (2)
src/exstruct/render/__init__.py (4)
  • _use_render_subprocess (144-146)
  • _render_pdf_pages_subprocess (174-195)
  • _get_subprocess_result (198-205)
  • _render_pdf_pages_worker (208-236)
src/exstruct/errors.py (1)
  • RenderError (28-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: test (windows-latest, 3.12)
🔇 Additional comments (23)
mkdocs.yml (1)

29-31: LGTM!

The new v0.3.6 release notes entry follows the existing naming convention and is correctly placed at the top of the Release Notes list in descending version order.

tests/render/test_render_init.py (16)

1-16: LGTM!

Imports are properly organized: standard library first, then third-party (pytest, xlwings), then internal modules. The use of cast and ModuleType for test type handling is appropriate.


18-31: LGTM!

The FakeSheet and FakeSheetApi stubs correctly simulate the xlwings Sheet interface needed for per-sheet PDF export. The implementation writes valid PDF header bytes which aligns with the test expectations.


41-42: LGTM!

The SaveAs stub correctly simulates the Excel Book.api interface for the temp workbook save in export_pdf.


76-83: LGTM!

The display_alerts attribute correctly simulates the xlwings App interface used by export_pdf.


102-128: LGTM!

The FakePdfDocument stub now correctly simulates multi-page PDFs via __len__, enabling tests to verify the _pNN page suffix naming convention. The conditional page count based on path is a pragmatic approach for test variability.


131-149: LGTM!

ExplodingPdfDocument correctly simulates PDF loading failures for testing error propagation paths.


277-297: LGTM!

Good test coverage for the in-process rendering path with:

  • Name sanitization (Sheet/1Sheet_1, whitespace → sheet)
  • Multi-page PDF naming with _p02 suffix
  • Environment variable toggle for subprocess mode

300-316: LGTM!

The test correctly verifies that RenderError from _require_excel_app propagates without being wrapped again.


338-369: LGTM!

Good integration test verifying that export_sheet_images correctly delegates to the subprocess rendering path when enabled, with proper argument forwarding for each sheet.


399-418: LGTM!

FakeProcess correctly simulates the multiprocessing Process interface for testing subprocess coordination.


421-434: LGTM!

FakeContext correctly simulates the multiprocessing context interface for testing subprocess rendering coordination.


437-461: LGTM!

The test correctly verifies the subprocess success path with proper queue result handling and path conversion.


464-481: LGTM!

The test correctly verifies that worker errors reported via the queue are propagated as RenderError.


484-490: LGTM!

The test correctly verifies timeout handling in _get_subprocess_result.


493-517: LGTM with a note on test isolation.

The test correctly verifies the worker function's output. The sys.modules manipulation for pypdfium2 is cleaned up in the finally block, which is good. If tests run in parallel, consider using monkeypatch.syspath_prepend or a context manager approach for stronger isolation.


520-537: LGTM!

The test correctly verifies that worker exceptions are captured and reported via the queue payload.

src/exstruct/render/__init__.py (6)

1-16: LGTM!

Imports are properly organized and the addition of multiprocessing and os modules is appropriate for the new subprocess rendering functionality.


44-48: LGTM!

Setting display_alerts = False and saving to a temp file before export is defensive programming that prevents Excel dialogs from blocking automation and isolates the original workbook from modifications.


79-137: LGTM!

The refactored export_sheet_images correctly:

  • Conditionally loads pypdfium2 only when needed for in-process rendering
  • Iterates sheets to export per-sheet PDFs before rasterization
  • Delegates to subprocess or in-process rendering based on environment toggle
  • Maintains proper resource cleanup with try/finally

144-146: LGTM!

The function correctly defaults to subprocess rendering (safer for memory isolation) and only disables it when explicitly set to "0" or "false".


149-171: LGTM!

The in-process rendering logic is correct with proper DPI-to-scale conversion and multi-page naming. However, see the comment on _render_pdf_pages_worker regarding code duplication.


198-205: LGTM!

The 5-second timeout is reasonable for queue result retrieval after the process has joined. The broad exception handling is appropriate here since all failures should be reported uniformly.

@harumiWeb harumiWeb merged commit 28e74c1 into main Jan 12, 2026
9 checks passed
@harumiWeb harumiWeb deleted the bug-fix/output_img branch January 12, 2026 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants