diff --git a/README.ja.md b/README.ja.md index 74958de..6ed7e01 100644 --- a/README.ja.md +++ b/README.ja.md @@ -1,15 +1,15 @@ # ExStruct — Excel 構造化抽出エンジン -[![PyPI version](https://badge.fury.io/py/exstruct.svg)](https://pypi.org/project/exstruct/) [![PyPI Downloads](https://static.pepy.tech/personalized-badge/exstruct?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/exstruct) ![Licence: BSD-3-Clause](https://img.shields.io/badge/license-BSD--3--Clause-blue?style=flat-square) [![pytest](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml/badge.svg)](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/e081cb4f634e4175b259eb7c34f54f60)](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) +[![PyPI version](https://badge.fury.io/py/exstruct.svg)](https://pypi.org/project/exstruct/) [![PyPI Downloads](https://static.pepy.tech/personalized-badge/exstruct?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/exstruct) ![Licence: BSD-3-Clause](https://img.shields.io/badge/license-BSD--3--Clause-blue?style=flat-square) [![pytest](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml/badge.svg)](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/e081cb4f634e4175b259eb7c34f54f60)](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) [![codecov](https://codecov.io/gh/harumiWeb/exstruct/graph/badge.svg?token=2XI1O8TTA9)](https://codecov.io/gh/harumiWeb/exstruct) ![ExStruct Image](/docs/assets/icon.webp) -ExStruct は Excel ワークブックを読み取り、構造化データ(セル・テーブル候補・図形・チャート・印刷範囲ビュー)をデフォルトで JSON に出力します。必要に応じて YAML/TOON 
も選択でき、COM/Excel 環境ではリッチ抽出、非 COM 環境ではセル+テーブル候補+印刷範囲へのフォールバックで安全に動作します。LLM/RAG 向けに検出ヒューリスティックや出力モードを調整可能です。 +ExStruct は Excel ワークブックを読み取り、構造化データ(セル・テーブル候補・図形・チャート・SmartArt・印刷範囲ビュー)をデフォルトで JSON に出力します。必要に応じて YAML/TOON も選択でき、COM/Excel 環境ではリッチ抽出、非 COM 環境ではセル+テーブル候補+印刷範囲へのフォールバックで安全に動作します。LLM/RAG 向けに検出ヒューリスティックや出力モードを調整可能です。 ## 主な特徴 -- **Excel → 構造化 JSON**: セル、図形、チャート、テーブル候補、印刷範囲/自動改ページ範囲(PrintArea/PrintAreaView)をシート単位・範囲単位で出力。 -- **出力モード**: `light`(セル+テーブル候補のみ)、`standard`(テキスト付き図形+矢印、チャート)、`verbose`(全図形を幅高さ付きで出力、セルのハイパーリンクも出力)。 +- **Excel → 構造化 JSON**: セル、図形、チャート、SmartArt、テーブル候補、印刷範囲/自動改ページ範囲(PrintArea/PrintAreaView)をシート単位・範囲単位で出力。 +- **出力モード**: `light`(セル+テーブル候補のみ)、`standard`(テキスト付き図形+矢印、チャート、SmartArt)、`verbose`(全図形を幅高さ付きで出力、セルのハイパーリンクも出力)。 - **フォーマット**: JSON(デフォルトはコンパクト、`--pretty` で整形)、YAML、TOON(任意依存)。 - **テーブル検出のチューニング**: API でヒューリスティックを動的に変更可能。 - **ハイパーリンク抽出**: `verbose` モード(または `include_cell_links=True` 指定)でセルのリンクを `links` に出力。 @@ -396,6 +396,11 @@ ExStruct の内部実装を拡張する場合は、 → [docs/contributors/architecture.md](docs/contributors/architecture.md) +## カバレッジに関する注意 + +セル構造推論ロジック(cells.py)は、ヒューリスティックルールと +Excel 固有の動作に依存しています。網羅的なテストは現実世界の信頼性を反映できないため、完全なカバレッジは意図的に追求されていません。 + ## License BSD-3-Clause. See `LICENSE` for details. 
diff --git a/README.md b/README.md index 1284127..11a456b 100644 --- a/README.md +++ b/README.md @@ -1,17 +1,17 @@ # ExStruct — Excel Structured Extraction Engine -[![PyPI version](https://badge.fury.io/py/exstruct.svg)](https://pypi.org/project/exstruct/) [![PyPI Downloads](https://static.pepy.tech/personalized-badge/exstruct?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/exstruct) ![Licence: BSD-3-Clause](https://img.shields.io/badge/license-BSD--3--Clause-blue?style=flat-square) [![pytest](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml/badge.svg)](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/e081cb4f634e4175b259eb7c34f54f60)](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) +[![PyPI version](https://badge.fury.io/py/exstruct.svg)](https://pypi.org/project/exstruct/) [![PyPI Downloads](https://static.pepy.tech/personalized-badge/exstruct?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/exstruct) ![Licence: BSD-3-Clause](https://img.shields.io/badge/license-BSD--3--Clause-blue?style=flat-square) [![pytest](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml/badge.svg)](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/e081cb4f634e4175b259eb7c34f54f60)](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) [![codecov](https://codecov.io/gh/harumiWeb/exstruct/graph/badge.svg?token=2XI1O8TTA9)](https://codecov.io/gh/harumiWeb/exstruct) ![ExStruct Image](/docs/assets/icon.webp) -ExStruct reads Excel workbooks and outputs structured data (cells, table candidates, shapes, 
charts, print areas/views, auto page-break areas, hyperlinks) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates + print areas), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines. +ExStruct reads Excel workbooks and outputs structured data (cells, table candidates, shapes, charts, smartart, print areas/views, auto page-break areas, hyperlinks) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates + print areas), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines. [日本版 README](README.ja.md) ## Features -- **Excel → Structured JSON**: cells, shapes, charts, table candidates, print areas/views, and auto page-break areas per sheet. -- **Output modes**: `light` (cells + table candidates + print areas; no COM, shapes/charts empty), `standard` (texted shapes + arrows, charts, print areas), `verbose` (all shapes with width/height, charts with size, print areas). Verbose also emits cell hyperlinks and `colors_map`. Size output is flag-controlled. +- **Excel → Structured JSON**: cells, shapes, charts, smartart, table candidates, print areas/views, and auto page-break areas per sheet. +- **Output modes**: `light` (cells + table candidates + print areas; no COM, shapes/charts empty), `standard` (texted shapes + arrows, charts, smartart, print areas), `verbose` (all shapes with width/height, charts with size, print areas). Verbose also emits cell hyperlinks and `colors_map`. Size output is flag-controlled. - **Auto page-break export (COM only)**: capture Excel-computed auto page breaks and write per-area JSON/YAML/TOON when requested (CLI option appears only when COM is available). - **Formats**: JSON (compact by default, `--pretty` available), YAML, TOON (optional dependencies). 
- **Table detection tuning**: adjust heuristics at runtime via API. @@ -395,6 +395,12 @@ please read the contributor architecture guide. → [docs/contributors/architecture.md](docs/contributors/architecture.md) +## Note on coverage + +The cell-structure inference logic (cells.py) relies on heuristic rules +and Excel-specific behaviors. Full coverage is intentionally not pursued, +as exhaustive testing would not reflect real-world reliability. + ## License BSD-3-Clause. See `LICENSE` for details. diff --git a/docs/README.en.md b/docs/README.en.md index 1c07081..5ae275f 100644 --- a/docs/README.en.md +++ b/docs/README.en.md @@ -1,15 +1,17 @@ # ExStruct — Excel Structured Extraction Engine -[![PyPI version](https://badge.fury.io/py/exstruct.svg)](https://pypi.org/project/exstruct/) [![PyPI Downloads](https://static.pepy.tech/personalized-badge/exstruct?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/exstruct) ![Licence: BSD-3-Clause](https://img.shields.io/badge/license-BSD--3--Clause-blue?style=flat-square) [![pytest](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml/badge.svg)](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/e081cb4f634e4175b259eb7c34f54f60)](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) +[![PyPI version](https://badge.fury.io/py/exstruct.svg)](https://pypi.org/project/exstruct/) [![PyPI Downloads](https://static.pepy.tech/personalized-badge/exstruct?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/exstruct) ![Licence: BSD-3-Clause](https://img.shields.io/badge/license-BSD--3--Clause-blue?style=flat-square) 
[![pytest](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml/badge.svg)](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/e081cb4f634e4175b259eb7c34f54f60)](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) [![codecov](https://codecov.io/gh/harumiWeb/exstruct/graph/badge.svg?token=2XI1O8TTA9)](https://codecov.io/gh/harumiWeb/exstruct) ![ExStruct Image](assets/icon.webp) -ExStruct reads Excel workbooks and outputs structured data (cells, table candidates, shapes, charts, print areas/views, auto page-break areas, hyperlinks) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates + print areas), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines. +ExStruct reads Excel workbooks and outputs structured data (cells, table candidates, shapes, charts, smartart, print areas/views, auto page-break areas, hyperlinks) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates + print areas), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines. + +[日本版 README](README.ja.md) ## Features -- **Excel → Structured JSON**: cells, shapes, charts, table candidates, print areas/views, and auto page-break areas per sheet. -- **Output modes**: `light` (cells + table candidates + print areas; no COM, shapes/charts empty), `standard` (texted shapes + arrows, charts, print areas), `verbose` (all shapes with width/height, charts with size, print areas). Verbose also emits cell hyperlinks and `colors_map`. Size output is flag-controlled. 
+- **Excel → Structured JSON**: cells, shapes, charts, smartart, table candidates, print areas/views, and auto page-break areas per sheet. +- **Output modes**: `light` (cells + table candidates + print areas; no COM, shapes/charts empty), `standard` (texted shapes + arrows, charts, smartart, print areas), `verbose` (all shapes with width/height, charts with size, print areas). Verbose also emits cell hyperlinks and `colors_map`. Size output is flag-controlled. - **Auto page-break export (COM only)**: capture Excel-computed auto page breaks and write per-area JSON/YAML/TOON when requested (CLI option appears only when COM is available). - **Formats**: JSON (compact by default, `--pretty` available), YAML, TOON (optional dependencies). - **Table detection tuning**: adjust heuristics at runtime via API. @@ -398,6 +400,12 @@ please read the contributor architecture guide. → [docs/contributors/architecture.md](docs/contributors/architecture.md) +## Note on coverage + +The cell-structure inference logic (cells.py) relies on heuristic rules +and Excel-specific behaviors. Full coverage is intentionally not pursued, +as exhaustive testing would not reflect real-world reliability. + ## License BSD-3-Clause. See `LICENSE` for details. 
diff --git a/docs/README.ja.md b/docs/README.ja.md index 8b52db3..d3e2676 100644 --- a/docs/README.ja.md +++ b/docs/README.ja.md @@ -1,17 +1,15 @@ # ExStruct — Excel 構造化抽出エンジン -[![PyPI version](https://badge.fury.io/py/exstruct.svg)](https://pypi.org/project/exstruct/) [![PyPI Downloads](https://static.pepy.tech/personalized-badge/exstruct?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/exstruct) ![Licence: BSD-3-Clause](https://img.shields.io/badge/license-BSD--3--Clause-blue?style=flat-square) [![pytest](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml/badge.svg)](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/e081cb4f634e4175b259eb7c34f54f60)](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) +[![PyPI version](https://badge.fury.io/py/exstruct.svg)](https://pypi.org/project/exstruct/) [![PyPI Downloads](https://static.pepy.tech/personalized-badge/exstruct?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/exstruct) ![Licence: BSD-3-Clause](https://img.shields.io/badge/license-BSD--3--Clause-blue?style=flat-square) [![pytest](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml/badge.svg)](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/e081cb4f634e4175b259eb7c34f54f60)](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) [![codecov](https://codecov.io/gh/harumiWeb/exstruct/graph/badge.svg?token=2XI1O8TTA9)](https://codecov.io/gh/harumiWeb/exstruct) ![ExStruct Image](assets/icon.webp) -ExStruct は Excel ワークブックを読み取り、構造化データ(セル・テーブル候補・図形・チャート・印刷範囲ビュー)をデフォルトで JSON に出力します。必要に応じて 
YAML/TOON も選択でき、COM/Excel 環境ではリッチ抽出、非 COM 環境ではセル+テーブル候補+印刷範囲へのフォールバックで安全に動作します。LLM/RAG 向けに検出ヒューリスティックや出力モードを調整可能です。 - -[English README](README.en.md) +ExStruct は Excel ワークブックを読み取り、構造化データ(セル・テーブル候補・図形・チャート・SmartArt・印刷範囲ビュー)をデフォルトで JSON に出力します。必要に応じて YAML/TOON も選択でき、COM/Excel 環境ではリッチ抽出、非 COM 環境ではセル+テーブル候補+印刷範囲へのフォールバックで安全に動作します。LLM/RAG 向けに検出ヒューリスティックや出力モードを調整可能です。 ## 主な特徴 -- **Excel → 構造化 JSON**: セル、図形、チャート、テーブル候補、印刷範囲/自動改ページ範囲(PrintArea/PrintAreaView)をシート単位・範囲単位で出力。 -- **出力モード**: `light`(セル+テーブル候補のみ)、`standard`(テキスト付き図形+矢印、チャート)、`verbose`(全図形を幅高さ付きで出力、セルのハイパーリンクも出力)。 +- **Excel → 構造化 JSON**: セル、図形、チャート、SmartArt、テーブル候補、印刷範囲/自動改ページ範囲(PrintArea/PrintAreaView)をシート単位・範囲単位で出力。 +- **出力モード**: `light`(セル+テーブル候補のみ)、`standard`(テキスト付き図形+矢印、チャート、SmartArt)、`verbose`(全図形を幅高さ付きで出力、セルのハイパーリンクも出力)。 - **フォーマット**: JSON(デフォルトはコンパクト、`--pretty` で整形)、YAML、TOON(任意依存)。 - **テーブル検出のチューニング**: API でヒューリスティックを動的に変更可能。 - **ハイパーリンク抽出**: `verbose` モード(または `include_cell_links=True` 指定)でセルのリンクを `links` に出力。 @@ -341,7 +339,7 @@ flowchart TD ということが明確に示されています。 -その他の本ライブラリを使ったLLM推論サンプルは以下のディレクトリにあります。 +その他の本ライブラリを使った LLM 推論サンプルは以下のディレクトリにあります。 - [Basic Excel](sample/basic/) - [Flowchart](sample/flowchart/) @@ -371,6 +369,7 @@ ExStruct は主に **ライブラリ** として利用される想定で、サ - 企業利用ではフォークや内部改修が前提です 次のようなチームに適しています。 + - ブラックボックス化されたツールではなく、透明性が必要 - 必要に応じて内部フォークを保守できる @@ -402,6 +401,11 @@ ExStruct の内部実装を拡張する場合は、 → [docs/contributors/architecture.md](docs/contributors/architecture.md) +## カバレッジに関する注意 + +セル構造推論ロジック(cells.py)は、ヒューリスティックルールと +Excel 固有の動作に依存しています。網羅的なテストは現実世界の信頼性を反映できないため、完全なカバレッジは意図的に追求されていません。 + ## License BSD-3-Clause. See `LICENSE` for details. diff --git a/docs/agents/CODE_REVIEW.md b/docs/agents/CODE_REVIEW.md index 5053262..e69de29 100644 --- a/docs/agents/CODE_REVIEW.md +++ b/docs/agents/CODE_REVIEW.md @@ -1,779 +0,0 @@ -````md -**Actionable comments posted: 0** - -> [!CAUTION] -> Some comments are outside the diff and can’t be posted inline due to platform limitations. -> ->
-> ⚠️ Outside diff range comments (1)
-> ->
-> src/exstruct/io/__init__.py (1)
-> -> `74-82`: **Return `RangeBounds` directly instead of converting to tuple.** -> -> This wrapper function unpacks the Pydantic `RangeBounds` model into a tuple, which violates the coding guideline: "Do not return dictionaries or tuples; always use Pydantic BaseModel for structured data." Callers should access bounds via the model's fields (`bounds.r1`, `bounds.c1`, etc.) to preserve type safety and semantic clarity. -> -> As per coding guidelines, structured data should be returned as Pydantic models, not tuples. -> ->
-> 🔎 Proposed refactor to eliminate tuple conversion -> -> **Option 1: Remove the wrapper entirely and use `parse_range_zero_based` directly** -> -> Update callers (e.g., line 129) to use the model fields: -> -> ```diff -> def _filter_table_candidates_to_area( -> table_candidates: list[str], area: PrintArea -> ) -> list[str]: -> filtered: list[str] = [] -> for candidate in table_candidates: -> - bounds = _parse_range_zero_based(candidate) -> - if not bounds: -> + bounds = parse_range_zero_based(candidate) -> + if bounds is None: -> continue -> - r1, c1, r2, c2 = bounds -> + r1, c1, r2, c2 = bounds.r1, bounds.c1, bounds.r2, bounds.c2 -> r1 += 1 -> r2 += 1 -> if r1 >= area.r1 and r2 <= area.r2 and c1 >= area.c1 and c2 <= area.c2: -> filtered.append(candidate) -> return filtered -> ``` -> -> **Option 2: If the wrapper is needed, change return type to `RangeBounds | None`** -> -> ```diff -> -def _parse_range_zero_based(range_str: str) -> tuple[int, int, int, int] | None: -> - """ -> - Parse an Excel range string into zero-based (r1, c1, r2, c2) bounds. -> - Returns None on failure. -> - """ -> +def _parse_range_zero_based(range_str: str) -> RangeBounds | None: -> + """Parse an Excel range string into zero-based bounds. -> + -> + Args: -> + range_str: Excel range string (e.g., "Sheet1!A1:B2"). -> + -> + Returns: -> + RangeBounds in zero-based coordinates, or None on failure. -> + """ -> bounds = parse_range_zero_based(range_str) -> - if bounds is None: -> - return None -> - return (bounds.r1, bounds.c1, bounds.r2, bounds.c2) -> + return bounds -> ``` -> ->
-> ->
-> ->
- -
-♻️ Duplicate comments (3)
- -
-src/exstruct/core/pipeline.py (2)
- -`644-645`: **Previous review concern addressed.** - -The error message has been updated to use a generic "COM pipeline failed" message with `FallbackReason.COM_PIPELINE_FAILED`, addressing the earlier review comment about misleading error messages. - ---- - -`670-670`: **Previous review concern addressed.** - -The `reason` parameter is now properly used for logging, addressing the earlier review comment about the unused parameter. - -
-
-docs/agents/TEST_REQUIREMENTS.md (1)
- -`203-204`: **Previous review concern addressed.** - -The duplicate requirement IDs have been resolved by renaming them to `[PIPE-MOD-01]` and `[PIPE-MOD-02]`, making each requirement ID unique throughout the document. - -
- -
- -
-🧹 Nitpick comments (7)
- -
-src/exstruct/io/__init__.py (1)
- -`488-489`: **Consider renaming publicly exported functions to remove underscore prefix.** - -The functions `_require_yaml` and `_require_toon` are now part of the public API (exported in `__all__`), but their names start with an underscore, which by Python convention indicates private/internal functions. For a cleaner public API, consider renaming them to `require_yaml` and `require_toon` (without the underscore). - -
-🔎 Suggested changes for consistent public API naming - -In `src/exstruct/io/serialize.py`, rename the functions: - -```diff --def _require_yaml() -> ModuleType: -+def require_yaml() -> ModuleType: - """Ensure pyyaml is installed; otherwise raise with guidance.""" - ... - --def _require_toon() -> ModuleType: -+def require_toon() -> ModuleType: - """Ensure python-toon is installed; otherwise raise with guidance.""" - ... -``` - -In this file, update the import and export: - -```diff - from .serialize import ( - _FORMAT_HINTS, - _ensure_format_hint, -- _require_toon, -- _require_yaml, -+ require_toon, -+ require_yaml, - _serialize_payload_from_hint, - ) - - __all__ = [ - "dict_without_empty_values", - "save_as_json", - "save_as_yaml", - "save_as_toon", - "save_sheets", - "save_sheets_as_json", - "build_print_area_views", - "save_print_area_views", - "save_auto_page_break_views", - "serialize_workbook", -- "_require_yaml", -- "_require_toon", -+ "require_yaml", -+ "require_toon", - ] -``` - -
- -
-
-tests/test_backends.py (3)
- -`16-22`: **Add type hints to mock functions.** - -The fake functions should have explicit type hints for maintainability and mypy compliance. - -
-🔎 Proposed refactor - -```diff -- def fake_cells(_: Path) -> dict[str, list[object]]: -+ def fake_cells(file_path: Path) -> dict[str, list[object]]: - calls.append("cells") - return {} - -- def fake_cells_links(_: Path) -> dict[str, list[object]]: -+ def fake_cells_links(file_path: Path) -> dict[str, list[object]]: - calls.append("links") - return {} -``` - -
- -As per coding guidelines, avoid using `_` for actual parameters; use descriptive names with proper type hints. - ---- - -`43-44`: **Use explicit parameter names with type hints.** - -Replace generic `_` and `__` with descriptive parameter names for better readability. - -
-🔎 Proposed refactor - -```diff -- def fake_detect(_: Path, __: str) -> list[str]: -+ def fake_detect(file_path: Path, sheet_name: str) -> list[str]: - raise RuntimeError("boom") -``` - -
- -As per coding guidelines, use descriptive parameter names. - ---- - -`58-59`: **Use explicit type hints instead of generic object.** - -The mock function should use proper type signatures for clarity. - -
-🔎 Proposed refactor - -```diff -- def fake_colors_map(*_: object, **__: object) -> object: -+ def fake_colors_map( -+ workbook: object, -+ *, -+ include_default_background: bool, -+ ignore_colors: set[str] | None -+ ) -> object: - raise RuntimeError("boom") -``` - -
- -As per coding guidelines, provide explicit type hints for all parameters. - -
-
-src/exstruct/core/backends/openpyxl_backend.py (1)
- -`103-125`: **Consider adding a module-level docstring.** - -The file contains well-structured code with proper docstrings for classes and methods, but lacks a module-level docstring explaining the openpyxl backend's role in the extraction pipeline. - -
-🔎 Suggested addition - -Add at the top of the file after imports: - -```python -"""Openpyxl-based backend for Excel workbook extraction. - -This module provides the OpenpyxlBackend class which uses the openpyxl library -to extract cells, print areas, color maps, and table candidates from Excel files. -It serves as the primary extraction backend with fallback support when COM is unavailable. -""" -``` - -
- -As per coding guidelines, adding comprehensive module documentation improves maintainability. - -
-
-src/exstruct/core/backends/com_backend.py (2)
- -`173-206`: **Consider extracting CSV parsing to a utility function for reusability.** - -The `_split_csv_respecting_quotes` function implements custom CSV parsing with quote handling. While the implementation is correct, this type of utility might be useful elsewhere in the codebase. - -Consider moving this to a shared utilities module if similar parsing is needed elsewhere, or documenting that Python's `csv` module with appropriate dialect settings could be an alternative. - ---- - -`1-206`: **Add module-level docstring for better documentation.** - -The file contains well-structured code with proper class and method docstrings, but lacks a module-level docstring explaining the COM backend's role. - -
-🔎 Suggested addition - -Add at the top after imports: - -```python -"""COM-based backend for Excel workbook extraction via xlwings. - -This module provides the ComBackend class which uses xlwings/COM to extract -print areas, color maps, and auto page breaks from Excel files. It complements -the OpenpyxlBackend and is used when COM automation is available and required -features (like auto page breaks) need native Excel API access. -""" -``` - -
- -As per coding guidelines, comprehensive module documentation improves maintainability. - -
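The nitpick above mentions Python's `csv` module as a possible alternative to the custom `_split_csv_respecting_quotes` helper. A minimal sketch of that idea follows. The function name and the choice of double quotes as the quote character are assumptions made for illustration; the actual helper's quoting rules are not shown in this review and may differ.

```python
import csv


def split_csv_respecting_quotes(text: str) -> list[str]:
    """Split one comma-separated line, keeping commas inside double quotes.

    Hypothetical stand-in for the helper discussed in the review.
    """
    if not text:
        return []
    # skipinitialspace lets a quoted field follow ", " and still be
    # recognised as quoted.
    reader = csv.reader([text], skipinitialspace=True)
    return [field.strip() for field in next(reader)]
```

Note that `csv.reader` drops the surrounding quote characters from each field. If the existing helper keeps them, the two are not drop-in equivalents, and that difference would need checking before any swap.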
- -
- -
-📜 Review details - -**Configuration used**: defaults - -**Review profile**: CHILL - -**Plan**: Pro - -
-📥 Commits - -Reviewing files that changed from the base of the PR and between 79d4bbb797caaa2641a3581e2d56059fc5a8cbf2 and 93b10e2d8b0e9e4e339f038cb215951dd4fdd7c2. - -
- -
-📒 Files selected for processing (20) - -- `.github/workflows/pytest.yml` -- `docs/agents/CODE_REVIEW.md` -- `docs/agents/DATA_MODEL.md` -- `docs/agents/EXCEL_EXTRACTION.md` -- `docs/agents/TEST_REQUIREMENTS.md` -- `src/exstruct/core/backends/com_backend.py` -- `src/exstruct/core/backends/openpyxl_backend.py` -- `src/exstruct/core/pipeline.py` -- `src/exstruct/errors.py` -- `src/exstruct/io/__init__.py` -- `src/exstruct/models/__init__.py` -- `tests/test_auto_page_breaks.py` -- `tests/test_backends.py` -- `tests/test_engine.py` -- `tests/test_integrate_raw_data.py` -- `tests/test_modeling.py` -- `tests/test_models_validation.py` -- `tests/test_pipeline.py` -- `tests/test_print_area_views.py` -- `tests/test_print_areas_openpyxl.py` - -
- -
-🚧 Files skipped from review as they are similar to previous changes (4) - -- tests/test_pipeline.py -- tests/test_modeling.py -- tests/test_integrate_raw_data.py -- src/exstruct/errors.py - -
- -
-🧰 Additional context used - -
-📓 Path-based instructions (1) - -
-**/*.py - -**📄 CodeRabbit inference engine (AGENTS.md)** - -> `**/*.py`: Add type hints to all function and method arguments and return values (mypy strict compliance) -> Use `Any` type only at external library boundaries (xlwings, pandas, numpy, etc.) -> Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data -> Ensure each function has a single responsibility (cyclomatic complexity should not exceed 12) -> Organize imports in this order: (1) standard library, (2) third-party packages, (3) exstruct internal modules -> Use Google-style docstrings for all functions and classes -> Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external data to Pydantic models at boundaries -> Avoid writing God Functions (large, single-responsibility-violating functions) -> Avoid writing God Objects (classes with too many responsibilities) -> Avoid excessive nesting and deep conditional branching -> Do not return dictionaries or tuples; always use Pydantic BaseModel for structured data -> Ensure code passes mypy strict mode with zero errors -> Ensure code passes Ruff checks (E, W, F, I, B, UP, N, C90) with zero errors -> Avoid circular dependencies between modules -> AI should propose automatic function splitting when functions are too complex -> AI should propose additional Pydantic models when data structures are insufficient -> AI should automatically fix and organize imports when they are inconsistent -> AI should improve docstrings when they are incomplete or insufficient - -Files: - -- `tests/test_print_areas_openpyxl.py` -- `tests/test_backends.py` -- `src/exstruct/core/backends/openpyxl_backend.py` -- `src/exstruct/io/__init__.py` -- `src/exstruct/core/backends/com_backend.py` -- `tests/test_print_area_views.py` -- `tests/test_engine.py` -- `tests/test_auto_page_breaks.py` -- `src/exstruct/core/pipeline.py` -- `tests/test_models_validation.py` -- `src/exstruct/models/__init__.py` - -
- -
-🧠 Learnings (1) - -
-📚 Learning: 2025-12-21T10:55:22.238Z - -``` -Learnt from: CR -Repo: harumiWeb/exstruct PR: 0 -File: AGENTS.md:0-0 -Timestamp: 2025-12-21T10:55:22.238Z -Learning: Applies to **/*.py : Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external data to Pydantic models at boundaries -``` - -**Applied to files:** - -- `src/exstruct/io/__init__.py` - -
- -
-🧬 Code graph analysis (6) - -
-tests/test_backends.py (3)
- -
-src/exstruct/core/backends/com_backend.py (3) - -- `ComBackend` (18-135) -- `extract_colors_map` (51-74) -- `extract_print_areas` (27-49) - -
-
-src/exstruct/core/backends/openpyxl_backend.py (5) - -- `OpenpyxlBackend` (23-100) -- `extract_cells` (32-45) -- `detect_tables` (88-100) -- `extract_colors_map` (64-86) -- `extract_print_areas` (47-62) - -
-
-src/exstruct/core/ranges.py (1) - -- `parse_range_zero_based` (24-47) - -
- -
-
-src/exstruct/core/backends/openpyxl_backend.py (3)
- -
-src/exstruct/models/__init__.py (1) - -- `PrintArea` (97-103) - -
-
-src/exstruct/core/ranges.py (1) - -- `parse_range_zero_based` (24-47) - -
-
-src/exstruct/core/workbook.py (1) - -- `openpyxl_workbook` (14-53) - -
- -
-
-src/exstruct/io/__init__.py (3)
- -
-src/exstruct/core/ranges.py (1) - -- `parse_range_zero_based` (24-47) - -
-
-src/exstruct/errors.py (2) - -- `OutputError` (32-33) -- `SerializationError` (20-21) - -
-
-src/exstruct/io/serialize.py (4) - -- `_ensure_format_hint` (28-49) -- `_require_toon` (104-112) -- `_require_yaml` (93-101) -- `_serialize_payload_from_hint` (52-90) - -
- -
-
-tests/test_print_area_views.py (1)
- -
-src/exstruct/models/__init__.py (2) - -- `CellRow` (50-59) -- `PrintArea` (97-103) - -
- -
-
-tests/test_auto_page_breaks.py (1)
- -
-src/exstruct/models/__init__.py (1) - -- `PrintArea` (97-103) - -
- -
-
-tests/test_models_validation.py (1)
- -
-src/exstruct/models/__init__.py (1) - -- `CellRow` (50-59) - -
- -
- -
-🪛 GitHub Check: Codacy Static Code Analysis - -
-src/exstruct/core/backends/com_backend.py - -[warning] 37-37: src/exstruct/core/backends/com_backend.py#L37 -Try, Except, Continue detected. - ---- - -[warning] 132-132: src/exstruct/core/backends/com_backend.py#L132 -Try, Except, Pass detected. - -
- -
- -
- -
-⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2) - -- GitHub Check: Codacy Static Code Analysis -- GitHub Check: test (windows-latest, 3.11) - -
- -
-🔇 Additional comments (24)
- -
-docs/agents/DATA_MODEL.md (2)
- -`96-96`: **Clear coordinate semantics for PrintArea rows—good clarification.** - -Adding "(1-based, inclusive)" to `r1` and `r2` aligns with the existing `c1` and `c2` documentation and makes the row coordinate interpretation explicit. This improves clarity for API consumers. - -Also applies to: 98-98 - ---- - -`140-140`: **Documentation for `colors_map` coordinate system is correct and matches implementation.** - -The codebase confirms that `colors_map` uses (row=1-based, col=0-based) coordinates as documented. Both extraction paths—openpyxl (`cell.row, cell.col_idx - 1`) and COM (`row, col - 1`)—apply the same logic: rows are preserved as 1-based, columns are converted from 1-based to 0-based. Test coverage validates this with `{"#FFFFFF": [(1, 0)]}`, confirming row 1 with column 0. - -
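The `colors_map` convention described above (rows stay 1-based, columns shift to 0-based) can be illustrated with a small sketch. The function name and the input shape, a list of `(row, col_idx, color)` triples with both indices 1-based as a spreadsheet library would report them, are assumptions for the example. Only the `col - 1` adjustment and the expected `{"#FFFFFF": [(1, 0)]}` shape come from the review.

```python
from __future__ import annotations


def build_colors_map(
    cells: list[tuple[int, int, str]],
) -> dict[str, list[tuple[int, int]]]:
    """Group cell coordinates by colour.

    Each input triple is (row, col_idx, color) with 1-based row and column.
    Output coordinates keep the 1-based row and use a 0-based column.
    """
    colors_map: dict[str, list[tuple[int, int]]] = {}
    for row, col_idx, color in cells:
        colors_map.setdefault(color, []).append((row, col_idx - 1))
    return colors_map
```

Under these assumptions, cell A1 with a white fill maps to `(1, 0)`, which matches the value the review cites from the test suite.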
-
-src/exstruct/io/__init__.py (3)
- -`8-18`: **Well done removing the OpenPyXL dependency!** - -The refactor successfully eliminates direct dependency on `openpyxl.utils.cell.range_boundaries` by using the internal `parse_range_zero_based` function, and centralizes serialization logic through the new helpers from `.serialize`. This aligns with the coding guideline to avoid depending on internal structures of external libraries. - -Based on learnings, this change properly normalizes external data to Pydantic models at boundaries. - ---- - -`283-288`: **Excellent serialization refactoring!** - -The centralization of format validation and serialization through `_ensure_format_hint` and `_serialize_payload_from_hint` eliminates code duplication and provides a single source of truth for format handling. This improves maintainability and consistency across all export functions. - -Also applies to: 315-318, 341-346, 374-377, 393-402, 429-432, 449-454, 469-471 - ---- - -`133-134`: **Row index adjustments correctly implement 1-based row coordinates.** - -The adjustments on lines 133-134 (adding 1 to `r1` and `r2`) and lines 148, 150 (subtracting 1 from `area.r1` in pixel calculations) correctly convert between the 0-based coordinates returned by `parse_range_zero_based` and the 1-based row coordinates used in `PrintArea`, as documented in the PR objectives. - -Also applies to: 148-148, 150-150 - -
-
-tests/test_models_validation.py (1)
-
-`19-19`: **LGTM: Coordinate system alignment.**
-
-The update from `r=0` to `r=1` correctly reflects the 1-based row indexing documented in the `CellRow` model definition.
-
-
-
-tests/test_auto_page_breaks.py (1)
-
-`47-47`: **LGTM: Coordinate system alignment.**
-
-The updated PrintArea coordinates correctly reflect the 1-based row indexing (r1=1, r2=2) and 0-based column indexing (c1=0, c2=1) documented in the PrintArea model.
-
-
-
-.github/workflows/pytest.yml (1)
-
-`34-46`: **LGTM: Coverage collection and Codecov integration properly configured.**
-
-The pytest coverage flags and Codecov upload step are correctly implemented. The Codecov action is now properly pinned to a full commit SHA, addressing the previous security concern.
-
-
-
-src/exstruct/models/__init__.py (3)
-
-`14-16`: **LGTM: Field definition reflow.**
-
-The Shape.id field definition has been reflowed for readability while maintaining the same semantic meaning.
-
----
-
-`101-103`: **LGTM: Coordinate system documentation clarified.**
-
-The field descriptions now explicitly document that columns use 0-based indexing, improving clarity for API consumers.
-
----
-
-`129-132`: **LGTM: colors_map coordinate semantics documented.**
-
-The description now explicitly clarifies that rows are 1-based and columns are 0-based, which is essential for correct usage of the colors_map field.
-
-
-
-docs/agents/EXCEL_EXTRACTION.md (1)
-
-`1-66`: **LGTM: Comprehensive pipeline documentation.**
-
-This documentation provides clear guidance on the extraction pipeline architecture, coordinate system semantics, modes, and fallback behavior. The coordinate system clarification (rows 1-based, columns 0-based) is consistent with the code changes throughout the PR.
-
-
-
-src/exstruct/core/pipeline.py (1)
-
-`1-696`: **LGTM: Pipeline architecture with proper type safety.**
-
-The pipeline module demonstrates excellent adherence to coding guidelines:
-
-- Complete type hints on all functions and parameters
-- Google-style docstrings throughout
-- Immutable dataclasses for pipeline configuration and state
-- Well-organized imports (stdlib → third-party → internal)
-- Clear separation of pre-COM and COM extraction steps
-
-The architecture provides a solid foundation for the extraction workflow with explicit fallback handling and state tracking.
-
-
-
-docs/agents/TEST_REQUIREMENTS.md (1)
-
-`1-228`: **LGTM: Comprehensive test requirements specification.**
-
-The test requirements document provides thorough coverage of functional, non-functional, and integration requirements. The organization by category (pipeline, backend, ranges, etc.) aligns well with the modular architecture introduced in this PR.
-
-
-
-tests/test_print_areas_openpyxl.py (1)
-
-`26-26`: **LGTM: Coordinate system alignment.**
-
-The assertion now correctly expects 1-based row coordinates (r1=1, r2=2) and 0-based column coordinates (c1=0, c2=1), consistent with the PrintArea model definition and the broader coordinate system updates in this PR.
-
-
-
-tests/test_print_area_views.py (1)
-
-`39-46`: **LGTM! Coordinate system update correctly applied.**
-
-The test data has been properly updated to reflect the documented coordinate convention (1-based rows, 0-based columns). The CellRow indices and PrintArea bounds are now consistent with the model definitions.
-
-
-
-tests/test_backends.py (2)
-
-`76-93`: **LGTM! Print area extraction test validates coordinate system.**
-
-The test correctly verifies that print areas extracted via openpyxl use 1-based row indexing (r1=1) and 0-based column indexing (c1=0), consistent with the PrintArea model definition.
-
----
-
-`95-101`: **LGTM! Range parsing test validates zero-based intermediate representation.**
-
-The test correctly verifies that `parse_range_zero_based` returns zero-based coordinates (r1=0, c1=0, r2=1, c2=1) for the range "Sheet1!A1:B2", which are then converted to the appropriate coordinate system by backend methods.
-
-
-
-src/exstruct/core/backends/openpyxl_backend.py (1)
-
-`159-162`: **LGTM! Coordinate conversion correctly implemented.**
-
-The conversion from zero-based coordinates (from `_parse_print_area_range`) to the PrintArea model is now correct:
-
-- Rows (r1, r2): converted from 0-based to 1-based with `+ 1`
-- Columns (c1, c2): kept as 0-based (no adjustment)
-
-This matches the PrintArea model definition where rows are 1-based and columns are 0-based.
-
-
-
-tests/test_engine.py (2)
-
-`6-12`: **LGTM! Improved API design with nested option groups.**
-
-The introduction of `FilterOptions` and `DestinationOptions` improves the API's organization and clarity, grouping related settings together under `OutputOptions`.
-
----
-
-`65-65`: **LGTM! PrintArea coordinate updated correctly.**
-
-The PrintArea coordinates have been updated to use 1-based row indexing (r1=1) consistent with the model definition and broader coordinate system clarification in this PR.
-
-
-
-docs/agents/CODE_REVIEW.md (1)
-
-`1-609`: **Documentation file - no code review needed.**
-
-This file contains example code review comments and AI agent prompts for documentation purposes. It does not contain production code requiring review.
-
-
-
-src/exstruct/core/backends/com_backend.py (2)
-
-`45-48`: **LGTM! Coordinate conversion correctly implemented.**
-
-The conversion from zero-based coordinates (from `_parse_print_area_range`) to PrintArea is correct:
-
-- Rows (r1, r2): `+ 1` converts 0-based to 1-based
-- Columns (c1, c2): no adjustment keeps 0-based
-
-This matches the PrintArea model definition.
-
----
-
-`119-125`: **LGTM! Coordinate conversion for COM values is correct.**
-
-The coordinate handling here is actually correct, contrary to the past review comment:
-
-- `rows[i]` and `cols[j]` are 1-based values from Excel COM
-- Rows (r1, r2): used as-is since PrintArea expects 1-based rows ✓
-- Columns (c1, c2): `- 1` converts from 1-based COM to 0-based for PrintArea ✓
-
-This is consistent with the PrintArea model where rows are 1-based and columns are 0-based.
-
-
- -
- -
- - -```` diff --git a/docs/agents/DATA_MODEL.md b/docs/agents/DATA_MODEL.md index 67a9c49..3ed8c7f 100644 --- a/docs/agents/DATA_MODEL.md +++ b/docs/agents/DATA_MODEL.md @@ -1,6 +1,6 @@ # ExStruct データモデル仕様 -**Version**: 0.10 +**Version**: 0.13 **Status**: Authoritative — 本ドキュメントは ExStruct が返す全モデルの唯一の正準ソースです。 core / io / integrate は必ずこの仕様に従うこと。モデルは **pydantic v2** で実装します。 @@ -13,32 +13,53 @@ ExStruct は Excel ワークブックを LLM が扱いやすい **意味構造 --- -# 2. Shape Model +# 2. Shape / Arrow / SmartArt Model + +出力の `shapes` は下記 3 モデルのユニオンです。`kind` で判別します。 ```jsonc -Shape { - id: int | null // sheet 内での通番 id(線・矢印は null の場合あり) +BaseShape { + id: int | null // sheet 内の通番 id(線/矢印は null の場合あり) text: str l: int // left (px) t: int // top (px) w: int | null // width (px) h: int | null // height(px) - type: str | null // MSO 図形タイプのラベル rotation: float | null +} + +Shape extends BaseShape { + kind: "shape" + type: str | null // MSO 図形タイプラベル +} + +Arrow extends BaseShape { + kind: "arrow" begin_arrow_style: int | null end_arrow_style: int | null begin_id: int | null // コネクタ始点の接続先 Shape.id end_id: int | null // コネクタ終点の接続先 Shape.id direction: "E"|"SE"|"S"|"SW"|"W"|"NW"|"N"|"NE" | null } + +SmartArtNode { + text: str + kids: [SmartArtNode] +} + +SmartArt extends BaseShape { + kind: "smartart" + layout: str + nodes: [SmartArtNode] +} ``` 補足: - `direction` は線や矢印の向きを 8 方位に正規化したもの。 - 矢印スタイルは Excel の enum に対応。 -- `begin_id` / `end_id` は、コネクタが接続している図形の `id`(Excel の `ConnectorFormat.BeginConnectedShape` / `EndConnectedShape` に対応)。 -- 線や矢印の Shape では `id` が null になる場合があります。 +- `begin_id` / `end_id` は、コネクタが接続している図形の `id` に対応(`ConnectorFormat.BeginConnectedShape` / `EndConnectedShape`)。 +- `SmartArtNode` はネスト構造で表現し、`nodes` がツリーの根。 --- @@ -114,7 +135,7 @@ PrintAreaView { book_name: str sheet_name: str area: PrintArea - shapes: [Shape] + shapes: [Shape | Arrow | SmartArt] charts: [Chart] rows: [CellRow] // 範囲に交差する行のみ、空列は落とす table_candidates: [str] // 範囲内に収まるテーブル候補 @@ -132,7 +153,7 @@ PrintAreaView { ```jsonc 
SheetData { rows: [CellRow] - shapes: [Shape] + shapes: [Shape | Arrow | SmartArt] charts: [Chart] table_candidates: [str] print_areas: [PrintArea] @@ -204,3 +225,4 @@ WorkbookData { - 0.10: Shape に `id` を追加し、コネクタの接続元/接続先を `id` 参照に変更し、`name` をペイロードから除去。 - 0.11: コネクタのフィールド名を `begin_id` / `end_id` にリネーム。 - 0.12: SheetData に背景色情報を格納する`colors_map`を追加。 +- 0.13: Shape を `Shape` / `Arrow` / `SmartArt` に分離し、`SmartArtNode` のネスト構造を追加。 diff --git a/docs/agents/EXCEL_EXTRACTION.md b/docs/agents/EXCEL_EXTRACTION.md index fa0d361..92ed0a3 100644 --- a/docs/agents/EXCEL_EXTRACTION.md +++ b/docs/agents/EXCEL_EXTRACTION.md @@ -32,14 +32,15 @@ - openpyxl のテーブル定義 + 罫線クラスターを統合 - COM が使えない場合でも table_candidates を維持 -## Shapes +## Shapes / Arrows / SmartArt 抽出内容: -- Type / AutoShapeType の正規化 +- Type / AutoShapeType の正規化(`type` は Shape のみ) - Left/Top/Width/Height - TextFrame2.TextRange.Text - 矢印方向や接続情報 +- SmartArt の layout/nodes/kids(ネスト構造) ## Charts diff --git a/docs/agents/FEATURE_SPEC.md b/docs/agents/FEATURE_SPEC.md index 12955e5..f512aa7 100644 --- a/docs/agents/FEATURE_SPEC.md +++ b/docs/agents/FEATURE_SPEC.md @@ -11,12 +11,6 @@ - SmartArtは基本はShapeのフィールドを持ちつつ、Nodeの情報を再帰的に持つようにする - rootノードとそれ以外のノードでクラスを分ける -## リファクタリング案 - -- リソース取得の冗長性 - - 事象: 印刷範囲取得が openpyxl→COM のようにロジックがファイル内に分散。似たパターンが他にもある。 - - 対策案: 抽出パイプラインをステップ化し、各ステップ(cells, tables, shapes, charts, print_areas)の実装をモジュール単位で揃える。パイプライン定義を 1 か所にまとめるとモード追加や切替が容易になる。 - ## 今後のオプション(検討メモ) - 表検出スコアリングの閾値を CLI/環境変数で調整可能にする。 diff --git a/docs/agents/OVERVIEW.md b/docs/agents/OVERVIEW.md index 4e42d80..698dbd1 100644 --- a/docs/agents/OVERVIEW.md +++ b/docs/agents/OVERVIEW.md @@ -16,7 +16,7 @@ openpyxl と Excel COM(xlwings)を組み合わせ、LLM が扱いやすい - Cells(値/リンク/座標) - Tables(候補範囲) -- Shapes(位置/種類/テキスト/矢印) +- Shapes / Arrows / SmartArt(位置/テキスト/矢印/レイアウト) - Charts(Series/Axis/Type/Title) - Print Areas / Auto Page Breaks - Colors Map(条件付き書式を含む) @@ -24,7 +24,7 @@ openpyxl と Excel COM(xlwings)を組み合わせ、LLM が扱いやすい ## 利用例(概要) - `extract(path, 
mode="standard")` で WorkbookData を取得 -- `process_excel` でファイル出力やディレクトリ分割 +- `process_excel` でファイル出力やディレクトリ出力 - CLI で `exstruct file.xlsx --format json` を利用 ## ディレクトリ構成(概要) diff --git a/docs/agents/ROADMAP.md b/docs/agents/ROADMAP.md index 47b77a0..f13ad1d 100644 --- a/docs/agents/ROADMAP.md +++ b/docs/agents/ROADMAP.md @@ -33,11 +33,11 @@ ## v0.3.1 -- ShapesとArrowsの分離(後のSmartArt追加のため) +- Shapes と Arrows の分離(後の SmartArt 追加のため) +- SmartArt 解析 ## v0.4.0 -- SmartArt 解析 - Excel Form Controls 解析 ## v1.0.0 diff --git a/docs/agents/TASKS.md b/docs/agents/TASKS.md index 66cc050..767d666 100644 --- a/docs/agents/TASKS.md +++ b/docs/agents/TASKS.md @@ -1,8 +1,34 @@ # Task List -未完了タスクは [ ]、完了タスクは [x] +## 1. 既存実装の修正(モデル分離の影響対応) -- [x] src/exstruct/render/__init__.py の主要分岐と例外経路を洗い出す(_require_excel_app/_require_pdfium/export_pdf/export_sheet_images/_sanitize_sheet_filename) -- [x] xlwings・pypdfium2 をモックして export_pdf/export_sheet_images の成功/失敗ケースを単体テスト化する -- [x] 依存不足・予期例外のエラーメッセージ/例外型を検証するテストを追加する -- [x] シート名のサニタイズ規則と出力ファイル名生成のテストを追加する +- [x] `src/exstruct/io/__init__.py` の `_filter_shapes_to_area` が `list[Shape | Arrow | SmartArt]` を受け取れるように型と処理を調整する +- [x] `src/exstruct/core/shapes.py` のコネクタ判定を `Arrow` 前提に変更する(`begin_arrow_style` / `end_arrow_style` などは `Arrow` のみ参照) +- [x] `src/exstruct/core/shapes.py` の接続 ID 参照を `Arrow` に限定し、`Shape` からの誤参照を除去する +- [x] `PrintAreaView` 側の `shapes` フィルタで `SmartArt` を落とさないことを確認する + +## 2. SmartArt 取得機能の実装方針 + +- [x] `shape.HasSmartArt` を条件に SmartArt を抽出する +- [x] `SmartArt.Layout.Name` を `SmartArt.layout` に格納する +- [x] `SmartArt.AllNodes` を走査し、`level` と `text` を収集する +- [x] ノード配列から `SmartArtNode` のツリー(`nodes`)を構築する(`level` を使ったスタック組み立て) +- [x] `SmartArt` は `BaseShape` 相当の位置/サイズ/回転/テキストを併せて格納する + +## 3. 
実装箇所の整理 + +- [x] `src/exstruct/core/shapes.py` に SmartArt 抽出用の関数を追加する(1 関数=1 責務を遵守) +- [x] `src/exstruct/core/shapes.py` のメイン抽出処理で `Shape` / `Arrow` / `SmartArt` に振り分ける +- [x] `src/exstruct/io/__init__.py` で `Shape | Arrow | SmartArt` のシリアライズ挙動が崩れないことを確認する + +## 4. 動作確認 + +- [x] 既存の shape / connector 抽出が壊れていないことを確認する +- [ ] SmartArt が含まれるブックで `SmartArt.nodes` が期待どおりに出力されることを確認する + +## 5. テストケース(カバレッジ維持) + +- [x] `SmartArt` の `nodes` がネスト構造でシリアライズされることを確認する +- [x] `Arrow` のみが `begin_id` / `end_id` を持ち、`Shape` では参照されないことを確認する +- [x] `_filter_shapes_to_area` が `Shape | Arrow | SmartArt` を受け取り、SmartArt も対象に含めることを確認する +- [x] `kind` による判別が想定どおり動くことを確認する diff --git a/docs/agents/TEST_REQUIREMENTS.md b/docs/agents/TEST_REQUIREMENTS.md index 83cde0e..05dd053 100644 --- a/docs/agents/TEST_REQUIREMENTS.md +++ b/docs/agents/TEST_REQUIREMENTS.md @@ -1,6 +1,6 @@ # ExStruct テスト要件仕様書 -Version: 0.3 +Version: 0.4 Status: Required for Release ExStruct の全機能について、正式なテスト要件をまとめたドキュメントです。AI エージェント/人間開発者が自動テスト・手動テストを設計するための基盤とします。 @@ -51,6 +51,7 @@ ExStruct の全機能について、正式なテスト要件をまとめたド - [SHP-01] AutoShape の type を正規化 - [SHP-02] TextFrame を正しく取得 +- [SHP-02a] `type` は Shape のみ保持し、Arrow/SmartArt では出力しない - [SHP-03] サイズ `w`,`h` は取得できない場合のみ null - [SHP-04] グループ図形は展開方針を一貫させる - [SHP-05] 座標 `l`,`t` は整数で取得しズームの影響を受けない @@ -60,6 +61,13 @@ ExStruct の全機能について、正式なテスト要件をまとめたド - [SHP-11] テキストなし図形は text="" - [SHP-12] 複数段落のテキストも取得 +## 2.2.1 SmartArt 抽出 + +- [SHP-SA-01] SmartArt は `layout` を必須で出力する +- [SHP-SA-02] SmartArt のノードは `nodes` にネスト構造で出力する +- [SHP-SA-03] ノードの子は `kids` で表現する(level は出力しない) +- [SHP-SA-04] SmartArt が存在する場合は `kind="smartart"` で判別できる + ## 2.3 矢印方向推定 - [DIR-01] 0° ±22.5° → "E" diff --git a/docs/concept.md b/docs/concept.md index f308ea8..cb9e739 100644 --- a/docs/concept.md +++ b/docs/concept.md @@ -108,7 +108,7 @@ For RAG and AI systems, this missing structure becomes a major bottleneck. 
ExStruct outputs a unified structure containing: - cells, rows, and sheets -- shapes and text blocks +- shapes, arrows, and SmartArt nodes (nested) - chart series and metadata - automatically detected table candidates - layout geometry (positions, sizes) diff --git a/docs/schemas.md b/docs/schemas.md index e14f467..39c3573 100644 --- a/docs/schemas.md +++ b/docs/schemas.md @@ -11,6 +11,9 @@ repository to access the raw files. - `schemas/sheet.json` — `SheetData` - `schemas/cell_row.json` — `CellRow` - `schemas/shape.json` — `Shape` +- `schemas/arrow.json` `Arrow` +- `schemas/smartart.json` `SmartArt` +- `schemas/smartart_node.json` `SmartArtNode` - `schemas/chart.json` — `Chart` - `schemas/chart_series.json` — `ChartSeries` - `schemas/print_area.json` — `PrintArea` diff --git a/sample/basic/sample.json b/sample/basic/sample.json index a2c72fb..d64a4d1 100644 --- a/sample/basic/sample.json +++ b/sample/basic/sample.json @@ -5,66 +5,31 @@ "rows": [ { "r": 3, - "c": { - "1": "月", - "2": "製品A", - "3": "製品B", - "4": "製品C" - } + "c": { "1": "月", "2": "製品A", "3": "製品B", "4": "製品C" } }, { "r": 4, - "c": { - "1": "2025-01-01 00:00:00", - "2": 120, - "3": 80, - "4": 60 - } + "c": { "1": "2025-01-01 00:00:00", "2": 120, "3": 80, "4": 60 } }, { "r": 5, - "c": { - "1": "2025-02-01 00:00:00", - "2": 135, - "3": 90, - "4": 64 - } + "c": { "1": "2025-02-01 00:00:00", "2": 135, "3": 90, "4": 64 } }, { "r": 6, - "c": { - "1": "2025-03-01 00:00:00", - "2": 150, - "3": 100, - "4": 70 - } + "c": { "1": "2025-03-01 00:00:00", "2": 150, "3": 100, "4": 70 } }, { "r": 7, - "c": { - "1": "2025-04-01 00:00:00", - "2": 170, - "3": 110, - "4": 72 - } + "c": { "1": "2025-04-01 00:00:00", "2": 170, "3": 110, "4": 72 } }, { "r": 8, - "c": { - "1": "2025-05-01 00:00:00", - "2": 160, - "3": 120, - "4": 75 - } + "c": { "1": "2025-05-01 00:00:00", "2": 160, "3": 120, "4": 75 } }, { "r": 9, - "c": { - "1": "2025-06-01 00:00:00", - "2": 180, - "3": 130, - "4": 80 - } + "c": { "1": "2025-06-01 
00:00:00", "2": 180, "3": 130, "4": 80 } } ], "shapes": [ @@ -73,6 +38,7 @@ "text": "開始", "l": 148, "t": 220, + "kind": "shape", "type": "AutoShape-FlowchartProcess" }, { @@ -80,12 +46,13 @@ "text": "入力データ読み込み", "l": 132, "t": 282, + "kind": "shape", "type": "AutoShape-FlowchartProcess" }, { "l": 193, "t": 246, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 1, @@ -97,6 +64,7 @@ "text": "形式は正しい?", "l": 90, "t": 342, + "kind": "shape", "type": "AutoShape-FlowchartDecision" }, { @@ -104,6 +72,7 @@ "text": "1件処理", "l": 424, "t": 361, + "kind": "shape", "type": "AutoShape-FlowchartProcess" }, { @@ -111,12 +80,13 @@ "text": "残件あり?", "l": 365, "t": 414, + "kind": "shape", "type": "AutoShape-FlowchartDecision" }, { "l": 192, "t": 312, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 2, @@ -126,7 +96,7 @@ { "l": 295, "t": 374, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 3, @@ -138,12 +108,13 @@ "text": "はい", "l": 340, "t": 362, + "kind": "shape", "type": "TextBox-Rectangle" }, { "l": 468, "t": 387, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 4, @@ -155,6 +126,7 @@ "text": "出力を生成", "l": 426, "t": 494, + "kind": "shape", "type": "AutoShape-FlowchartProcess" }, { @@ -162,6 +134,7 @@ "text": "メール送信?", "l": 366, "t": 549, + "kind": "shape", "type": "AutoShape-FlowchartDecision" }, { @@ -169,12 +142,13 @@ "text": "エラー表示", "l": 132, "t": 463, + "kind": "shape", "type": "AutoShape-FlowchartProcess" }, { "l": 192, "t": 406, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 3, @@ -186,12 +160,13 @@ "text": "メール送信", "l": 426, "t": 638, + "kind": "shape", "type": "AutoShape-FlowchartProcess" }, { "l": 468, "t": 466, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, 
"end_arrow_style": 2, "begin_id": 5, @@ -201,7 +176,7 @@ { "l": 468, "t": 520, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 7, @@ -213,12 +188,13 @@ "text": "終了", "l": 273, "t": 684, + "kind": "shape", "type": "AutoShape-FlowchartProcess" }, { "l": 194, "t": 493, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 9, @@ -228,7 +204,7 @@ { "l": 363, "t": 664, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 10, @@ -238,7 +214,7 @@ { "l": 468, "t": 598, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 8, @@ -250,12 +226,13 @@ "text": "はい", "l": 448, "t": 604, + "kind": "shape", "type": "TextBox-Rectangle" }, { "l": 323, "t": 573, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 8, @@ -266,6 +243,7 @@ "text": "いいえ", "l": 319, "t": 600, + "kind": "shape", "type": "TextBox-Rectangle" } ], @@ -274,10 +252,7 @@ "name": "Chart 1", "chart_type": "Line", "title": "売上データ", - "y_axis_range": [ - 0.0, - 200.0 - ], + "y_axis_range": [0.0, 200.0], "series": [ { "name": "製品A", @@ -302,9 +277,7 @@ "t": 25 } ], - "table_candidates": [ - "B3:E9" - ] + "table_candidates": ["B3:E9"] } } -} \ No newline at end of file +} diff --git a/sample/flowchart/sample-shape-connector.json b/sample/flowchart/sample-shape-connector.json index f1d0f90..b91c9b7 100644 --- a/sample/flowchart/sample-shape-connector.json +++ b/sample/flowchart/sample-shape-connector.json @@ -6,59 +6,65 @@ { "id": 1, "text": "S", - "l": 81, + "l": 80, "t": 45, + "kind": "shape", "type": "AutoShape-Oval" }, { "id": 2, "text": "E", - "l": 549, + "l": 545, "t": 696, + "kind": "shape", "type": "AutoShape-Oval" }, { "id": 3, "text": "要件抽出", - "l": 81, + "l": 80, "t": 168, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "l": 102, - "t": 87, 
- "type": "AutoShape-Mixed", + "t": 88, + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 1, "end_id": 3, - "direction": "NE" + "direction": "N" }, { "id": 4, "text": "ヒアリング", - "l": 342, + "l": 340, "t": 97, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "id": 5, "text": "非機能要件", - "l": 210, - "t": 225, + "l": 209, + "t": 226, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "id": 6, "text": "機能要件", - "l": 405, + "l": 402, "t": 210, + "kind": "shape", "type": "AutoShape-Rectangle" }, { - "l": 191, + "l": 190, "t": 120, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 3, @@ -66,9 +72,9 @@ "direction": "NE" }, { - "l": 266, + "l": 264, "t": 143, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 4, @@ -76,9 +82,9 @@ "direction": "NE" }, { - "l": 398, + "l": 395, "t": 143, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 4, @@ -88,63 +94,71 @@ { "id": 7, "text": "プロトタイプ", - "l": 381, - "t": 291, + "l": 379, + "t": 292, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "id": 8, "text": "実験検証", - "l": 388, + "l": 385, "t": 389, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "id": 9, "text": "思考実験", - "l": 82, + "l": 81, "t": 325, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "id": 10, "text": "再検証", - "l": 182, + "l": 181, "t": 426, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "id": 11, "text": "まとめ", - "l": 252, - "t": 510, + "l": 251, + "t": 511, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "id": 12, "text": "文書作成", - "l": 296, + "l": 294, "t": 589, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "id": 13, "text": "契約管理", - "l": 489, + "l": 486, "t": 509, + "kind": "shape", "type": "AutoShape-Rectangle" }, { "id": 14, "text": "締結", - "l": 356, - "t": 675, + "l": 353, + "t": 676, + "kind": "shape", "type": 
"AutoShape-Rectangle" }, { - "l": 144, + "l": 143, "t": 271, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 5, @@ -152,9 +166,9 @@ "direction": "NE" }, { - "l": 144, + "l": 143, "t": 371, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 9, @@ -162,9 +176,9 @@ "direction": "NE" }, { - "l": 244, + "l": 242, "t": 471, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 10, @@ -172,9 +186,9 @@ "direction": "NE" }, { - "l": 314, + "l": 312, "t": 556, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 11, @@ -182,9 +196,9 @@ "direction": "NE" }, { - "l": 376, + "l": 373, "t": 531, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 11, @@ -192,9 +206,9 @@ "direction": "E" }, { - "l": 357, + "l": 355, "t": 635, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 12, @@ -202,9 +216,9 @@ "direction": "NE" }, { - "l": 417, + "l": 414, "t": 554, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 13, @@ -212,9 +226,9 @@ "direction": "NE" }, { - "l": 479, + "l": 476, "t": 698, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 14, @@ -222,9 +236,9 @@ "direction": "E" }, { - "l": 443, - "t": 255, - "type": "AutoShape-Mixed", + "l": 440, + "t": 256, + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 6, @@ -232,9 +246,9 @@ "direction": "NE" }, { - "l": 443, + "l": 440, "t": 337, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 7, @@ -242,9 +256,9 @@ "direction": "N" }, { - "l": 314, + "l": 312, "t": 434, - "type": "AutoShape-Mixed", + "kind": "arrow", 
"begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 8, @@ -252,10 +266,10 @@ "direction": "NE" }, { - "l": 194, + "l": 192, "t": 298, - "type": "AutoShape-Mixed", "rotation": 90.0, + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 10, @@ -263,9 +277,9 @@ "direction": "NE" }, { - "l": 511, + "l": 508, "t": 308, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 8, @@ -275,14 +289,15 @@ { "id": 15, "text": "機能追加", - "l": 581, + "l": 577, "t": 263, + "kind": "shape", "type": "AutoShape-Rectangle" }, { - "l": 505, + "l": 501, "t": 285, - "type": "AutoShape-Mixed", + "kind": "arrow", "begin_arrow_style": 1, "end_arrow_style": 2, "begin_id": 15, @@ -292,4 +307,4 @@ ] } } -} \ No newline at end of file +} diff --git a/sample/smartart/sample_smartart.json b/sample/smartart/sample_smartart.json new file mode 100644 index 0000000..07f112a --- /dev/null +++ b/sample/smartart/sample_smartart.json @@ -0,0 +1,93 @@ +{ + "book_name": "sample_smartart.xlsx", + "sheets": { + "Sheet1": { + "shapes": [ + { + "id": 1, + "l": 0, + "t": 28, + "kind": "smartart", + "layout": "基本の循環", + "nodes": [ + { "text": "1", "kids": [{ "text": "要件定義" }] }, + { "text": "2", "kids": [{ "text": "報連相" }, { "text": "開発" }] }, + { + "text": "3", + "kids": [{ "text": "実装確認" }, { "text": "動作確認" }] + }, + { "text": "4", "kids": [{ "text": "対策" }] }, + { "text": "5", "kids": [{ "text": "最終確認" }] } + ] + }, + { + "id": 2, + "l": 388, + "t": 32, + "kind": "smartart", + "layout": "開始点強調型プロセス", + "nodes": [ + { "text": "企画" }, + { "text": "執筆" }, + { "text": "編集" }, + { "text": "制作" }, + { "text": "校正" } + ] + }, + { + "id": 3, + "l": 46, + "t": 325, + "kind": "smartart", + "layout": "組織図", + "nodes": [ + { + "text": "取締役会", + "kids": [ + { + "text": "社長", + "kids": [ + { "text": "企画管理部" }, + { + "text": "営業部", + "kids": [ + { "text": "第1営業課" }, + { "text": "第2営業課" }, + { "text": "第3営業課" }, + { "text": "海外営業課" } + ] + }, + { + 
"text": "開発部", + "kids": [{ "text": "第1開発課" }, { "text": "第2開発課" }] + }, + { + "text": "技術部", + "kids": [{ "text": "第1技術課" }, { "text": "第2技術課" }] + }, + { + "text": "生産部", + "kids": [ + { "text": "愛知工場" }, + { "text": "山形工場" }, + { "text": "高知工場" } + ] + }, + { + "text": "総務部", + "kids": [ + { "text": "総務課" }, + { "text": "人事課" }, + { "text": "経理課" } + ] + } + ] + } + ] + } + ] + } + ] + } + } +} diff --git a/sample/smartart/sample_smartart.xlsx b/sample/smartart/sample_smartart.xlsx new file mode 100644 index 0000000..7812f7c Binary files /dev/null and b/sample/smartart/sample_smartart.xlsx differ diff --git a/sample/smartart/sample_smartart_for_llm.md b/sample/smartart/sample_smartart_for_llm.md new file mode 100644 index 0000000..b453d06 --- /dev/null +++ b/sample/smartart/sample_smartart_for_llm.md @@ -0,0 +1,92 @@ +# 📘 sample_smartart.xlsx + +## 1. 基本の循環(SmartArt) + +- **1** + - 要件定義 +- **2** + - 報連相 + - 開発 +- **3** + - 実装確認 + - 動作確認 +- **4** + - 対策 +- **5** + - 最終確認 + +--- + +## 2. 開始点強調型プロセス(SmartArt) + +1. 企画 +2. 執筆 +3. 編集 +4. 制作 +5. 校正 + +```mermaid +flowchart LR + B1["企画"] --> B2["執筆"] --> B3["編集"] --> B4["制作"] --> B5["校正"] +``` + +--- + +## 3. 
組織図(SmartArt) + +- **取締役会** + - **社長** + - 企画管理部 + - 営業部 + - 第 1 営業課 + - 第 2 営業課 + - 第 3 営業課 + - 海外営業課 + - 開発部 + - 第 1 開発課 + - 第 2 開発課 + - 技術部 + - 第 1 技術課 + - 第 2 技術課 + - 生産部 + - 愛知工場 + - 山形工場 + - 高知工場 + - 総務部 + - 総務課 + - 人事課 + - 経理課 + +```mermaid +flowchart TB + T["取締役会"] + P["社長"] + + T --> P + + P --> K1["企画管理部"] + + P --> E["営業部"] + E --> E1["第1営業課"] + E --> E2["第2営業課"] + E --> E3["第3営業課"] + E --> E4["海外営業課"] + + P --> D["開発部"] + D --> D1["第1開発課"] + D --> D2["第2開発課"] + + P --> G["技術部"] + G --> G1["第1技術課"] + G --> G2["第2技術課"] + + P --> S["生産部"] + S --> S1["愛知工場"] + S --> S2["山形工場"] + S --> S3["高知工場"] + + P --> A["総務部"] + A --> A1["総務課"] + A --> A2["人事課"] + A --> A3["経理課"] +``` diff --git a/schemas/arrow.json b/schemas/arrow.json new file mode 100644 index 0000000..d2ef8f1 --- /dev/null +++ b/schemas/arrow.json @@ -0,0 +1,162 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "description": "Connector shape metadata.", + "properties": { + "begin_arrow_style": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Arrow style enum for the start of a connector.", + "title": "Begin Arrow Style" + }, + "begin_id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).", + "title": "Begin Id" + }, + "direction": { + "anyOf": [ + { + "enum": [ + "E", + "SE", + "S", + "SW", + "W", + "NW", + "N", + "NE" + ], + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Connector direction (compass heading).", + "title": "Direction" + }, + "end_arrow_style": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Arrow style enum for the end of a connector.", + "title": "End Arrow Style" + }, + "end_id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + 
"default": null, + "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).", + "title": "End Id" + }, + "h": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape height (None if unknown).", + "title": "H" + }, + "id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Sequential shape id within the sheet (if applicable).", + "title": "Id" + }, + "kind": { + "const": "arrow", + "default": "arrow", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, + "l": { + "description": "Left offset (Excel units).", + "title": "L", + "type": "integer" + }, + "rotation": { + "anyOf": [ + { + "type": "number" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Rotation angle in degrees.", + "title": "Rotation" + }, + "t": { + "description": "Top offset (Excel units).", + "title": "T", + "type": "integer" + }, + "text": { + "description": "Visible text content of the shape.", + "title": "Text", + "type": "string" + }, + "w": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape width (None if unknown).", + "title": "W" + } + }, + "required": [ + "text", + "l", + "t" + ], + "title": "Arrow", + "type": "object" +} \ No newline at end of file diff --git a/schemas/print_area_view.json b/schemas/print_area_view.json index d718773..38b57e2 100644 --- a/schemas/print_area_view.json +++ b/schemas/print_area_view.json @@ -1,5 +1,166 @@ { "$defs": { + "Arrow": { + "description": "Connector shape metadata.", + "properties": { + "begin_arrow_style": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Arrow style enum for the start of a connector.", + "title": "Begin Arrow Style" + }, + "begin_id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + 
"default": null, + "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).", + "title": "Begin Id" + }, + "direction": { + "anyOf": [ + { + "enum": [ + "E", + "SE", + "S", + "SW", + "W", + "NW", + "N", + "NE" + ], + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Connector direction (compass heading).", + "title": "Direction" + }, + "end_arrow_style": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Arrow style enum for the end of a connector.", + "title": "End Arrow Style" + }, + "end_id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).", + "title": "End Id" + }, + "h": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape height (None if unknown).", + "title": "H" + }, + "id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Sequential shape id within the sheet (if applicable).", + "title": "Id" + }, + "kind": { + "const": "arrow", + "default": "arrow", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, + "l": { + "description": "Left offset (Excel units).", + "title": "L", + "type": "integer" + }, + "rotation": { + "anyOf": [ + { + "type": "number" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Rotation angle in degrees.", + "title": "Rotation" + }, + "t": { + "description": "Top offset (Excel units).", + "title": "T", + "type": "integer" + }, + "text": { + "description": "Visible text content of the shape.", + "title": "Text", + "type": "string" + }, + "w": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape width (None if unknown).", + "title": "W" + } + }, 
+ "required": [ + "text", + "l", + "t" + ], + "title": "Arrow", + "type": "object" + }, "CellRow": { "description": "A single row of cells with optional hyperlinks.", "properties": { @@ -246,9 +407,9 @@ "type": "object" }, "Shape": { - "description": "Shape metadata (position, size, text, and styling).", + "description": "Normal shape metadata.", "properties": { - "begin_arrow_style": { + "h": { "anyOf": [ { "type": "integer" @@ -258,10 +419,10 @@ } ], "default": null, - "description": "Arrow style enum for the start of a connector.", - "title": "Begin Arrow Style" + "description": "Shape height (None if unknown).", + "title": "H" }, - "begin_id": { + "id": { "anyOf": [ { "type": "integer" @@ -271,46 +432,58 @@ } ], "default": null, - "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).", - "title": "Begin Id" + "description": "Sequential shape id within the sheet (if applicable).", + "title": "Id" }, - "direction": { + "kind": { + "const": "shape", + "default": "shape", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, + "l": { + "description": "Left offset (Excel units).", + "title": "L", + "type": "integer" + }, + "rotation": { "anyOf": [ { - "enum": [ - "E", - "SE", - "S", - "SW", - "W", - "NW", - "N", - "NE" - ], - "type": "string" + "type": "number" }, { "type": "null" } ], "default": null, - "description": "Connector direction (compass heading).", - "title": "Direction" + "description": "Rotation angle in degrees.", + "title": "Rotation" }, - "end_arrow_style": { + "t": { + "description": "Top offset (Excel units).", + "title": "T", + "type": "integer" + }, + "text": { + "description": "Visible text content of the shape.", + "title": "Text", + "type": "string" + }, + "type": { "anyOf": [ { - "type": "integer" + "type": "string" }, { "type": "null" } ], "default": null, - "description": "Arrow style enum for the end of a connector.", - "title": "End Arrow Style" + "description": "Excel shape type 
name.", + "title": "Type" }, - "end_id": { + "w": { "anyOf": [ { "type": "integer" @@ -320,9 +493,21 @@ } ], "default": null, - "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).", - "title": "End Id" - }, + "description": "Shape width (None if unknown).", + "title": "W" + } + }, + "required": [ + "text", + "l", + "t" + ], + "title": "Shape", + "type": "object" + }, + "SmartArt": { + "description": "SmartArt shape metadata with nested nodes.", + "properties": { "h": { "anyOf": [ { @@ -349,11 +534,31 @@ "description": "Sequential shape id within the sheet (if applicable).", "title": "Id" }, + "kind": { + "const": "smartart", + "default": "smartart", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, "l": { "description": "Left offset (Excel units).", "title": "L", "type": "integer" }, + "layout": { + "description": "SmartArt layout name.", + "title": "Layout", + "type": "string" + }, + "nodes": { + "description": "Root nodes of SmartArt tree.", + "items": { + "$ref": "#/$defs/SmartArtNode" + }, + "title": "Nodes", + "type": "array" + }, "rotation": { "anyOf": [ { @@ -377,19 +582,6 @@ "title": "Text", "type": "string" }, - "type": { - "anyOf": [ - { - "type": "string" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Excel shape type name.", - "title": "Type" - }, "w": { "anyOf": [ { @@ -407,9 +599,33 @@ "required": [ "text", "l", - "t" + "t", + "layout" ], - "title": "Shape", + "title": "SmartArt", + "type": "object" + }, + "SmartArtNode": { + "description": "Node of SmartArt hierarchy.", + "properties": { + "kids": { + "description": "Child nodes.", + "items": { + "$ref": "#/$defs/SmartArtNode" + }, + "title": "Kids", + "type": "array" + }, + "text": { + "description": "Visible text for the node.", + "title": "Text", + "type": "string" + } + }, + "required": [ + "text" + ], + "title": "SmartArtNode", "type": "object" } }, @@ -444,7 +660,17 @@ "shapes": { "description": "Shapes 
overlapping the area.", "items": { - "$ref": "#/$defs/Shape" + "anyOf": [ + { + "$ref": "#/$defs/Shape" + }, + { + "$ref": "#/$defs/Arrow" + }, + { + "$ref": "#/$defs/SmartArt" + } + ] }, "title": "Shapes", "type": "array" diff --git a/schemas/shape.json b/schemas/shape.json index dff32d0..8f76162 100644 --- a/schemas/shape.json +++ b/schemas/shape.json @@ -1,82 +1,7 @@ { "$schema": "https://json-schema.org/draft/2020-12/schema", - "description": "Shape metadata (position, size, text, and styling).", + "description": "Normal shape metadata.", "properties": { - "begin_arrow_style": { - "anyOf": [ - { - "type": "integer" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Arrow style enum for the start of a connector.", - "title": "Begin Arrow Style" - }, - "begin_id": { - "anyOf": [ - { - "type": "integer" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).", - "title": "Begin Id" - }, - "direction": { - "anyOf": [ - { - "enum": [ - "E", - "SE", - "S", - "SW", - "W", - "NW", - "N", - "NE" - ], - "type": "string" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Connector direction (compass heading).", - "title": "Direction" - }, - "end_arrow_style": { - "anyOf": [ - { - "type": "integer" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Arrow style enum for the end of a connector.", - "title": "End Arrow Style" - }, - "end_id": { - "anyOf": [ - { - "type": "integer" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).", - "title": "End Id" - }, "h": { "anyOf": [ { @@ -103,6 +28,13 @@ "description": "Sequential shape id within the sheet (if applicable).", "title": "Id" }, + "kind": { + "const": "shape", + "default": "shape", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, "l": { "description": 
"Left offset (Excel units).", "title": "L", diff --git a/schemas/sheet.json b/schemas/sheet.json index fff9dfc..9ea5497 100644 --- a/schemas/sheet.json +++ b/schemas/sheet.json @@ -1,5 +1,166 @@ { "$defs": { + "Arrow": { + "description": "Connector shape metadata.", + "properties": { + "begin_arrow_style": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Arrow style enum for the start of a connector.", + "title": "Begin Arrow Style" + }, + "begin_id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).", + "title": "Begin Id" + }, + "direction": { + "anyOf": [ + { + "enum": [ + "E", + "SE", + "S", + "SW", + "W", + "NW", + "N", + "NE" + ], + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Connector direction (compass heading).", + "title": "Direction" + }, + "end_arrow_style": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Arrow style enum for the end of a connector.", + "title": "End Arrow Style" + }, + "end_id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).", + "title": "End Id" + }, + "h": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape height (None if unknown).", + "title": "H" + }, + "id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Sequential shape id within the sheet (if applicable).", + "title": "Id" + }, + "kind": { + "const": "arrow", + "default": "arrow", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, + "l": { + "description": "Left offset (Excel units).", + 
"title": "L", + "type": "integer" + }, + "rotation": { + "anyOf": [ + { + "type": "number" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Rotation angle in degrees.", + "title": "Rotation" + }, + "t": { + "description": "Top offset (Excel units).", + "title": "T", + "type": "integer" + }, + "text": { + "description": "Visible text content of the shape.", + "title": "Text", + "type": "string" + }, + "w": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape width (None if unknown).", + "title": "W" + } + }, + "required": [ + "text", + "l", + "t" + ], + "title": "Arrow", + "type": "object" + }, "CellRow": { "description": "A single row of cells with optional hyperlinks.", "properties": { @@ -246,9 +407,9 @@ "type": "object" }, "Shape": { - "description": "Shape metadata (position, size, text, and styling).", + "description": "Normal shape metadata.", "properties": { - "begin_arrow_style": { + "h": { "anyOf": [ { "type": "integer" @@ -258,10 +419,10 @@ } ], "default": null, - "description": "Arrow style enum for the start of a connector.", - "title": "Begin Arrow Style" + "description": "Shape height (None if unknown).", + "title": "H" }, - "begin_id": { + "id": { "anyOf": [ { "type": "integer" @@ -271,46 +432,58 @@ } ], "default": null, - "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).", - "title": "Begin Id" + "description": "Sequential shape id within the sheet (if applicable).", + "title": "Id" }, - "direction": { + "kind": { + "const": "shape", + "default": "shape", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, + "l": { + "description": "Left offset (Excel units).", + "title": "L", + "type": "integer" + }, + "rotation": { "anyOf": [ { - "enum": [ - "E", - "SE", - "S", - "SW", - "W", - "NW", - "N", - "NE" - ], - "type": "string" + "type": "number" }, { "type": "null" } ], "default": null, - "description": 
"Connector direction (compass heading).", - "title": "Direction" + "description": "Rotation angle in degrees.", + "title": "Rotation" }, - "end_arrow_style": { + "t": { + "description": "Top offset (Excel units).", + "title": "T", + "type": "integer" + }, + "text": { + "description": "Visible text content of the shape.", + "title": "Text", + "type": "string" + }, + "type": { "anyOf": [ { - "type": "integer" + "type": "string" }, { "type": "null" } ], "default": null, - "description": "Arrow style enum for the end of a connector.", - "title": "End Arrow Style" + "description": "Excel shape type name.", + "title": "Type" }, - "end_id": { + "w": { "anyOf": [ { "type": "integer" @@ -320,9 +493,21 @@ } ], "default": null, - "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).", - "title": "End Id" - }, + "description": "Shape width (None if unknown).", + "title": "W" + } + }, + "required": [ + "text", + "l", + "t" + ], + "title": "Shape", + "type": "object" + }, + "SmartArt": { + "description": "SmartArt shape metadata with nested nodes.", + "properties": { "h": { "anyOf": [ { @@ -349,11 +534,31 @@ "description": "Sequential shape id within the sheet (if applicable).", "title": "Id" }, + "kind": { + "const": "smartart", + "default": "smartart", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, "l": { "description": "Left offset (Excel units).", "title": "L", "type": "integer" }, + "layout": { + "description": "SmartArt layout name.", + "title": "Layout", + "type": "string" + }, + "nodes": { + "description": "Root nodes of SmartArt tree.", + "items": { + "$ref": "#/$defs/SmartArtNode" + }, + "title": "Nodes", + "type": "array" + }, "rotation": { "anyOf": [ { @@ -377,19 +582,6 @@ "title": "Text", "type": "string" }, - "type": { - "anyOf": [ - { - "type": "string" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Excel shape type name.", - "title": "Type" - }, "w": { "anyOf": [ { @@ -407,9 
+599,33 @@ "required": [ "text", "l", - "t" + "t", + "layout" ], - "title": "Shape", + "title": "SmartArt", + "type": "object" + }, + "SmartArtNode": { + "description": "Node of SmartArt hierarchy.", + "properties": { + "kids": { + "description": "Child nodes.", + "items": { + "$ref": "#/$defs/SmartArtNode" + }, + "title": "Kids", + "type": "array" + }, + "text": { + "description": "Visible text for the node.", + "title": "Text", + "type": "string" + } + }, + "required": [ + "text" + ], + "title": "SmartArtNode", "type": "object" } }, @@ -472,7 +688,17 @@ "shapes": { "description": "Shapes detected on the sheet.", "items": { - "$ref": "#/$defs/Shape" + "anyOf": [ + { + "$ref": "#/$defs/Shape" + }, + { + "$ref": "#/$defs/Arrow" + }, + { + "$ref": "#/$defs/SmartArt" + } + ] }, "title": "Shapes", "type": "array" diff --git a/schemas/smartart.json b/schemas/smartart.json new file mode 100644 index 0000000..68d1cab --- /dev/null +++ b/schemas/smartart.json @@ -0,0 +1,126 @@ +{ + "$defs": { + "SmartArtNode": { + "description": "Node of SmartArt hierarchy.", + "properties": { + "kids": { + "description": "Child nodes.", + "items": { + "$ref": "#/$defs/SmartArtNode" + }, + "title": "Kids", + "type": "array" + }, + "text": { + "description": "Visible text for the node.", + "title": "Text", + "type": "string" + } + }, + "required": [ + "text" + ], + "title": "SmartArtNode", + "type": "object" + } + }, + "$schema": "https://json-schema.org/draft/2020-12/schema", + "description": "SmartArt shape metadata with nested nodes.", + "properties": { + "h": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape height (None if unknown).", + "title": "H" + }, + "id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Sequential shape id within the sheet (if applicable).", + "title": "Id" + }, + "kind": { + "const": "smartart", + "default": "smartart", + 
"description": "Shape kind.", + "title": "Kind", + "type": "string" + }, + "l": { + "description": "Left offset (Excel units).", + "title": "L", + "type": "integer" + }, + "layout": { + "description": "SmartArt layout name.", + "title": "Layout", + "type": "string" + }, + "nodes": { + "description": "Root nodes of SmartArt tree.", + "items": { + "$ref": "#/$defs/SmartArtNode" + }, + "title": "Nodes", + "type": "array" + }, + "rotation": { + "anyOf": [ + { + "type": "number" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Rotation angle in degrees.", + "title": "Rotation" + }, + "t": { + "description": "Top offset (Excel units).", + "title": "T", + "type": "integer" + }, + "text": { + "description": "Visible text content of the shape.", + "title": "Text", + "type": "string" + }, + "w": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape width (None if unknown).", + "title": "W" + } + }, + "required": [ + "text", + "l", + "t", + "layout" + ], + "title": "SmartArt", + "type": "object" +} \ No newline at end of file diff --git a/schemas/smartart_node.json b/schemas/smartart_node.json new file mode 100644 index 0000000..109b7b7 --- /dev/null +++ b/schemas/smartart_node.json @@ -0,0 +1,29 @@ +{ + "$defs": { + "SmartArtNode": { + "description": "Node of SmartArt hierarchy.", + "properties": { + "kids": { + "description": "Child nodes.", + "items": { + "$ref": "#/$defs/SmartArtNode" + }, + "title": "Kids", + "type": "array" + }, + "text": { + "description": "Visible text for the node.", + "title": "Text", + "type": "string" + } + }, + "required": [ + "text" + ], + "title": "SmartArtNode", + "type": "object" + } + }, + "$ref": "#/$defs/SmartArtNode", + "$schema": "https://json-schema.org/draft/2020-12/schema" +} \ No newline at end of file diff --git a/schemas/workbook.json b/schemas/workbook.json index 4fac8d1..12ab273 100644 --- a/schemas/workbook.json +++ b/schemas/workbook.json @@ 
-1,5 +1,166 @@ { "$defs": { + "Arrow": { + "description": "Connector shape metadata.", + "properties": { + "begin_arrow_style": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Arrow style enum for the start of a connector.", + "title": "Begin Arrow Style" + }, + "begin_id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).", + "title": "Begin Id" + }, + "direction": { + "anyOf": [ + { + "enum": [ + "E", + "SE", + "S", + "SW", + "W", + "NW", + "N", + "NE" + ], + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Connector direction (compass heading).", + "title": "Direction" + }, + "end_arrow_style": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Arrow style enum for the end of a connector.", + "title": "End Arrow Style" + }, + "end_id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).", + "title": "End Id" + }, + "h": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape height (None if unknown).", + "title": "H" + }, + "id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Sequential shape id within the sheet (if applicable).", + "title": "Id" + }, + "kind": { + "const": "arrow", + "default": "arrow", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, + "l": { + "description": "Left offset (Excel units).", + "title": "L", + "type": "integer" + }, + "rotation": { + "anyOf": [ + { + "type": "number" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Rotation angle in 
degrees.", + "title": "Rotation" + }, + "t": { + "description": "Top offset (Excel units).", + "title": "T", + "type": "integer" + }, + "text": { + "description": "Visible text content of the shape.", + "title": "Text", + "type": "string" + }, + "w": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape width (None if unknown).", + "title": "W" + } + }, + "required": [ + "text", + "l", + "t" + ], + "title": "Arrow", + "type": "object" + }, "CellRow": { "description": "A single row of cells with optional hyperlinks.", "properties": { @@ -246,83 +407,8 @@ "type": "object" }, "Shape": { - "description": "Shape metadata (position, size, text, and styling).", + "description": "Normal shape metadata.", "properties": { - "begin_arrow_style": { - "anyOf": [ - { - "type": "integer" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Arrow style enum for the start of a connector.", - "title": "Begin Arrow Style" - }, - "begin_id": { - "anyOf": [ - { - "type": "integer" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).", - "title": "Begin Id" - }, - "direction": { - "anyOf": [ - { - "enum": [ - "E", - "SE", - "S", - "SW", - "W", - "NW", - "N", - "NE" - ], - "type": "string" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Connector direction (compass heading).", - "title": "Direction" - }, - "end_arrow_style": { - "anyOf": [ - { - "type": "integer" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Arrow style enum for the end of a connector.", - "title": "End Arrow Style" - }, - "end_id": { - "anyOf": [ - { - "type": "integer" - }, - { - "type": "null" - } - ], - "default": null, - "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).", - "title": "End Id" - }, "h": { "anyOf": [ { @@ -349,6 +435,13 @@ "description": 
"Sequential shape id within the sheet (if applicable).", "title": "Id" }, + "kind": { + "const": "shape", + "default": "shape", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, "l": { "description": "Left offset (Excel units).", "title": "L", @@ -471,7 +564,17 @@ "shapes": { "description": "Shapes detected on the sheet.", "items": { - "$ref": "#/$defs/Shape" + "anyOf": [ + { + "$ref": "#/$defs/Shape" + }, + { + "$ref": "#/$defs/Arrow" + }, + { + "$ref": "#/$defs/SmartArt" + } + ] }, "title": "Shapes", "type": "array" @@ -487,6 +590,129 @@ }, "title": "SheetData", "type": "object" + }, + "SmartArt": { + "description": "SmartArt shape metadata with nested nodes.", + "properties": { + "h": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape height (None if unknown).", + "title": "H" + }, + "id": { + "anyOf": [ + { + "type": "integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Sequential shape id within the sheet (if applicable).", + "title": "Id" + }, + "kind": { + "const": "smartart", + "default": "smartart", + "description": "Shape kind.", + "title": "Kind", + "type": "string" + }, + "l": { + "description": "Left offset (Excel units).", + "title": "L", + "type": "integer" + }, + "layout": { + "description": "SmartArt layout name.", + "title": "Layout", + "type": "string" + }, + "nodes": { + "description": "Root nodes of SmartArt tree.", + "items": { + "$ref": "#/$defs/SmartArtNode" + }, + "title": "Nodes", + "type": "array" + }, + "rotation": { + "anyOf": [ + { + "type": "number" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Rotation angle in degrees.", + "title": "Rotation" + }, + "t": { + "description": "Top offset (Excel units).", + "title": "T", + "type": "integer" + }, + "text": { + "description": "Visible text content of the shape.", + "title": "Text", + "type": "string" + }, + "w": { + "anyOf": [ + { + "type": 
"integer" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Shape width (None if unknown).", + "title": "W" + } + }, + "required": [ + "text", + "l", + "t", + "layout" + ], + "title": "SmartArt", + "type": "object" + }, + "SmartArtNode": { + "description": "Node of SmartArt hierarchy.", + "properties": { + "kids": { + "description": "Child nodes.", + "items": { + "$ref": "#/$defs/SmartArtNode" + }, + "title": "Kids", + "type": "array" + }, + "text": { + "description": "Visible text for the node.", + "title": "Text", + "type": "string" + } + }, + "required": [ + "text" + ], + "title": "SmartArtNode", + "type": "object" } }, "$schema": "https://json-schema.org/draft/2020-12/schema", diff --git a/scripts/gen_json_schema.py b/scripts/gen_json_schema.py index b230b05..a848939 100644 --- a/scripts/gen_json_schema.py +++ b/scripts/gen_json_schema.py @@ -6,6 +6,7 @@ from pydantic import BaseModel from exstruct.models import ( + Arrow, CellRow, Chart, ChartSeries, @@ -13,6 +14,8 @@ PrintAreaView, Shape, SheetData, + SmartArt, + SmartArtNode, WorkbookData, ) @@ -44,6 +47,9 @@ def main() -> int: "sheet": SheetData, "cell_row": CellRow, "shape": Shape, + "arrow": Arrow, + "smartart": SmartArt, + "smartart_node": SmartArtNode, "chart": Chart, "chart_series": ChartSeries, "print_area": PrintArea, diff --git a/src/exstruct/core/modeling.py b/src/exstruct/core/modeling.py index 475e036..2b312e8 100644 --- a/src/exstruct/core/modeling.py +++ b/src/exstruct/core/modeling.py @@ -2,7 +2,16 @@ from dataclasses import dataclass -from ..models import CellRow, Chart, PrintArea, Shape, SheetData, WorkbookData +from ..models import ( + Arrow, + CellRow, + Chart, + PrintArea, + Shape, + SheetData, + SmartArt, + WorkbookData, +) @dataclass(frozen=True) @@ -20,7 +29,7 @@ class SheetRawData: """ rows: list[CellRow] - shapes: list[Shape] + shapes: list[Shape | Arrow | SmartArt] charts: list[Chart] table_candidates: list[str] print_areas: list[PrintArea] diff --git 
a/src/exstruct/core/pipeline.py b/src/exstruct/core/pipeline.py index 3258da9..9dfcc04 100644 --- a/src/exstruct/core/pipeline.py +++ b/src/exstruct/core/pipeline.py @@ -10,7 +10,7 @@ import xlwings as xw from ..errors import FallbackReason -from ..models import CellRow, Chart, PrintArea, Shape, WorkbookData +from ..models import Arrow, CellRow, Chart, PrintArea, Shape, SmartArt, WorkbookData from .backends.com_backend import ComBackend from .backends.openpyxl_backend import OpenpyxlBackend from .cells import WorkbookColorsMap, detect_tables @@ -23,7 +23,7 @@ ExtractionMode = Literal["light", "standard", "verbose"] CellData = dict[str, list[CellRow]] PrintAreaData = dict[str, list[PrintArea]] -ShapeData = dict[str, list[Shape]] +ShapeData = dict[str, list[Shape | Arrow | SmartArt]] ChartData = dict[str, list[Chart]] logger = logging.getLogger(__name__) diff --git a/src/exstruct/core/shapes.py b/src/exstruct/core/shapes.py index 02b2257..1831937 100644 --- a/src/exstruct/core/shapes.py +++ b/src/exstruct/core/shapes.py @@ -1,13 +1,13 @@ from __future__ import annotations -from collections.abc import Iterator +from collections.abc import Iterable, Iterator import math -from typing import SupportsInt, cast +from typing import Literal, Protocol, SupportsInt, cast, runtime_checkable import xlwings as xw from xlwings import Book -from ..models import Shape +from ..models import Arrow, Shape, SmartArt, SmartArtNode from ..models.maps import MSO_AUTO_SHAPE_TYPE_MAP, MSO_SHAPE_TYPE_MAP @@ -16,11 +16,13 @@ def compute_line_angle_deg(w: float, h: float) -> float: return math.degrees(math.atan2(h, w)) % 360.0 -def angle_to_compass(angle: float) -> str: +def angle_to_compass( + angle: float, +) -> Literal["E", "SE", "S", "SW", "W", "NW", "N", "NE"]: """Convert angle to 8-point compass direction (0deg=E, 45deg=NE, 90deg=N, etc).""" dirs = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"] idx = int(((angle + 22.5) % 360) // 45) - return dirs[idx] + return cast(Literal["E", "SE", "S", 
"SW", "W", "NW", "N", "NE"], dirs[idx]) def coord_to_cell_by_edges( @@ -108,16 +110,129 @@ def _should_include_shape( return True +@runtime_checkable +class _TextRangeLike(Protocol): + """Text range interface for SmartArt nodes.""" + + Text: str | None + + +@runtime_checkable +class _TextFrameLike(Protocol): + """Text frame interface for SmartArt nodes.""" + + HasText: bool + TextRange: _TextRangeLike + + +@runtime_checkable +class _SmartArtNodeLike(Protocol): + """SmartArt node interface.""" + + Level: int + TextFrame2: _TextFrameLike + + +@runtime_checkable +class _SmartArtLike(Protocol): + """SmartArt interface.""" + + Layout: object + AllNodes: Iterable[_SmartArtNodeLike] + + +def _shape_has_smartart(shp: xw.Shape) -> bool: + """Return True if the shape exposes SmartArt content.""" + try: + api = shp.api + except Exception: + return False + try: + return bool(api.HasSmartArt) + except Exception: + return False + + +def _get_smartart_layout_name(smartart: _SmartArtLike | None) -> str: + """Return SmartArt layout name or a fallback label.""" + if smartart is None: + return "Unknown" + try: + layout = getattr(smartart, "Layout", None) + name = getattr(layout, "Name", None) + return str(name) if name is not None else "Unknown" + except Exception: + return "Unknown" + + +def _collect_smartart_node_info( + smartart: _SmartArtLike | None, +) -> list[tuple[int, str]]: + """Collect (level, text) pairs from SmartArt nodes.""" + nodes_info: list[tuple[int, str]] = [] + if smartart is None: + return nodes_info + try: + all_nodes = smartart.AllNodes + except Exception: + return nodes_info + + for node in all_nodes: + level = _get_smartart_node_level(node) + if level is None: + continue + text = "" + try: + text_frame = node.TextFrame2 + if text_frame.HasText: + text_value = text_frame.TextRange.Text + text = str(text_value) if text_value is not None else "" + except Exception: + text = "" + nodes_info.append((level, text)) + return nodes_info + + +def 
_get_smartart_node_level(node: _SmartArtNodeLike) -> int | None: + """Return SmartArt node level or None when unavailable.""" + try: + return int(node.Level) + except Exception: + return None + + +def _build_smartart_tree(nodes_info: list[tuple[int, str]]) -> list[SmartArtNode]: + """Build nested SmartArtNode roots from flat (level, text) tuples.""" + roots: list[SmartArtNode] = [] + stack: list[tuple[int, SmartArtNode]] = [] + for level, text in nodes_info: + node = SmartArtNode(text=text, kids=[]) + while stack and stack[-1][0] >= level: + stack.pop() + if stack: + stack[-1][1].kids.append(node) + else: + roots.append(node) + stack.append((level, node)) + return roots + + +def _extract_smartart_nodes(smartart: _SmartArtLike | None) -> list[SmartArtNode]: + """Extract SmartArt nodes as nested roots.""" + nodes_info = _collect_smartart_node_info(smartart) + return _build_smartart_tree(nodes_info) + + def get_shapes_with_position( # noqa: C901 workbook: Book, mode: str = "standard" -) -> dict[str, list[Shape]]: - """Scan shapes in a workbook and return per-sheet Shape lists with position info.""" - shape_data: dict[str, list[Shape]] = {} +) -> dict[str, list[Shape | Arrow | SmartArt]]: + """Scan shapes in a workbook and return per-sheet shape lists with position info.""" + shape_data: dict[str, list[Shape | Arrow | SmartArt]] = {} for sheet in workbook.sheets: - shapes: list[Shape] = [] + shapes: list[Shape | Arrow | SmartArt] = [] excel_names: list[tuple[str, int]] = [] node_index = 0 - pending_connections: list[tuple[Shape, str | None, str | None]] = [] + pending_connections: list[tuple[Arrow, str | None, str | None]] = [] for root in sheet.shapes: for shp in iter_shapes_recursive(root): try: @@ -148,7 +263,11 @@ def get_shapes_with_position( # noqa: C901 except Exception: text = "" - if not _should_include_shape( + if mode == "light": + continue + + has_smartart = _shape_has_smartart(shp) + if not has_smartart and not _should_include_shape( text=text, 
shape_type_num=type_num, shape_type_str=shape_type_str, @@ -179,7 +298,8 @@ def get_shapes_with_position( # noqa: C901 ): is_relationship_geom = True if shape_type_str and ( - "Connector" in shape_type_str or shape_type_str in ("Line", "ConnectLine") + "Connector" in shape_type_str + or shape_type_str in ("Line", "ConnectLine") ): is_relationship_geom = True if shape_name and ("Connector" in shape_name or "Line" in shape_name): @@ -192,19 +312,54 @@ def get_shapes_with_position( # noqa: C901 excel_name = shape_name if isinstance(shape_name, str) else None - shape_obj = Shape( - id=shape_id, - text=text, - l=int(shp.left), - t=int(shp.top), - w=int(shp.width) - if mode == "verbose" or shape_type_str == "Group" - else None, - h=int(shp.height) - if mode == "verbose" or shape_type_str == "Group" - else None, - type=type_label, - ) + shape_obj: Shape | Arrow | SmartArt + if has_smartart: + smartart_obj: _SmartArtLike | None = None + try: + smartart_obj = shp.api.SmartArt + except Exception: + smartart_obj = None + shape_obj = SmartArt( + id=shape_id, + text=text, + l=int(shp.left), + t=int(shp.top), + w=int(shp.width) + if mode == "verbose" or shape_type_str == "Group" + else None, + h=int(shp.height) + if mode == "verbose" or shape_type_str == "Group" + else None, + layout=_get_smartart_layout_name(smartart_obj), + nodes=_extract_smartart_nodes(smartart_obj), + ) + elif is_relationship_geom: + shape_obj = Arrow( + id=shape_id, + text=text, + l=int(shp.left), + t=int(shp.top), + w=int(shp.width) + if mode == "verbose" or shape_type_str == "Group" + else None, + h=int(shp.height) + if mode == "verbose" or shape_type_str == "Group" + else None, + ) + else: + shape_obj = Shape( + id=shape_id, + text=text, + l=int(shp.left), + t=int(shp.top), + w=int(shp.width) + if mode == "verbose" or shape_type_str == "Group" + else None, + h=int(shp.height) + if mode == "verbose" or shape_type_str == "Group" + else None, + type=type_label, + ) if excel_name: if shape_id is not None: 
excel_names.append((excel_name, shape_id)) @@ -215,7 +370,8 @@ def get_shapes_with_position( # noqa: C901 angle = compute_line_angle_deg( float(shp.width), float(shp.height) ) - shape_obj.direction = angle_to_compass(angle) # type: ignore + if isinstance(shape_obj, Arrow): + shape_obj.direction = angle_to_compass(angle) try: rot = float(shp.api.Rotation) if abs(rot) > 1e-6: @@ -225,8 +381,9 @@ def get_shapes_with_position( # noqa: C901 try: begin_style = int(shp.api.Line.BeginArrowheadStyle) end_style = int(shp.api.Line.EndArrowheadStyle) - shape_obj.begin_arrow_style = begin_style - shape_obj.end_arrow_style = end_style + if isinstance(shape_obj, Arrow): + shape_obj.begin_arrow_style = begin_style + shape_obj.end_arrow_style = end_style except Exception: pass # Connector begin/end connected shapes (if this shape is a connector). @@ -262,7 +419,8 @@ def get_shapes_with_position( # noqa: C901 pass except Exception: pass - pending_connections.append((shape_obj, begin_name, end_name)) + if isinstance(shape_obj, Arrow): + pending_connections.append((shape_obj, begin_name, end_name)) shapes.append(shape_obj) if pending_connections: name_to_id = {name: sid for name, sid in excel_names} diff --git a/src/exstruct/io/__init__.py b/src/exstruct/io/__init__.py index c8c1201..e2ad37a 100644 --- a/src/exstruct/io/__init__.py +++ b/src/exstruct/io/__init__.py @@ -7,7 +7,16 @@ from ..core.ranges import RangeBounds, parse_range_zero_based from ..errors import OutputError, SerializationError -from ..models import CellRow, Chart, PrintArea, PrintAreaView, Shape, WorkbookData +from ..models import ( + Arrow, + CellRow, + Chart, + PrintArea, + PrintAreaView, + Shape, + SmartArt, + WorkbookData, +) from ..models.types import JsonStructure from .serialize import ( _FORMAT_HINTS, @@ -34,7 +43,14 @@ def dict_without_empty_values(obj: object) -> JsonStructure: ] if isinstance( obj, - WorkbookData | CellRow | Chart | PrintArea | PrintAreaView | Shape, + WorkbookData + | CellRow + | Chart + 
| PrintArea + | PrintAreaView + | Shape + | Arrow + | SmartArt, ): return dict_without_empty_values(obj.model_dump(exclude_none=True)) return cast(JsonStructure, obj) @@ -161,9 +177,11 @@ def _rects_overlap(a: tuple[int, int, int, int], b: tuple[int, int, int, int]) - return not (a[2] <= b[0] or a[0] >= b[2] or a[3] <= b[1] or a[1] >= b[3]) -def _filter_shapes_to_area(shapes: list[Shape], area: PrintArea) -> list[Shape]: +def _filter_shapes_to_area( + shapes: list[Shape | Arrow | SmartArt], area: PrintArea +) -> list[Shape | Arrow | SmartArt]: area_rect = _area_to_px_rect(area) - filtered: list[Shape] = [] + filtered: list[Shape | Arrow | SmartArt] = [] for shp in shapes: if shp.w is None or shp.h is None: # Fallback: treat shape as a point if size is unknown (standard mode). diff --git a/src/exstruct/models/__init__.py b/src/exstruct/models/__init__.py index 65cfb30..bd40d0b 100644 --- a/src/exstruct/models/__init__.py +++ b/src/exstruct/models/__init__.py @@ -8,8 +8,8 @@ from pydantic import BaseModel, Field -class Shape(BaseModel): - """Shape metadata (position, size, text, and styling).""" +class BaseShape(BaseModel): + """Common shape metadata (position, size, text, and styling).""" id: int | None = Field( default=None, @@ -20,10 +20,22 @@ class Shape(BaseModel): t: int = Field(description="Top offset (Excel units).") w: int | None = Field(default=None, description="Shape width (None if unknown).") h: int | None = Field(default=None, description="Shape height (None if unknown).") - type: str | None = Field(default=None, description="Excel shape type name.") rotation: float | None = Field( default=None, description="Rotation angle in degrees." 
) + + +class Shape(BaseShape): + """Normal shape metadata.""" + + kind: Literal["shape"] = Field(default="shape", description="Shape kind.") + type: str | None = Field(default=None, description="Excel shape type name.") + + +class Arrow(BaseShape): + """Connector shape metadata.""" + + kind: Literal["arrow"] = Field(default="arrow", description="Shape kind.") begin_arrow_style: int | None = Field( default=None, description="Arrow style enum for the start of a connector." ) @@ -47,6 +59,23 @@ class Shape(BaseModel): ) +class SmartArtNode(BaseModel): + """Node of SmartArt hierarchy.""" + + text: str = Field(description="Visible text for the node.") + kids: list[SmartArtNode] = Field(default_factory=list, description="Child nodes.") + + +class SmartArt(BaseShape): + """SmartArt shape metadata with nested nodes.""" + + kind: Literal["smartart"] = Field(default="smartart", description="Shape kind.") + layout: str = Field(description="SmartArt layout name.") + nodes: list[SmartArtNode] = Field( + default_factory=list, description="Root nodes of SmartArt tree." + ) + + class CellRow(BaseModel): """A single row of cells with optional hyperlinks.""" @@ -109,7 +138,7 @@ class SheetData(BaseModel): rows: list[CellRow] = Field( default_factory=list, description="Extracted rows with cell values and links." ) - shapes: list[Shape] = Field( + shapes: list[Shape | Arrow | SmartArt] = Field( default_factory=list, description="Shapes detected on the sheet." ) charts: list[Chart] = Field( @@ -267,7 +296,7 @@ class PrintAreaView(BaseModel): book_name: str = Field(description="Workbook name owning the area.") sheet_name: str = Field(description="Sheet name owning the area.") area: PrintArea = Field(description="Print area bounds.") - shapes: list[Shape] = Field( + shapes: list[Shape | Arrow | SmartArt] = Field( default_factory=list, description="Shapes overlapping the area." 
) charts: list[Chart] = Field( diff --git a/tests/com/test_shapes_extraction.py b/tests/com/test_shapes_extraction.py index 47347e2..be31e2d 100644 --- a/tests/com/test_shapes_extraction.py +++ b/tests/com/test_shapes_extraction.py @@ -4,6 +4,7 @@ import xlwings as xw from exstruct.core.integrate import extract_workbook +from exstruct.models import Arrow, Shape pytestmark = pytest.mark.com @@ -70,28 +71,22 @@ def test_図形の種別とテキストが抽出される(tmp_path: Path) -> Non wb_data = extract_workbook(path) shapes = wb_data.sheets["Sheet1"].shapes - rect = next(s for s in shapes if s.text == "rect") + rect = next(s for s in shapes if isinstance(s, Shape) and s.text == "rect") assert "AutoShape" in (rect.type or "") assert rect.l >= 0 and rect.t >= 0 - assert rect.id > 0 + assert rect.id is not None and rect.id > 0 - inner = next(s for s in shapes if s.text == "inner") + inner = next(s for s in shapes if isinstance(s, Shape) and s.text == "inner") assert "Group" not in (inner.type or "") # flattened child - assert not any((s.type or "") == "Group" for s in shapes) - assert inner.id > 0 + assert not any(isinstance(s, Shape) and (s.type or "") == "Group" for s in shapes) + assert inner.id is not None and inner.id > 0 ids = [s.id for s in shapes if s.id is not None] assert len(ids) == len(set(ids)) # Standard mode should not emit non-relationship AutoShapes without text. 
assert not any( - (s.text == "" or s.text is None) + isinstance(s, Shape) + and (s.text == "" or s.text is None) and (s.type or "").startswith("AutoShape") - and not ( - s.direction - or s.begin_arrow_style is not None - or s.end_arrow_style is not None - or s.begin_id is not None - or s.end_id is not None - ) for s in shapes ) @@ -107,7 +102,8 @@ def test_線図形の方向と矢印情報が抽出される(tmp_path: Path) -> line = next( s for s in shapes - if s.begin_arrow_style is not None or s.end_arrow_style is not None + if isinstance(s, Arrow) + and (s.begin_arrow_style is not None or s.end_arrow_style is not None) ) assert line.direction == "E" @@ -121,20 +117,20 @@ def test_コネクターの接続元と接続先が抽出される(tmp_path: Pat shapes = wb_data.sheets["Sheet1"].shapes connectors = [ - s - for s in shapes - if s.begin_id is not None or s.end_id is not None + s for s in shapes if isinstance(s, Arrow) and (s.begin_id or s.end_id) ] # If the environment could not wire connectors, simply skip the assertion. if not connectors: - pytest.skip("Excel failed to populate ConnectorFormat.ConnectedShape properties.") + pytest.skip( + "Excel failed to populate ConnectorFormat.ConnectedShape properties." + ) conn = connectors[0] assert conn.begin_id is not None assert conn.end_id is not None assert conn.begin_id != conn.end_id # Connected shape ids should correspond to some emitted shapes' id. 
- shape_ids = {s.id for s in shapes} + shape_ids = {s.id for s in shapes if s.id is not None} assert conn.begin_id in shape_ids assert conn.end_id in shape_ids diff --git a/tests/core/test_mode_output.py b/tests/core/test_mode_output.py index 4d900f8..06e5930 100644 --- a/tests/core/test_mode_output.py +++ b/tests/core/test_mode_output.py @@ -10,6 +10,7 @@ import xlwings as xw from exstruct import extract, process_excel +from exstruct.models import Arrow def _make_basic_book(path: Path) -> None: @@ -78,8 +79,8 @@ def test_standardモードはテキストなし図形を除外する(tmp_path: P for s in shapes: if s.text != "": continue - assert s.type is not None - assert ("Line" in s.type) or ("Connector" in s.type) or ("Arrow" in s.type) + assert isinstance(s, Arrow) + assert s.direction is not None or s.begin_arrow_style is not None def test_verboseモードでは全図形と幅高さが出力される(tmp_path: Path) -> None: @@ -108,11 +109,11 @@ def test_invalidモードはエラーになる(tmp_path: Path) -> None: path = tmp_path / "book.xlsx" _make_basic_book(path) with pytest.raises(ValueError): - extract(path, mode="invalid") + extract(path, mode="invalid") # type: ignore[arg-type] out = tmp_path / "out.json" with pytest.raises(ValueError): - process_excel(path, out, mode="invalid") + process_excel(path, out, mode="invalid") # type: ignore[arg-type] def test_CLIのmode引数バリデーション(tmp_path: Path) -> None: diff --git a/tests/core/test_shapes_positions_dummy.py b/tests/core/test_shapes_positions_dummy.py index 13e228f..999e70b 100644 --- a/tests/core/test_shapes_positions_dummy.py +++ b/tests/core/test_shapes_positions_dummy.py @@ -1,6 +1,7 @@ from dataclasses import dataclass from exstruct.core.shapes import get_shapes_with_position +from exstruct.models import Arrow @dataclass(frozen=True) @@ -45,6 +46,27 @@ def Rotation(self) -> float: return self.rotation +@dataclass(frozen=True) +class _DummyApiSmartArt: + shape_type: int + + @property + def Type(self) -> int: + return self.shape_type + + @property + def AutoShapeType(self) -> int: + raise 
RuntimeError("AutoShapeType unavailable") + + @property + def HasSmartArt(self) -> bool: + return True + + @property + def SmartArt(self) -> object: + return object() + + @dataclass(frozen=True) class _DummyShape: name: str @@ -53,7 +75,7 @@ class _DummyShape: top: float width: float height: float - api: _DummyApi + api: object @dataclass(frozen=True) @@ -107,7 +129,8 @@ def test_get_shapes_with_position_standard_filters_textless_non_relation() -> No assert len(shapes) == 2 assert {s.text for s in shapes} == {"Hello", ""} line_entries = [s for s in shapes if s.text == ""] - assert line_entries[0].type == "Line" + assert isinstance(line_entries[0], Arrow) + assert line_entries[0].direction == "E" text_entries = [s for s in shapes if s.text == "Hello"] assert text_entries[0].id == 1 @@ -151,3 +174,19 @@ def test_get_shapes_with_position_verbose_includes_all_and_sizes() -> None: assert len(shapes) == 3 assert all(s.w is not None and s.h is not None for s in shapes) + + +def test_get_shapes_with_position_light_skips_smartart() -> None: + smartart_shape = _DummyShape( + name="SmartArt1", + text="sa", + left=10.0, + top=20.0, + width=100.0, + height=50.0, + api=_DummyApiSmartArt(shape_type=24), + ) + book = _DummyBook(sheets=[_DummySheet(name="Sheet1", shapes=[smartart_shape])]) + + result = get_shapes_with_position(book, mode="light") + assert result["Sheet1"] == [] diff --git a/tests/core/test_shapes_smartart_utils.py b/tests/core/test_shapes_smartart_utils.py new file mode 100644 index 0000000..e8c49b0 --- /dev/null +++ b/tests/core/test_shapes_smartart_utils.py @@ -0,0 +1,147 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import cast + +import xlwings as xw + +from exstruct.core import shapes as shapes_mod + + +@dataclass +class _DummyTextRange: + Text: str | None # noqa: N815 + + +@dataclass +class _DummyTextFrame: + HasText: bool # noqa: N815 + TextRange: _DummyTextRange # noqa: N815 + + +@dataclass +class _DummyNode: + 
Level: int # noqa: N815 + TextFrame2: _DummyTextFrame # noqa: N815 + + +@dataclass +class _DummyLayout: + Name: str | None # noqa: N815 + + +@dataclass +class _DummySmartArt: + AllNodes: list[_DummyNode] # noqa: N815 + Layout: object # noqa: N815 + + +@dataclass(frozen=True) +class _DummyApi: + HasSmartArt: bool # noqa: N815 + SmartArt: _DummySmartArt | None # noqa: N815 + + +@dataclass(frozen=True) +class _DummyApiRaises: + @property + def HasSmartArt(self) -> bool: # noqa: N802 + raise RuntimeError("HasSmartArt unavailable") + + +@dataclass(frozen=True) +class _DummyShape: + api_obj: object + + @property + def api(self) -> object: + return self.api_obj + + +@dataclass(frozen=True) +class _DummyShapeRaisesApi: + @property + def api(self) -> object: + raise RuntimeError("api unavailable") + + +def test_shape_has_smartart_true_false() -> None: + smartart = _DummySmartArt(AllNodes=[], Layout=_DummyLayout(Name="L")) + has = shapes_mod._shape_has_smartart( + cast( + xw.Shape, + _DummyShape(api_obj=_DummyApi(HasSmartArt=True, SmartArt=smartart)), + ) + ) + assert has is True + + has_false = shapes_mod._shape_has_smartart( + cast(xw.Shape, _DummyShape(api_obj=_DummyApi(HasSmartArt=False, SmartArt=None))) + ) + assert has_false is False + + +def test_shape_has_smartart_handles_exceptions() -> None: + has = shapes_mod._shape_has_smartart( + cast(xw.Shape, _DummyShape(api_obj=_DummyApiRaises())) + ) + assert has is False + + has_api_error = shapes_mod._shape_has_smartart( + cast(xw.Shape, _DummyShapeRaisesApi()) + ) + assert has_api_error is False + + +def test_get_smartart_layout_name() -> None: + assert shapes_mod._get_smartart_layout_name(None) == "Unknown" + smartart = _DummySmartArt(AllNodes=[], Layout=_DummyLayout(Name="Layout")) + assert ( + shapes_mod._get_smartart_layout_name(cast(shapes_mod._SmartArtLike, smartart)) + == "Layout" + ) + smartart_no_name = _DummySmartArt(AllNodes=[], Layout=_DummyLayout(Name=None)) + assert ( + shapes_mod._get_smartart_layout_name( 
+ cast(shapes_mod._SmartArtLike, smartart_no_name) + ) + == "Unknown" + ) + + +def test_collect_smartart_node_info_and_tree() -> None: + nodes = [ + _DummyNode( + Level=1, + TextFrame2=_DummyTextFrame( + HasText=True, TextRange=_DummyTextRange(Text="root") + ), + ), + _DummyNode( + Level=2, + TextFrame2=_DummyTextFrame( + HasText=True, TextRange=_DummyTextRange(Text="child") + ), + ), + _DummyNode( + Level=1, + TextFrame2=_DummyTextFrame( + HasText=False, TextRange=_DummyTextRange(Text=None) + ), + ), + ] + smartart = _DummySmartArt(AllNodes=nodes, Layout=_DummyLayout(Name="L")) + info = shapes_mod._collect_smartart_node_info( + cast(shapes_mod._SmartArtLike, smartart) + ) + assert info == [(1, "root"), (2, "child"), (1, "")] + + roots = shapes_mod._extract_smartart_nodes(cast(shapes_mod._SmartArtLike, smartart)) + assert len(roots) == 2 + assert roots[0].text == "root" + assert roots[0].kids[0].text == "child" + assert roots[1].text == "" + + +def test_collect_smartart_node_info_none() -> None: + assert shapes_mod._collect_smartart_node_info(None) == [] diff --git a/tests/io/test_print_area_views.py b/tests/io/test_print_area_views.py index a7c1a1e..8e61e90 100644 --- a/tests/io/test_print_area_views.py +++ b/tests/io/test_print_area_views.py @@ -2,12 +2,33 @@ from pathlib import Path from exstruct.io import save_print_area_views -from exstruct.models import CellRow, Chart, PrintArea, Shape, SheetData, WorkbookData +from exstruct.models import ( + Arrow, + CellRow, + Chart, + PrintArea, + Shape, + SheetData, + SmartArt, + SmartArtNode, + WorkbookData, +) def _workbook_with_print_area() -> WorkbookData: shape_inside = Shape(id=1, text="inside", l=10, t=5, w=20, h=10, type="Rect") shape_outside = Shape(id=2, text="outside", l=200, t=200, w=30, h=30, type="Rect") + smartart_inside = SmartArt( + id=3, + text="sa", + l=15, + t=8, + w=20, + h=10, + layout="Layout", + nodes=[SmartArtNode(text="root", kids=[])], + ) + arrow_inside = Arrow(id=None, text="", l=5, t=5, w=20, 
h=2) chart_inside = Chart( name="c1", chart_type="Line", @@ -40,7 +61,7 @@ def _workbook_with_print_area() -> WorkbookData: CellRow(r=2, c={"1": "B"}), CellRow(r=3, c={"1": "C"}), ], - shapes=[shape_inside, shape_outside], + shapes=[shape_inside, smartart_inside, arrow_inside, shape_outside], charts=[chart_inside, chart_outside], table_candidates=["A1:B2", "C1:C1"], print_areas=[PrintArea(r1=1, c1=0, r2=2, c2=1)], @@ -61,7 +82,8 @@ def test_save_print_area_views_filters_rows_and_tables(tmp_path: Path) -> None: # Only table candidates fully contained in the print area remain. assert data["table_candidates"] == ["A1:B2"] # Shapes/Charts filtered by overlap; outside or size-less charts are dropped. - assert len(data["shapes"]) == 1 and data["shapes"][0]["text"] == "inside" + kinds = {shape["kind"] for shape in data["shapes"]} + assert kinds == {"shape", "smartart", "arrow"} assert len(data["charts"]) == 1 and data["charts"][0]["name"] == "c1" diff --git a/tests/models/test_models_export.py b/tests/models/test_models_export.py index 38080dd..9d110e5 100644 --- a/tests/models/test_models_export.py +++ b/tests/models/test_models_export.py @@ -1,10 +1,11 @@ from importlib import util +import json from pathlib import Path import pytest from exstruct.errors import MissingDependencyError -from exstruct.models import CellRow, SheetData, WorkbookData +from exstruct.models import CellRow, SheetData, SmartArt, SmartArtNode, WorkbookData HAS_PYYAML = util.find_spec("yaml") is not None HAS_TOON = util.find_spec("toon") is not None @@ -95,3 +96,31 @@ def test_workbook_iter_and_getitem() -> None: assert pairs[0][1] is first with pytest.raises(KeyError): _ = wb["Nope"] + + +def test_sheet_json_includes_smartart_nodes() -> None: + smartart = SmartArt( + id=1, + text="sa", + l=0, + t=0, + w=10, + h=10, + layout="Layout", + nodes=[ + SmartArtNode( + text="root", + kids=[SmartArtNode(text="child", kids=[])], + ) + ], + ) + sheet = SheetData( + rows=[], + shapes=[smartart], + charts=[], + 
table_candidates=[], + ) + data = json.loads(sheet.to_json()) + assert data["shapes"][0]["kind"] == "smartart" + assert data["shapes"][0]["nodes"][0]["text"] == "root" + assert data["shapes"][0]["nodes"][0]["kids"][0]["text"] == "child" diff --git a/tests/models/test_models_validation.py b/tests/models/test_models_validation.py index 1b57bf7..3bb45dd 100644 --- a/tests/models/test_models_validation.py +++ b/tests/models/test_models_validation.py @@ -2,11 +2,14 @@ import pytest from exstruct.models import ( + Arrow, CellRow, Chart, ChartSeries, Shape, SheetData, + SmartArt, + SmartArtNode, WorkbookData, ) @@ -14,7 +17,25 @@ def test_モデルのデフォルトとオプション値() -> None: shape = Shape(id=1, text="t", l=1, t=2, w=None, h=None) assert shape.rotation is None - assert shape.direction is None + assert shape.kind == "shape" + + arrow = Arrow(id=None, text="a", l=1, t=1, w=10, h=1) + assert arrow.begin_arrow_style is None + assert arrow.end_arrow_style is None + assert arrow.kind == "arrow" + + smartart = SmartArt( + id=3, + text="sa", + l=5, + t=6, + w=50, + h=40, + layout="Layout", + nodes=[SmartArtNode(text="root", kids=[])], + ) + assert smartart.layout == "Layout" + assert smartart.nodes[0].text == "root" cell = CellRow(r=1, c={"0": "v"}) assert cell.c["0"] == "v" @@ -48,7 +69,7 @@ def test_モデルのデフォルトとオプション値() -> None: def test_directionのリテラル検証() -> None: with pytest.raises(ValidationError): - Shape(id=1, text="bad", l=0, t=0, w=None, h=None, direction="X") + Arrow(id=1, text="bad", l=0, t=0, w=None, h=None, direction="X") def test_cellrowの数値正規化() -> None: @@ -56,3 +77,21 @@ def test_cellrowの数値正規化() -> None: assert isinstance(cell.c["0"], int) assert isinstance(cell.c["1"], float) assert cell.c["2"] == "text" + + +def test_arrow_only_fields_are_not_on_shape() -> None: + arrow = Arrow( + id=None, + text="a", + l=1, + t=1, + w=10, + h=2, + begin_id=1, + end_id=2, + ) + shape = Shape(id=1, text="s", l=0, t=0, w=None, h=None) + assert arrow.begin_id == 1 + assert arrow.end_id == 2 + 
assert not hasattr(shape, "begin_id") + assert not hasattr(shape, "end_id")
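
Reviewer note on the SmartArt tests in this diff: `test_collect_smartart_node_info_and_tree` expects flat `(level, text)` pairs to become a nested tree, but the body of `_extract_smartart_nodes` is not shown in this excerpt. The sketch below is one way such nesting could be done, using a stack of open ancestors. It is an assumption about the approach, not the actual code in `shapes.py`. `Node` and `build_tree` are hypothetical names, and a plain dataclass stands in for the pydantic `SmartArtNode` so the example needs only the standard library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for SmartArtNode: visible text plus child nodes."""

    text: str
    kids: list[Node] = field(default_factory=list)


def build_tree(pairs: list[tuple[int, str]]) -> list[Node]:
    """Nest flat (level, text) pairs into a forest (hypothetical helper)."""
    roots: list[Node] = []
    # (level, node) entries for the chain of ancestors that is still open.
    stack: list[tuple[int, Node]] = []
    for level, text in pairs:
        node = Node(text=text)
        # Entries at the same or a deeper level cannot be this node's parent.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            # The nearest shallower entry is the parent.
            stack[-1][1].kids.append(node)
        else:
            # No shallower entry is open, so this node is a root.
            roots.append(node)
        stack.append((level, node))
    return roots


# Same input shape as the (level, text) list asserted in the test.
roots = build_tree([(1, "root"), (2, "child"), (1, "")])
```

With the input used in the test, this gives two roots, the first holding one child and the second an empty-text node without children, which is the shape the test asserts. A node whose level skips ahead (for example level 3 directly under level 1) is attached to the nearest shallower ancestor; whether the real helper behaves the same way is not confirmed by this diff.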