diff --git a/README.ja.md b/README.ja.md
index 74958de..6ed7e01 100644
--- a/README.ja.md
+++ b/README.ja.md
@@ -1,15 +1,15 @@
# ExStruct — Excel 構造化抽出エンジン
-[](https://pypi.org/project/exstruct/) [](https://pepy.tech/projects/exstruct)  [](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
+[](https://pypi.org/project/exstruct/) [](https://pepy.tech/projects/exstruct)  [](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) [](https://codecov.io/gh/harumiWeb/exstruct)

-ExStruct は Excel ワークブックを読み取り、構造化データ(セル・テーブル候補・図形・チャート・印刷範囲ビュー)をデフォルトで JSON に出力します。必要に応じて YAML/TOON も選択でき、COM/Excel 環境ではリッチ抽出、非 COM 環境ではセル+テーブル候補+印刷範囲へのフォールバックで安全に動作します。LLM/RAG 向けに検出ヒューリスティックや出力モードを調整可能です。
+ExStruct は Excel ワークブックを読み取り、構造化データ(セル・テーブル候補・図形・チャート・SmartArt・印刷範囲ビュー)をデフォルトで JSON に出力します。必要に応じて YAML/TOON も選択でき、COM/Excel 環境ではリッチ抽出、非 COM 環境ではセル+テーブル候補+印刷範囲へのフォールバックで安全に動作します。LLM/RAG 向けに検出ヒューリスティックや出力モードを調整可能です。
## 主な特徴
-- **Excel → 構造化 JSON**: セル、図形、チャート、テーブル候補、印刷範囲/自動改ページ範囲(PrintArea/PrintAreaView)をシート単位・範囲単位で出力。
-- **出力モード**: `light`(セル+テーブル候補のみ)、`standard`(テキスト付き図形+矢印、チャート)、`verbose`(全図形を幅高さ付きで出力、セルのハイパーリンクも出力)。
+- **Excel → 構造化 JSON**: セル、図形、チャート、SmartArt、テーブル候補、印刷範囲/自動改ページ範囲(PrintArea/PrintAreaView)をシート単位・範囲単位で出力。
+- **出力モード**: `light`(セル+テーブル候補のみ)、`standard`(テキスト付き図形+矢印、チャート、SmartArt)、`verbose`(全図形を幅高さ付きで出力、セルのハイパーリンクも出力)。
- **フォーマット**: JSON(デフォルトはコンパクト、`--pretty` で整形)、YAML、TOON(任意依存)。
- **テーブル検出のチューニング**: API でヒューリスティックを動的に変更可能。
- **ハイパーリンク抽出**: `verbose` モード(または `include_cell_links=True` 指定)でセルのリンクを `links` に出力。
@@ -396,6 +396,11 @@ ExStruct の内部実装を拡張する場合は、
→ [docs/contributors/architecture.md](docs/contributors/architecture.md)
+## カバレッジに関する注意
+
+セル構造推論ロジック(`cells.py`)は、ヒューリスティックルールと
+Excel 固有の動作に依存しています。網羅的なテストは現実世界の信頼性を反映できないため、完全なカバレッジは意図的に追求されていません。
+
## License
BSD-3-Clause. See `LICENSE` for details.
diff --git a/README.md b/README.md
index 1284127..11a456b 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,17 @@
# ExStruct — Excel Structured Extraction Engine
-[](https://pypi.org/project/exstruct/) [](https://pepy.tech/projects/exstruct)  [](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
+[](https://pypi.org/project/exstruct/) [](https://pepy.tech/projects/exstruct)  [](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) [](https://codecov.io/gh/harumiWeb/exstruct)

-ExStruct reads Excel workbooks and outputs structured data (cells, table candidates, shapes, charts, print areas/views, auto page-break areas, hyperlinks) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates + print areas), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines.
+ExStruct reads Excel workbooks and outputs structured data (cells, table candidates, shapes, charts, SmartArt, print areas/views, auto page-break areas, hyperlinks) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates + print areas), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines.
[日本版 README](README.ja.md)
## Features
-- **Excel → Structured JSON**: cells, shapes, charts, table candidates, print areas/views, and auto page-break areas per sheet.
-- **Output modes**: `light` (cells + table candidates + print areas; no COM, shapes/charts empty), `standard` (texted shapes + arrows, charts, print areas), `verbose` (all shapes with width/height, charts with size, print areas). Verbose also emits cell hyperlinks and `colors_map`. Size output is flag-controlled.
+- **Excel → Structured JSON**: cells, shapes, charts, SmartArt, table candidates, print areas/views, and auto page-break areas per sheet.
+- **Output modes**: `light` (cells + table candidates + print areas; no COM, shapes/charts empty), `standard` (shapes with text + arrows, charts, SmartArt, print areas), `verbose` (all shapes with width/height, charts with size, print areas). Verbose also emits cell hyperlinks and `colors_map`. Size output is flag-controlled.
- **Auto page-break export (COM only)**: capture Excel-computed auto page breaks and write per-area JSON/YAML/TOON when requested (CLI option appears only when COM is available).
- **Formats**: JSON (compact by default, `--pretty` available), YAML, TOON (optional dependencies).
- **Table detection tuning**: adjust heuristics at runtime via API.
@@ -395,6 +395,12 @@ please read the contributor architecture guide.
→ [docs/contributors/architecture.md](docs/contributors/architecture.md)
+## Note on coverage
+
+The cell-structure inference logic (`cells.py`) relies on heuristic rules
+and Excel-specific behaviors. Full coverage is intentionally not pursued there,
+because exhaustively testing those heuristics would not reflect real-world reliability.
+
## License
BSD-3-Clause. See `LICENSE` for details.
diff --git a/docs/README.en.md b/docs/README.en.md
index 1c07081..5ae275f 100644
--- a/docs/README.en.md
+++ b/docs/README.en.md
@@ -1,15 +1,17 @@
# ExStruct — Excel Structured Extraction Engine
-[](https://pypi.org/project/exstruct/) [](https://pepy.tech/projects/exstruct)  [](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
+[](https://pypi.org/project/exstruct/) [](https://pepy.tech/projects/exstruct)  [](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) [](https://codecov.io/gh/harumiWeb/exstruct)

-ExStruct reads Excel workbooks and outputs structured data (cells, table candidates, shapes, charts, print areas/views, auto page-break areas, hyperlinks) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates + print areas), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines.
+ExStruct reads Excel workbooks and outputs structured data (cells, table candidates, shapes, charts, SmartArt, print areas/views, auto page-break areas, hyperlinks) as JSON by default, with optional YAML/TOON formats. It targets both COM/Excel environments (rich extraction) and non-COM environments (cells + table candidates + print areas), with tunable detection heuristics and multiple output modes to fit LLM/RAG pipelines.
+
+[日本版 README](README.ja.md)
## Features
-- **Excel → Structured JSON**: cells, shapes, charts, table candidates, print areas/views, and auto page-break areas per sheet.
-- **Output modes**: `light` (cells + table candidates + print areas; no COM, shapes/charts empty), `standard` (texted shapes + arrows, charts, print areas), `verbose` (all shapes with width/height, charts with size, print areas). Verbose also emits cell hyperlinks and `colors_map`. Size output is flag-controlled.
+- **Excel → Structured JSON**: cells, shapes, charts, SmartArt, table candidates, print areas/views, and auto page-break areas per sheet.
+- **Output modes**: `light` (cells + table candidates + print areas; no COM, shapes/charts empty), `standard` (shapes with text + arrows, charts, SmartArt, print areas), `verbose` (all shapes with width/height, charts with size, print areas). Verbose also emits cell hyperlinks and `colors_map`. Size output is flag-controlled.
- **Auto page-break export (COM only)**: capture Excel-computed auto page breaks and write per-area JSON/YAML/TOON when requested (CLI option appears only when COM is available).
- **Formats**: JSON (compact by default, `--pretty` available), YAML, TOON (optional dependencies).
- **Table detection tuning**: adjust heuristics at runtime via API.
@@ -398,6 +400,12 @@ please read the contributor architecture guide.
→ [docs/contributors/architecture.md](docs/contributors/architecture.md)
+## Note on coverage
+
+The cell-structure inference logic (`cells.py`) relies on heuristic rules
+and Excel-specific behaviors. Full coverage is intentionally not pursued there,
+because exhaustively testing those heuristics would not reflect real-world reliability.
+
## License
BSD-3-Clause. See `LICENSE` for details.
diff --git a/docs/README.ja.md b/docs/README.ja.md
index 8b52db3..d3e2676 100644
--- a/docs/README.ja.md
+++ b/docs/README.ja.md
@@ -1,17 +1,15 @@
# ExStruct — Excel 構造化抽出エンジン
-[](https://pypi.org/project/exstruct/) [](https://pepy.tech/projects/exstruct)  [](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
+[](https://pypi.org/project/exstruct/) [](https://pepy.tech/projects/exstruct)  [](https://github.com/harumiWeb/exstruct/actions/workflows/pytest.yml) [](https://app.codacy.com/gh/harumiWeb/exstruct/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) [](https://codecov.io/gh/harumiWeb/exstruct)

-ExStruct は Excel ワークブックを読み取り、構造化データ(セル・テーブル候補・図形・チャート・印刷範囲ビュー)をデフォルトで JSON に出力します。必要に応じて YAML/TOON も選択でき、COM/Excel 環境ではリッチ抽出、非 COM 環境ではセル+テーブル候補+印刷範囲へのフォールバックで安全に動作します。LLM/RAG 向けに検出ヒューリスティックや出力モードを調整可能です。
-
-[English README](README.en.md)
+ExStruct は Excel ワークブックを読み取り、構造化データ(セル・テーブル候補・図形・チャート・SmartArt・印刷範囲ビュー)をデフォルトで JSON に出力します。必要に応じて YAML/TOON も選択でき、COM/Excel 環境ではリッチ抽出、非 COM 環境ではセル+テーブル候補+印刷範囲へのフォールバックで安全に動作します。LLM/RAG 向けに検出ヒューリスティックや出力モードを調整可能です。
## 主な特徴
-- **Excel → 構造化 JSON**: セル、図形、チャート、テーブル候補、印刷範囲/自動改ページ範囲(PrintArea/PrintAreaView)をシート単位・範囲単位で出力。
-- **出力モード**: `light`(セル+テーブル候補のみ)、`standard`(テキスト付き図形+矢印、チャート)、`verbose`(全図形を幅高さ付きで出力、セルのハイパーリンクも出力)。
+- **Excel → 構造化 JSON**: セル、図形、チャート、SmartArt、テーブル候補、印刷範囲/自動改ページ範囲(PrintArea/PrintAreaView)をシート単位・範囲単位で出力。
+- **出力モード**: `light`(セル+テーブル候補のみ)、`standard`(テキスト付き図形+矢印、チャート、SmartArt)、`verbose`(全図形を幅高さ付きで出力、セルのハイパーリンクも出力)。
- **フォーマット**: JSON(デフォルトはコンパクト、`--pretty` で整形)、YAML、TOON(任意依存)。
- **テーブル検出のチューニング**: API でヒューリスティックを動的に変更可能。
- **ハイパーリンク抽出**: `verbose` モード(または `include_cell_links=True` 指定)でセルのリンクを `links` に出力。
@@ -341,7 +339,7 @@ flowchart TD
ということが明確に示されています。
-その他の本ライブラリを使ったLLM推論サンプルは以下のディレクトリにあります。
+その他の本ライブラリを使った LLM 推論サンプルは以下のディレクトリにあります。
- [Basic Excel](sample/basic/)
- [Flowchart](sample/flowchart/)
@@ -371,6 +369,7 @@ ExStruct は主に **ライブラリ** として利用される想定で、サ
- 企業利用ではフォークや内部改修が前提です
次のようなチームに適しています。
+
- ブラックボックス化されたツールではなく、透明性が必要
- 必要に応じて内部フォークを保守できる
@@ -402,6 +401,11 @@ ExStruct の内部実装を拡張する場合は、
→ [docs/contributors/architecture.md](docs/contributors/architecture.md)
+## カバレッジに関する注意
+
+セル構造推論ロジック(`cells.py`)は、ヒューリスティックルールと
+Excel 固有の動作に依存しています。網羅的なテストは現実世界の信頼性を反映できないため、完全なカバレッジは意図的に追求されていません。
+
## License
BSD-3-Clause. See `LICENSE` for details.
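
The review notes removed in the `docs/agents/CODE_REVIEW.md` hunk that follows rely on one coordinate convention: `parse_range_zero_based` yields zero-based bounds, while `PrintArea` stores 1-based rows and 0-based columns. The conversion can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's implementation: the dataclasses are simplified stand-ins for the Pydantic models, and the regex parser and the `column_index` and `to_print_area` helpers are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeBounds:
    """Zero-based, inclusive bounds (stand-in for the project's model)."""

    r1: int
    c1: int
    r2: int
    c2: int


@dataclass(frozen=True)
class PrintArea:
    """Rows are 1-based, columns are 0-based (stand-in for the project's model)."""

    r1: int
    c1: int
    r2: int
    c2: int


# Assumption: only simple "Sheet!A1:B2" or "A1:B2" ranges are handled here.
_RANGE_RE = re.compile(r"^(?:.*!)?\$?([A-Za-z]+)\$?(\d+):\$?([A-Za-z]+)\$?(\d+)$")


def column_index(letters: str) -> int:
    """Convert column letters to a zero-based index (A -> 0, AA -> 26)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_range_zero_based(range_str: str) -> Optional[RangeBounds]:
    """Hypothetical re-implementation; returns None when parsing fails."""
    match = _RANGE_RE.match(range_str.strip())
    if match is None:
        return None
    col1, row1, col2, row2 = match.groups()
    return RangeBounds(
        r1=int(row1) - 1,
        c1=column_index(col1),
        r2=int(row2) - 1,
        c2=column_index(col2),
    )


def to_print_area(bounds: RangeBounds) -> PrintArea:
    """Shift rows to 1-based; leave columns 0-based."""
    return PrintArea(r1=bounds.r1 + 1, c1=bounds.c1, r2=bounds.r2 + 1, c2=bounds.c2)


bounds = parse_range_zero_based("Sheet1!A1:B2")
if bounds is not None:
    print(bounds, to_print_area(bounds))
```

Keeping the parser strictly zero-based and applying the row shift in one small function means the mixed convention is handled in a single place.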
diff --git a/docs/agents/CODE_REVIEW.md b/docs/agents/CODE_REVIEW.md
index 5053262..e69de29 100644
--- a/docs/agents/CODE_REVIEW.md
+++ b/docs/agents/CODE_REVIEW.md
@@ -1,779 +0,0 @@
-````md
-**Actionable comments posted: 0**
-
-> [!CAUTION]
-> Some comments are outside the diff and can’t be posted inline due to platform limitations.
->
->
-> ⚠️ Outside diff range comments (1)
->
->
-> src/exstruct/io/__init__.py (1)
->
-> `74-82`: **Return `RangeBounds` directly instead of converting to tuple.**
->
-> This wrapper function unpacks the Pydantic `RangeBounds` model into a tuple, which violates the coding guideline: "Do not return dictionaries or tuples; always use Pydantic BaseModel for structured data." Callers should access bounds via the model's fields (`bounds.r1`, `bounds.c1`, etc.) to preserve type safety and semantic clarity.
->
-> As per coding guidelines, structured data should be returned as Pydantic models, not tuples.
->
->
-> 🔎 Proposed refactor to eliminate tuple conversion
->
-> **Option 1: Remove the wrapper entirely and use `parse_range_zero_based` directly**
->
-> Update callers (e.g., line 129) to use the model fields:
->
-> ```diff
-> def _filter_table_candidates_to_area(
-> table_candidates: list[str], area: PrintArea
-> ) -> list[str]:
-> filtered: list[str] = []
-> for candidate in table_candidates:
-> - bounds = _parse_range_zero_based(candidate)
-> - if not bounds:
-> + bounds = parse_range_zero_based(candidate)
-> + if bounds is None:
-> continue
-> - r1, c1, r2, c2 = bounds
-> + r1, c1, r2, c2 = bounds.r1, bounds.c1, bounds.r2, bounds.c2
-> r1 += 1
-> r2 += 1
-> if r1 >= area.r1 and r2 <= area.r2 and c1 >= area.c1 and c2 <= area.c2:
-> filtered.append(candidate)
-> return filtered
-> ```
->
-> **Option 2: If the wrapper is needed, change return type to `RangeBounds | None`**
->
-> ```diff
-> -def _parse_range_zero_based(range_str: str) -> tuple[int, int, int, int] | None:
-> - """
-> - Parse an Excel range string into zero-based (r1, c1, r2, c2) bounds.
-> - Returns None on failure.
-> - """
-> +def _parse_range_zero_based(range_str: str) -> RangeBounds | None:
-> + """Parse an Excel range string into zero-based bounds.
-> +
-> + Args:
-> + range_str: Excel range string (e.g., "Sheet1!A1:B2").
-> +
-> + Returns:
-> + RangeBounds in zero-based coordinates, or None on failure.
-> + """
-> bounds = parse_range_zero_based(range_str)
-> - if bounds is None:
-> - return None
-> - return (bounds.r1, bounds.c1, bounds.r2, bounds.c2)
-> + return bounds
-> ```
->
->
->
->
->
->
-
-
-♻️ Duplicate comments (3)
-
-
-src/exstruct/core/pipeline.py (2)
-
-`644-645`: **Previous review concern addressed.**
-
-The error message has been updated to use a generic "COM pipeline failed" message with `FallbackReason.COM_PIPELINE_FAILED`, addressing the earlier review comment about misleading error messages.
-
----
-
-`670-670`: **Previous review concern addressed.**
-
-The `reason` parameter is now properly used for logging, addressing the earlier review comment about the unused parameter.
-
-
-
-docs/agents/TEST_REQUIREMENTS.md (1)
-
-`203-204`: **Previous review concern addressed.**
-
-The duplicate requirement IDs have been resolved by renaming them to `[PIPE-MOD-01]` and `[PIPE-MOD-02]`, making each requirement ID unique throughout the document.
-
-
-
-
-
-
-🧹 Nitpick comments (7)
-
-
-src/exstruct/io/__init__.py (1)
-
-`488-489`: **Consider renaming publicly exported functions to remove underscore prefix.**
-
-The functions `_require_yaml` and `_require_toon` are now part of the public API (exported in `__all__`), but their names start with an underscore, which by Python convention indicates private/internal functions. For a cleaner public API, consider renaming them to `require_yaml` and `require_toon` (without the underscore).
-
-
-🔎 Suggested changes for consistent public API naming
-
-In `src/exstruct/io/serialize.py`, rename the functions:
-
-```diff
--def _require_yaml() -> ModuleType:
-+def require_yaml() -> ModuleType:
- """Ensure pyyaml is installed; otherwise raise with guidance."""
- ...
-
--def _require_toon() -> ModuleType:
-+def require_toon() -> ModuleType:
- """Ensure python-toon is installed; otherwise raise with guidance."""
- ...
-```
-
-In this file, update the import and export:
-
-```diff
- from .serialize import (
- _FORMAT_HINTS,
- _ensure_format_hint,
-- _require_toon,
-- _require_yaml,
-+ require_toon,
-+ require_yaml,
- _serialize_payload_from_hint,
- )
-
- __all__ = [
- "dict_without_empty_values",
- "save_as_json",
- "save_as_yaml",
- "save_as_toon",
- "save_sheets",
- "save_sheets_as_json",
- "build_print_area_views",
- "save_print_area_views",
- "save_auto_page_break_views",
- "serialize_workbook",
-- "_require_yaml",
-- "_require_toon",
-+ "require_yaml",
-+ "require_toon",
- ]
-```
-
-
-
-
-
-tests/test_backends.py (3)
-
-`16-22`: **Add type hints to mock functions.**
-
-The fake functions should have explicit type hints for maintainability and mypy compliance.
-
-
-🔎 Proposed refactor
-
-```diff
-- def fake_cells(_: Path) -> dict[str, list[object]]:
-+ def fake_cells(file_path: Path) -> dict[str, list[object]]:
- calls.append("cells")
- return {}
-
-- def fake_cells_links(_: Path) -> dict[str, list[object]]:
-+ def fake_cells_links(file_path: Path) -> dict[str, list[object]]:
- calls.append("links")
- return {}
-```
-
-
-
-As per coding guidelines, avoid using `_` for actual parameters; use descriptive names with proper type hints.
-
----
-
-`43-44`: **Use explicit parameter names with type hints.**
-
-Replace generic `_` and `__` with descriptive parameter names for better readability.
-
-
-🔎 Proposed refactor
-
-```diff
-- def fake_detect(_: Path, __: str) -> list[str]:
-+ def fake_detect(file_path: Path, sheet_name: str) -> list[str]:
- raise RuntimeError("boom")
-```
-
-
-
-As per coding guidelines, use descriptive parameter names.
-
----
-
-`58-59`: **Use explicit type hints instead of generic object.**
-
-The mock function should use proper type signatures for clarity.
-
-
-🔎 Proposed refactor
-
-```diff
-- def fake_colors_map(*_: object, **__: object) -> object:
-+ def fake_colors_map(
-+ workbook: object,
-+ *,
-+ include_default_background: bool,
-+ ignore_colors: set[str] | None
-+ ) -> object:
- raise RuntimeError("boom")
-```
-
-
-
-As per coding guidelines, provide explicit type hints for all parameters.
-
-
-
-src/exstruct/core/backends/openpyxl_backend.py (1)
-
-`103-125`: **Consider adding a module-level docstring.**
-
-The file contains well-structured code with proper docstrings for classes and methods, but lacks a module-level docstring explaining the openpyxl backend's role in the extraction pipeline.
-
-
-🔎 Suggested addition
-
-Add at the top of the file after imports:
-
-```python
-"""Openpyxl-based backend for Excel workbook extraction.
-
-This module provides the OpenpyxlBackend class which uses the openpyxl library
-to extract cells, print areas, color maps, and table candidates from Excel files.
-It serves as the primary extraction backend with fallback support when COM is unavailable.
-"""
-```
-
-
-
-As per coding guidelines, adding comprehensive module documentation improves maintainability.
-
-
-
-src/exstruct/core/backends/com_backend.py (2)
-
-`173-206`: **Consider extracting CSV parsing to a utility function for reusability.**
-
-The `_split_csv_respecting_quotes` function implements custom CSV parsing with quote handling. While the implementation is correct, this type of utility might be useful elsewhere in the codebase.
-
-Consider moving this to a shared utilities module if similar parsing is needed elsewhere, or documenting that Python's `csv` module with appropriate dialect settings could be an alternative.
-
----
-
-`1-206`: **Add module-level docstring for better documentation.**
-
-The file contains well-structured code with proper class and method docstrings, but lacks a module-level docstring explaining the COM backend's role.
-
-
-🔎 Suggested addition
-
-Add at the top after imports:
-
-```python
-"""COM-based backend for Excel workbook extraction via xlwings.
-
-This module provides the ComBackend class which uses xlwings/COM to extract
-print areas, color maps, and auto page breaks from Excel files. It complements
-the OpenpyxlBackend and is used when COM automation is available and required
-features (like auto page breaks) need native Excel API access.
-"""
-```
-
-
-
-As per coding guidelines, comprehensive module documentation improves maintainability.
-
-
-
-
-
-
-📜 Review details
-
-**Configuration used**: defaults
-
-**Review profile**: CHILL
-
-**Plan**: Pro
-
-
-📥 Commits
-
-Reviewing files that changed from the base of the PR and between 79d4bbb797caaa2641a3581e2d56059fc5a8cbf2 and 93b10e2d8b0e9e4e339f038cb215951dd4fdd7c2.
-
-
-
-
-📒 Files selected for processing (20)
-
-- `.github/workflows/pytest.yml`
-- `docs/agents/CODE_REVIEW.md`
-- `docs/agents/DATA_MODEL.md`
-- `docs/agents/EXCEL_EXTRACTION.md`
-- `docs/agents/TEST_REQUIREMENTS.md`
-- `src/exstruct/core/backends/com_backend.py`
-- `src/exstruct/core/backends/openpyxl_backend.py`
-- `src/exstruct/core/pipeline.py`
-- `src/exstruct/errors.py`
-- `src/exstruct/io/__init__.py`
-- `src/exstruct/models/__init__.py`
-- `tests/test_auto_page_breaks.py`
-- `tests/test_backends.py`
-- `tests/test_engine.py`
-- `tests/test_integrate_raw_data.py`
-- `tests/test_modeling.py`
-- `tests/test_models_validation.py`
-- `tests/test_pipeline.py`
-- `tests/test_print_area_views.py`
-- `tests/test_print_areas_openpyxl.py`
-
-
-
-
-🚧 Files skipped from review as they are similar to previous changes (4)
-
-- tests/test_pipeline.py
-- tests/test_modeling.py
-- tests/test_integrate_raw_data.py
-- src/exstruct/errors.py
-
-
-
-
-🧰 Additional context used
-
-
-📓 Path-based instructions (1)
-
-
-**/*.py
-
-**📄 CodeRabbit inference engine (AGENTS.md)**
-
-> `**/*.py`: Add type hints to all function and method arguments and return values (mypy strict compliance)
-> Use `Any` type only at external library boundaries (xlwings, pandas, numpy, etc.)
-> Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
-> Ensure each function has a single responsibility (cyclomatic complexity should not exceed 12)
-> Organize imports in this order: (1) standard library, (2) third-party packages, (3) exstruct internal modules
-> Use Google-style docstrings for all functions and classes
-> Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external data to Pydantic models at boundaries
-> Avoid writing God Functions (large, single-responsibility-violating functions)
-> Avoid writing God Objects (classes with too many responsibilities)
-> Avoid excessive nesting and deep conditional branching
-> Do not return dictionaries or tuples; always use Pydantic BaseModel for structured data
-> Ensure code passes mypy strict mode with zero errors
-> Ensure code passes Ruff checks (E, W, F, I, B, UP, N, C90) with zero errors
-> Avoid circular dependencies between modules
-> AI should propose automatic function splitting when functions are too complex
-> AI should propose additional Pydantic models when data structures are insufficient
-> AI should automatically fix and organize imports when they are inconsistent
-> AI should improve docstrings when they are incomplete or insufficient
-
-Files:
-
-- `tests/test_print_areas_openpyxl.py`
-- `tests/test_backends.py`
-- `src/exstruct/core/backends/openpyxl_backend.py`
-- `src/exstruct/io/__init__.py`
-- `src/exstruct/core/backends/com_backend.py`
-- `tests/test_print_area_views.py`
-- `tests/test_engine.py`
-- `tests/test_auto_page_breaks.py`
-- `src/exstruct/core/pipeline.py`
-- `tests/test_models_validation.py`
-- `src/exstruct/models/__init__.py`
-
-
-
-
-🧠 Learnings (1)
-
-
-📚 Learning: 2025-12-21T10:55:22.238Z
-
-```
-Learnt from: CR
-Repo: harumiWeb/exstruct PR: 0
-File: AGENTS.md:0-0
-Timestamp: 2025-12-21T10:55:22.238Z
-Learning: Applies to **/*.py : Do not depend on internal structures of external libraries (xlwings, pandas, numpy); normalize external data to Pydantic models at boundaries
-```
-
-**Applied to files:**
-
-- `src/exstruct/io/__init__.py`
-
-
-
-
-🧬 Code graph analysis (6)
-
-
-tests/test_backends.py (3)
-
-
-src/exstruct/core/backends/com_backend.py (3)
-
-- `ComBackend` (18-135)
-- `extract_colors_map` (51-74)
-- `extract_print_areas` (27-49)
-
-
-
-src/exstruct/core/backends/openpyxl_backend.py (5)
-
-- `OpenpyxlBackend` (23-100)
-- `extract_cells` (32-45)
-- `detect_tables` (88-100)
-- `extract_colors_map` (64-86)
-- `extract_print_areas` (47-62)
-
-
-
-src/exstruct/core/ranges.py (1)
-
-- `parse_range_zero_based` (24-47)
-
-
-
-
-
-src/exstruct/core/backends/openpyxl_backend.py (3)
-
-
-src/exstruct/models/__init__.py (1)
-
-- `PrintArea` (97-103)
-
-
-
-src/exstruct/core/ranges.py (1)
-
-- `parse_range_zero_based` (24-47)
-
-
-
-src/exstruct/core/workbook.py (1)
-
-- `openpyxl_workbook` (14-53)
-
-
-
-
-
-src/exstruct/io/__init__.py (3)
-
-
-src/exstruct/core/ranges.py (1)
-
-- `parse_range_zero_based` (24-47)
-
-
-
-src/exstruct/errors.py (2)
-
-- `OutputError` (32-33)
-- `SerializationError` (20-21)
-
-
-
-src/exstruct/io/serialize.py (4)
-
-- `_ensure_format_hint` (28-49)
-- `_require_toon` (104-112)
-- `_require_yaml` (93-101)
-- `_serialize_payload_from_hint` (52-90)
-
-
-
-
-
-tests/test_print_area_views.py (1)
-
-
-src/exstruct/models/__init__.py (2)
-
-- `CellRow` (50-59)
-- `PrintArea` (97-103)
-
-
-
-
-
-tests/test_auto_page_breaks.py (1)
-
-
-src/exstruct/models/__init__.py (1)
-
-- `PrintArea` (97-103)
-
-
-
-
-
-tests/test_models_validation.py (1)
-
-
-src/exstruct/models/__init__.py (1)
-
-- `CellRow` (50-59)
-
-
-
-
-
-
-🪛 GitHub Check: Codacy Static Code Analysis
-
-
-src/exstruct/core/backends/com_backend.py
-
-[warning] 37-37: src/exstruct/core/backends/com_backend.py#L37
-Try, Except, Continue detected.
-
----
-
-[warning] 132-132: src/exstruct/core/backends/com_backend.py#L132
-Try, Except, Pass detected.
-
-
-
-
-
-
-
-
-⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
-
-- GitHub Check: Codacy Static Code Analysis
-- GitHub Check: test (windows-latest, 3.11)
-
-
-
-
-🔇 Additional comments (24)
-
-
-docs/agents/DATA_MODEL.md (2)
-
-`96-96`: **Clear coordinate semantics for PrintArea rows—good clarification.**
-
-Adding "(1-based, inclusive)" to `r1` and `r2` aligns with the existing `c1` and `c2` documentation and makes the row coordinate interpretation explicit. This improves clarity for API consumers.
-
-Also applies to: 98-98
-
----
-
-`140-140`: **Documentation for `colors_map` coordinate system is correct and matches implementation.**
-
-The codebase confirms that `colors_map` uses (row=1-based, col=0-based) coordinates as documented. Both extraction paths—openpyxl (`cell.row, cell.col_idx - 1`) and COM (`row, col - 1`)—apply the same logic: rows are preserved as 1-based, columns are converted from 1-based to 0-based. Test coverage validates this with `{"#FFFFFF": [(1, 0)]}`, confirming row 1 with column 0.
-
-
-
-src/exstruct/io/__init__.py (3)
-
-`8-18`: **Well done removing the OpenPyXL dependency!**
-
-The refactor successfully eliminates direct dependency on `openpyxl.utils.cell.range_boundaries` by using the internal `parse_range_zero_based` function, and centralizes serialization logic through the new helpers from `.serialize`. This aligns with the coding guideline to avoid depending on internal structures of external libraries.
-
-Based on learnings, this change properly normalizes external data to Pydantic models at boundaries.
-
----
-
-`283-288`: **Excellent serialization refactoring!**
-
-The centralization of format validation and serialization through `_ensure_format_hint` and `_serialize_payload_from_hint` eliminates code duplication and provides a single source of truth for format handling. This improves maintainability and consistency across all export functions.
-
-Also applies to: 315-318, 341-346, 374-377, 393-402, 429-432, 449-454, 469-471
-
----
-
-`133-134`: **Row index adjustments correctly implement 1-based row coordinates.**
-
-The adjustments on lines 133-134 (adding 1 to `r1` and `r2`) and lines 148, 150 (subtracting 1 from `area.r1` in pixel calculations) correctly convert between the 0-based coordinates returned by `parse_range_zero_based` and the 1-based row coordinates used in `PrintArea`, as documented in the PR objectives.
-
-Also applies to: 148-148, 150-150
-
-
-
-tests/test_models_validation.py (1)
-
-`19-19`: **LGTM: Coordinate system alignment.**
-
-The update from `r=0` to `r=1` correctly reflects the 1-based row indexing documented in the `CellRow` model definition.
-
-
-
-tests/test_auto_page_breaks.py (1)
-
-`47-47`: **LGTM: Coordinate system alignment.**
-
-The updated PrintArea coordinates correctly reflect the 1-based row indexing (r1=1, r2=2) and 0-based column indexing (c1=0, c2=1) documented in the PrintArea model.
-
-
-
-.github/workflows/pytest.yml (1)
-
-`34-46`: **LGTM: Coverage collection and Codecov integration properly configured.**
-
-The pytest coverage flags and Codecov upload step are correctly implemented. The Codecov action is now properly pinned to a full commit SHA, addressing the previous security concern.
-
-
-
-src/exstruct/models/__init__.py (3)
-
-`14-16`: **LGTM: Field definition reflow.**
-
-The Shape.id field definition has been reflowed for readability while maintaining the same semantic meaning.
-
----
-
-`101-103`: **LGTM: Coordinate system documentation clarified.**
-
-The field descriptions now explicitly document that columns use 0-based indexing, improving clarity for API consumers.
-
----
-
-`129-132`: **LGTM: colors_map coordinate semantics documented.**
-
-The description now explicitly clarifies that rows are 1-based and columns are 0-based, which is essential for correct usage of the colors_map field.
-
-
-
-docs/agents/EXCEL_EXTRACTION.md (1)
-
-`1-66`: **LGTM: Comprehensive pipeline documentation.**
-
-This documentation provides clear guidance on the extraction pipeline architecture, coordinate system semantics, modes, and fallback behavior. The coordinate system clarification (rows 1-based, columns 0-based) is consistent with the code changes throughout the PR.
-
-
-
-src/exstruct/core/pipeline.py (1)
-
-`1-696`: **LGTM: Pipeline architecture with proper type safety.**
-
-The pipeline module demonstrates excellent adherence to coding guidelines:
-
-- Complete type hints on all functions and parameters
-- Google-style docstrings throughout
-- Immutable dataclasses for pipeline configuration and state
-- Well-organized imports (stdlib → third-party → internal)
-- Clear separation of pre-COM and COM extraction steps
-
-The architecture provides a solid foundation for the extraction workflow with explicit fallback handling and state tracking.
-
-
-
-docs/agents/TEST_REQUIREMENTS.md (1)
-
-`1-228`: **LGTM: Comprehensive test requirements specification.**
-
-The test requirements document provides thorough coverage of functional, non-functional, and integration requirements. The organization by category (pipeline, backend, ranges, etc.) aligns well with the modular architecture introduced in this PR.
-
-
-
-tests/test_print_areas_openpyxl.py (1)
-
-`26-26`: **LGTM: Coordinate system alignment.**
-
-The assertion now correctly expects 1-based row coordinates (r1=1, r2=2) and 0-based column coordinates (c1=0, c2=1), consistent with the PrintArea model definition and the broader coordinate system updates in this PR.
-
-
-
-tests/test_print_area_views.py (1)
-
-`39-46`: **LGTM! Coordinate system update correctly applied.**
-
-The test data has been properly updated to reflect the documented coordinate convention (1-based rows, 0-based columns). The CellRow indices and PrintArea bounds are now consistent with the model definitions.
-
-
-
-tests/test_backends.py (2)
-
-`76-93`: **LGTM! Print area extraction test validates coordinate system.**
-
-The test correctly verifies that print areas extracted via openpyxl use 1-based row indexing (r1=1) and 0-based column indexing (c1=0), consistent with the PrintArea model definition.
-
----
-
-`95-101`: **LGTM! Range parsing test validates zero-based intermediate representation.**
-
-The test correctly verifies that `parse_range_zero_based` returns zero-based coordinates (r1=0, c1=0, r2=1, c2=1) for the range "Sheet1!A1:B2", which are then converted to the appropriate coordinate system by backend methods.
-
-
-
-src/exstruct/core/backends/openpyxl_backend.py (1)
-
-`159-162`: **LGTM! Coordinate conversion correctly implemented.**
-
-The conversion from zero-based coordinates (from `_parse_print_area_range`) to the PrintArea model is now correct:
-
-- Rows (r1, r2): converted from 0-based to 1-based with `+ 1`
-- Columns (c1, c2): kept as 0-based (no adjustment)
-
-This matches the PrintArea model definition where rows are 1-based and columns are 0-based.
-
-
-
-tests/test_engine.py (2)
-
-`6-12`: **LGTM! Improved API design with nested option groups.**
-
-The introduction of `FilterOptions` and `DestinationOptions` improves the API's organization and clarity, grouping related settings together under `OutputOptions`.
-
----
-
-`65-65`: **LGTM! PrintArea coordinate updated correctly.**
-
-The PrintArea coordinates have been updated to use 1-based row indexing (r1=1) consistent with the model definition and broader coordinate system clarification in this PR.
-
-
-
-docs/agents/CODE_REVIEW.md (1)
-
-`1-609`: **Documentation file - no code review needed.**
-
-This file contains example code review comments and AI agent prompts for documentation purposes. It does not contain production code requiring review.
-
-
-
-src/exstruct/core/backends/com_backend.py (2)
-
-`45-48`: **LGTM! Coordinate conversion correctly implemented.**
-
-The conversion from zero-based coordinates (from `_parse_print_area_range`) to PrintArea is correct:
-
-- Rows (r1, r2): `+ 1` converts 0-based to 1-based
-- Columns (c1, c2): no adjustment keeps 0-based
-
-This matches the PrintArea model definition.
-
----
-
-`119-125`: **LGTM! Coordinate conversion for COM values is correct.**
-
-The coordinate handling here is actually correct, contrary to the past review comment:
-
-- `rows[i]` and `cols[j]` are 1-based values from Excel COM
-- Rows (r1, r2): used as-is since PrintArea expects 1-based rows ✓
-- Columns (c1, c2): `- 1` converts from 1-based COM to 0-based for PrintArea ✓
-
-This is consistent with the PrintArea model where rows are 1-based and columns are 0-based.
-
-
-
-
-
-
-
-
-````
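The coordinate convention described in the review notes above (zero-based intermediate ranges, then 1-based rows and 0-based columns in `PrintArea`) can be illustrated with a minimal Python sketch. The names `PrintAreaSketch`, `parse_range_zero_based_sketch`, `from_zero_based`, and `from_com_one_based` are hypothetical stand-ins, not the project's actual code; only the conversions themselves are taken from the notes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PrintAreaSketch:
    """Hypothetical stand-in for PrintArea: rows 1-based, columns 0-based."""

    r1: int
    c1: int
    r2: int
    c2: int


# Matches a single A1-style cell reference, with optional "$" anchors.
_CELL = re.compile(r"^\$?([A-Za-z]+)\$?(\d+)$")


def _col_to_zero_based(letters: str) -> int:
    """Convert column letters ("A", "AA", ...) to a 0-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


def parse_range_zero_based_sketch(ref: str) -> tuple[int, int, int, int]:
    """Parse "Sheet1!A1:B2" into zero-based (r1, c1, r2, c2)."""
    cells = ref.rsplit("!", 1)[-1]  # drop the sheet name if present
    start, _, end = cells.partition(":")
    coords = []
    for cell in (start, end or start):
        m = _CELL.match(cell)
        if m is None:
            raise ValueError(f"unsupported range: {ref!r}")
        coords.append((int(m.group(2)) - 1, _col_to_zero_based(m.group(1))))
    (r1, c1), (r2, c2) = coords
    return r1, c1, r2, c2


def from_zero_based(r1: int, c1: int, r2: int, c2: int) -> PrintAreaSketch:
    """openpyxl-style path: rows get +1, columns stay 0-based."""
    return PrintAreaSketch(r1=r1 + 1, c1=c1, r2=r2 + 1, c2=c2)


def from_com_one_based(r1: int, c1: int, r2: int, c2: int) -> PrintAreaSketch:
    """COM-style path: rows are used as-is, columns get -1."""
    return PrintAreaSketch(r1=r1, c1=c1 - 1, r2=r2, c2=c2 - 1)
```

Under these assumptions both paths yield the same value for `A1:B2`, which is the property the updated tests check (r1=1, r2=2, c1=0, c2=1).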
diff --git a/docs/agents/DATA_MODEL.md b/docs/agents/DATA_MODEL.md
index 67a9c49..3ed8c7f 100644
--- a/docs/agents/DATA_MODEL.md
+++ b/docs/agents/DATA_MODEL.md
@@ -1,6 +1,6 @@
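# ExStruct Data Model Specification
-**Version**: 0.10
+**Version**: 0.13
**Status**: Authoritative — this document is the single canonical source for every model that ExStruct returns.
core / io / integrate must always follow this specification. Models are implemented with **pydantic v2**.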
@@ -13,32 +13,53 @@ ExStruct は Excel ワークブックを LLM が扱いやすい **意味構造
---
-# 2. Shape Model
+# 2. Shape / Arrow / SmartArt Model
+
+The `shapes` output is a union of the three models below, discriminated by `kind`.
```jsonc
-Shape {
- id: int | null // sequential id within the sheet (may be null for lines and arrows)
+BaseShape {
+ id: int | null // sequential id within the sheet (may be null for lines/arrows)
text: str
l: int // left (px)
t: int // top (px)
w: int | null // width (px)
h: int | null // height(px)
- type: str | null // MSO shape type label
rotation: float | null
+}
+
+Shape extends BaseShape {
+ kind: "shape"
+ type: str | null // MSO shape type label
+}
+
+Arrow extends BaseShape {
+ kind: "arrow"
begin_arrow_style: int | null
end_arrow_style: int | null
begin_id: int | null // Shape.id of the shape connected at the connector's start
end_id: int | null // Shape.id of the shape connected at the connector's end
direction: "E"|"SE"|"S"|"SW"|"W"|"NW"|"N"|"NE" | null
}
+
+SmartArtNode {
+ text: str
+ kids: [SmartArtNode]
+}
+
+SmartArt extends BaseShape {
+ kind: "smartart"
+ layout: str
+ nodes: [SmartArtNode]
+}
```
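Notes:
- `direction` is the heading of a line or arrow, normalized to one of eight compass directions.
- Arrow styles correspond to Excel's enum values.
-- `begin_id` / `end_id` are the `id`s of the shapes a connector is attached to (corresponding to Excel's `ConnectorFormat.BeginConnectedShape` / `EndConnectedShape`).
-- `id` may be null for line and arrow Shapes.
+- `begin_id` / `end_id` correspond to the `id`s of the shapes a connector is attached to (`ConnectorFormat.BeginConnectedShape` / `EndConnectedShape`).
+- `SmartArtNode` is represented as a nested structure; `nodes` holds the roots of the tree.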
---
@@ -114,7 +135,7 @@ PrintAreaView {
book_name: str
sheet_name: str
area: PrintArea
- shapes: [Shape]
+ shapes: [Shape | Arrow | SmartArt]
charts: [Chart]
rows: [CellRow] // only rows intersecting the range; empty columns are dropped
table_candidates: [str] // table candidates that fit within the range
@@ -132,7 +153,7 @@ PrintAreaView {
```jsonc
SheetData {
rows: [CellRow]
- shapes: [Shape]
+ shapes: [Shape | Arrow | SmartArt]
charts: [Chart]
table_candidates: [str]
print_areas: [PrintArea]
@@ -204,3 +225,4 @@ WorkbookData {
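- 0.10: Added `id` to Shape, changed connector source/destination to `id` references, and removed `name` from the payload.
- 0.11: Renamed the connector fields to `begin_id` / `end_id`.
- 0.12: Added `colors_map` to SheetData to store background color information.
+- 0.13: Split Shape into `Shape` / `Arrow` / `SmartArt` and added the nested `SmartArtNode` structure.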
diff --git a/docs/agents/EXCEL_EXTRACTION.md b/docs/agents/EXCEL_EXTRACTION.md
index fa0d361..92ed0a3 100644
--- a/docs/agents/EXCEL_EXTRACTION.md
+++ b/docs/agents/EXCEL_EXTRACTION.md
@@ -32,14 +32,15 @@
- Merge openpyxl table definitions with border clusters
- Keep table_candidates even when COM is unavailable
-## Shapes
+## Shapes / Arrows / SmartArt
Extracted content:
-- Normalization of Type / AutoShapeType
+- Normalization of Type / AutoShapeType (`type` is on Shape only)
- Left/Top/Width/Height
- TextFrame2.TextRange.Text
- Arrow direction and connection information
+- SmartArt layout/nodes/kids (nested structure)
## Charts
diff --git a/docs/agents/FEATURE_SPEC.md b/docs/agents/FEATURE_SPEC.md
index 12955e5..f512aa7 100644
--- a/docs/agents/FEATURE_SPEC.md
+++ b/docs/agents/FEATURE_SPEC.md
@@ -11,12 +11,6 @@
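- SmartArt basically carries the Shape fields while also holding Node information recursively
- Use separate classes for the root node and the other nodes
-## Refactoring proposal
-
-- Redundant resource acquisition
- - Symptom: print-area retrieval logic is scattered within the file (openpyxl → COM), and similar patterns exist elsewhere.
- - Proposed fix: break the extraction pipeline into steps and align each step's implementation (cells, tables, shapes, charts, print_areas) at the module level. Defining the pipeline in one place makes it easier to add or switch modes.
-
## Future options (notes under consideration)
- Make the table-detection scoring thresholds adjustable via the CLI or environment variables.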
diff --git a/docs/agents/OVERVIEW.md b/docs/agents/OVERVIEW.md
index 4e42d80..698dbd1 100644
--- a/docs/agents/OVERVIEW.md
+++ b/docs/agents/OVERVIEW.md
@@ -16,7 +16,7 @@ openpyxl と Excel COM(xlwings)を組み合わせ、LLM が扱いやすい
- Cells (values / links / coordinates)
- Tables (candidate ranges)
-- Shapes (position / type / text / arrows)
+- Shapes / Arrows / SmartArt (position / text / arrows / layout)
- Charts (Series/Axis/Type/Title)
- Print Areas / Auto Page Breaks
- Colors Map (including conditional formatting)
@@ -24,7 +24,7 @@ openpyxl と Excel COM(xlwings)を組み合わせ、LLM が扱いやすい
## Usage examples (overview)
- Get WorkbookData with `extract(path, mode="standard")`
-- Use `process_excel` for file output and directory splitting
+- Use `process_excel` for file output and directory output
- Use `exstruct file.xlsx --format json` from the CLI
## Directory layout (overview)
diff --git a/docs/agents/ROADMAP.md b/docs/agents/ROADMAP.md
index 47b77a0..f13ad1d 100644
--- a/docs/agents/ROADMAP.md
+++ b/docs/agents/ROADMAP.md
@@ -33,11 +33,11 @@
## v0.3.1
-- Separate Shapes and Arrows (in preparation for adding SmartArt later)
+- Separate Shapes and Arrows (in preparation for adding SmartArt later)
+- SmartArt parsing
## v0.4.0
-- SmartArt parsing
- Excel Form Controls parsing
## v1.0.0
diff --git a/docs/agents/TASKS.md b/docs/agents/TASKS.md
index 66cc050..767d666 100644
--- a/docs/agents/TASKS.md
+++ b/docs/agents/TASKS.md
@@ -1,8 +1,34 @@
# Task List
-Incomplete tasks are marked [ ], completed tasks [x]
+## 1. Fixes to the existing implementation (handling the impact of the model split)
-- [x] Enumerate the main branches and exception paths of src/exstruct/render/__init__.py (_require_excel_app/_require_pdfium/export_pdf/export_sheet_images/_sanitize_sheet_filename)
-- [x] Mock xlwings and pypdfium2 and unit-test the success/failure cases of export_pdf/export_sheet_images
-- [x] Add tests that verify the error messages and exception types for missing dependencies and unexpected exceptions
-- [x] Add tests for the sheet-name sanitization rules and output file name generation
+- [x] Adjust the types and handling of `_filter_shapes_to_area` in `src/exstruct/io/__init__.py` so that it accepts `list[Shape | Arrow | SmartArt]`
+- [x] Change the connector check in `src/exstruct/core/shapes.py` to assume `Arrow` (`begin_arrow_style` / `end_arrow_style` and similar fields are read from `Arrow` only)
+- [x] Restrict connection ID references in `src/exstruct/core/shapes.py` to `Arrow` and remove erroneous references from `Shape`
+- [x] Confirm that the `shapes` filter on the `PrintAreaView` side does not drop `SmartArt`
+
+## 2. Implementation plan for SmartArt extraction
+
+- [x] Extract SmartArt when `shape.HasSmartArt` is true
+- [x] Store `SmartArt.Layout.Name` in `SmartArt.layout`
+- [x] Traverse `SmartArt.AllNodes` and collect `level` and `text`
+- [x] Build the `SmartArtNode` tree (`nodes`) from the node array (stack-based assembly using `level`)
+- [x] `SmartArt` also stores the position/size/rotation/text equivalent to `BaseShape`
+
+## 3. Where the implementation lives
+
+- [x] Add SmartArt extraction functions to `src/exstruct/core/shapes.py` (one function, one responsibility)
+- [x] Dispatch to `Shape` / `Arrow` / `SmartArt` in the main extraction routine of `src/exstruct/core/shapes.py`
+- [x] Confirm in `src/exstruct/io/__init__.py` that serialization of `Shape | Arrow | SmartArt` does not break
+
+## 4. Verification
+
+- [x] Confirm that existing shape / connector extraction is not broken
+- [ ] Confirm that `SmartArt.nodes` is output as expected for a workbook containing SmartArt
+
+## 5. Test cases (maintaining coverage)
+
+- [x] Confirm that `SmartArt` `nodes` are serialized as a nested structure
+- [x] Confirm that only `Arrow` has `begin_id` / `end_id` and that they are not referenced from `Shape`
+- [x] Confirm that `_filter_shapes_to_area` accepts `Shape | Arrow | SmartArt` and also includes SmartArt
+- [x] Confirm that discrimination by `kind` works as expected
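The stack-based tree assembly named in section 2 of the task list can be sketched as follows. This is a minimal illustration, not the code in `src/exstruct/core/shapes.py`: it assumes the flat node list arrives in document order as `(level, text)` pairs with integer levels where a larger value means deeper nesting, and the names `Node` and `build_tree` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for SmartArtNode (text plus nested kids)."""

    text: str
    kids: list["Node"] = field(default_factory=list)


def build_tree(flat: list[tuple[int, str]]) -> list[Node]:
    """Turn (level, text) pairs in document order into a list of root nodes."""
    roots: list[Node] = []
    # The stack holds the chain of open ancestors as (level, node) pairs.
    stack: list[tuple[int, Node]] = []
    for level, text in flat:
        node = Node(text)
        # Close every ancestor that is at the same depth or deeper.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1].kids.append(node)  # attach to the nearest shallower node
        else:
            roots.append(node)  # nothing shallower is open, so this is a root
        stack.append((level, node))
    return roots


if __name__ == "__main__":
    # A flattened excerpt of the organization-chart sample.
    sample = [(1, "取締役会"), (2, "社長"), (3, "営業部"), (4, "第1営業課"), (3, "開発部")]
    for root in build_tree(sample):
        print(root)
```

Each node is pushed and popped at most once, so the pass is linear in the number of nodes, and `level` itself never needs to be stored on the output nodes.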
diff --git a/docs/agents/TEST_REQUIREMENTS.md b/docs/agents/TEST_REQUIREMENTS.md
index 83cde0e..05dd053 100644
--- a/docs/agents/TEST_REQUIREMENTS.md
+++ b/docs/agents/TEST_REQUIREMENTS.md
@@ -1,6 +1,6 @@
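# ExStruct Test Requirements Specification
-Version: 0.3
+Version: 0.4
Status: Required for Release
This document compiles the formal test requirements for all ExStruct features. It serves as the basis on which AI agents and human developers design automated and manual tests.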
@@ -51,6 +51,7 @@ ExStruct の全機能について、正式なテスト要件をまとめたド
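- [SHP-01] Normalize the AutoShape type
- [SHP-02] Retrieve TextFrame correctly
+- [SHP-02a] `type` is kept on Shape only and is not output for Arrow/SmartArt
- [SHP-03] Size `w`,`h` is null only when it cannot be retrieved
- [SHP-04] Apply a consistent expansion policy to grouped shapes
- [SHP-05] Coordinates `l`,`t` are retrieved as integers and are unaffected by zoom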
@@ -60,6 +61,13 @@ ExStruct の全機能について、正式なテスト要件をまとめたド
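- [SHP-11] Shapes without text have text=""
- [SHP-12] Multi-paragraph text is also retrieved
+## 2.2.1 SmartArt extraction
+
+- [SHP-SA-01] SmartArt must always output `layout`
+- [SHP-SA-02] SmartArt nodes are output in `nodes` as a nested structure
+- [SHP-SA-03] A node's children are represented by `kids` (level is not output)
+- [SHP-SA-04] When SmartArt is present, it can be identified by `kind="smartart"`
+
## 2.3 Arrow direction estimation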
- [DIR-01] 0° ±22.5° → "E"
diff --git a/docs/concept.md b/docs/concept.md
index f308ea8..cb9e739 100644
--- a/docs/concept.md
+++ b/docs/concept.md
@@ -108,7 +108,7 @@ For RAG and AI systems, this missing structure becomes a major bottleneck.
ExStruct outputs a unified structure containing:
- cells, rows, and sheets
-- shapes and text blocks
+- shapes, arrows, and SmartArt nodes (nested)
- chart series and metadata
- automatically detected table candidates
- layout geometry (positions, sizes)
diff --git a/docs/schemas.md b/docs/schemas.md
index e14f467..39c3573 100644
--- a/docs/schemas.md
+++ b/docs/schemas.md
@@ -11,6 +11,9 @@ repository to access the raw files.
- `schemas/sheet.json` — `SheetData`
- `schemas/cell_row.json` — `CellRow`
- `schemas/shape.json` — `Shape`
+- `schemas/arrow.json` — `Arrow`
+- `schemas/smartart.json` — `SmartArt`
+- `schemas/smartart_node.json` — `SmartArtNode`
- `schemas/chart.json` — `Chart`
- `schemas/chart_series.json` — `ChartSeries`
- `schemas/print_area.json` — `PrintArea`
diff --git a/sample/basic/sample.json b/sample/basic/sample.json
index a2c72fb..d64a4d1 100644
--- a/sample/basic/sample.json
+++ b/sample/basic/sample.json
@@ -5,66 +5,31 @@
"rows": [
{
"r": 3,
- "c": {
- "1": "月",
- "2": "製品A",
- "3": "製品B",
- "4": "製品C"
- }
+ "c": { "1": "月", "2": "製品A", "3": "製品B", "4": "製品C" }
},
{
"r": 4,
- "c": {
- "1": "2025-01-01 00:00:00",
- "2": 120,
- "3": 80,
- "4": 60
- }
+ "c": { "1": "2025-01-01 00:00:00", "2": 120, "3": 80, "4": 60 }
},
{
"r": 5,
- "c": {
- "1": "2025-02-01 00:00:00",
- "2": 135,
- "3": 90,
- "4": 64
- }
+ "c": { "1": "2025-02-01 00:00:00", "2": 135, "3": 90, "4": 64 }
},
{
"r": 6,
- "c": {
- "1": "2025-03-01 00:00:00",
- "2": 150,
- "3": 100,
- "4": 70
- }
+ "c": { "1": "2025-03-01 00:00:00", "2": 150, "3": 100, "4": 70 }
},
{
"r": 7,
- "c": {
- "1": "2025-04-01 00:00:00",
- "2": 170,
- "3": 110,
- "4": 72
- }
+ "c": { "1": "2025-04-01 00:00:00", "2": 170, "3": 110, "4": 72 }
},
{
"r": 8,
- "c": {
- "1": "2025-05-01 00:00:00",
- "2": 160,
- "3": 120,
- "4": 75
- }
+ "c": { "1": "2025-05-01 00:00:00", "2": 160, "3": 120, "4": 75 }
},
{
"r": 9,
- "c": {
- "1": "2025-06-01 00:00:00",
- "2": 180,
- "3": 130,
- "4": 80
- }
+ "c": { "1": "2025-06-01 00:00:00", "2": 180, "3": 130, "4": 80 }
}
],
"shapes": [
@@ -73,6 +38,7 @@
"text": "開始",
"l": 148,
"t": 220,
+ "kind": "shape",
"type": "AutoShape-FlowchartProcess"
},
{
@@ -80,12 +46,13 @@
"text": "入力データ読み込み",
"l": 132,
"t": 282,
+ "kind": "shape",
"type": "AutoShape-FlowchartProcess"
},
{
"l": 193,
"t": 246,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 1,
@@ -97,6 +64,7 @@
"text": "形式は正しい?",
"l": 90,
"t": 342,
+ "kind": "shape",
"type": "AutoShape-FlowchartDecision"
},
{
@@ -104,6 +72,7 @@
"text": "1件処理",
"l": 424,
"t": 361,
+ "kind": "shape",
"type": "AutoShape-FlowchartProcess"
},
{
@@ -111,12 +80,13 @@
"text": "残件あり?",
"l": 365,
"t": 414,
+ "kind": "shape",
"type": "AutoShape-FlowchartDecision"
},
{
"l": 192,
"t": 312,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 2,
@@ -126,7 +96,7 @@
{
"l": 295,
"t": 374,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 3,
@@ -138,12 +108,13 @@
"text": "はい",
"l": 340,
"t": 362,
+ "kind": "shape",
"type": "TextBox-Rectangle"
},
{
"l": 468,
"t": 387,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 4,
@@ -155,6 +126,7 @@
"text": "出力を生成",
"l": 426,
"t": 494,
+ "kind": "shape",
"type": "AutoShape-FlowchartProcess"
},
{
@@ -162,6 +134,7 @@
"text": "メール送信?",
"l": 366,
"t": 549,
+ "kind": "shape",
"type": "AutoShape-FlowchartDecision"
},
{
@@ -169,12 +142,13 @@
"text": "エラー表示",
"l": 132,
"t": 463,
+ "kind": "shape",
"type": "AutoShape-FlowchartProcess"
},
{
"l": 192,
"t": 406,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 3,
@@ -186,12 +160,13 @@
"text": "メール送信",
"l": 426,
"t": 638,
+ "kind": "shape",
"type": "AutoShape-FlowchartProcess"
},
{
"l": 468,
"t": 466,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 5,
@@ -201,7 +176,7 @@
{
"l": 468,
"t": 520,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 7,
@@ -213,12 +188,13 @@
"text": "終了",
"l": 273,
"t": 684,
+ "kind": "shape",
"type": "AutoShape-FlowchartProcess"
},
{
"l": 194,
"t": 493,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 9,
@@ -228,7 +204,7 @@
{
"l": 363,
"t": 664,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 10,
@@ -238,7 +214,7 @@
{
"l": 468,
"t": 598,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 8,
@@ -250,12 +226,13 @@
"text": "はい",
"l": 448,
"t": 604,
+ "kind": "shape",
"type": "TextBox-Rectangle"
},
{
"l": 323,
"t": 573,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 8,
@@ -266,6 +243,7 @@
"text": "いいえ",
"l": 319,
"t": 600,
+ "kind": "shape",
"type": "TextBox-Rectangle"
}
],
@@ -274,10 +252,7 @@
"name": "Chart 1",
"chart_type": "Line",
"title": "売上データ",
- "y_axis_range": [
- 0.0,
- 200.0
- ],
+ "y_axis_range": [0.0, 200.0],
"series": [
{
"name": "製品A",
@@ -302,9 +277,7 @@
"t": 25
}
],
- "table_candidates": [
- "B3:E9"
- ]
+ "table_candidates": ["B3:E9"]
}
}
-}
\ No newline at end of file
+}
diff --git a/sample/flowchart/sample-shape-connector.json b/sample/flowchart/sample-shape-connector.json
index f1d0f90..b91c9b7 100644
--- a/sample/flowchart/sample-shape-connector.json
+++ b/sample/flowchart/sample-shape-connector.json
@@ -6,59 +6,65 @@
{
"id": 1,
"text": "S",
- "l": 81,
+ "l": 80,
"t": 45,
+ "kind": "shape",
"type": "AutoShape-Oval"
},
{
"id": 2,
"text": "E",
- "l": 549,
+ "l": 545,
"t": 696,
+ "kind": "shape",
"type": "AutoShape-Oval"
},
{
"id": 3,
"text": "要件抽出",
- "l": 81,
+ "l": 80,
"t": 168,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"l": 102,
- "t": 87,
- "type": "AutoShape-Mixed",
+ "t": 88,
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 1,
"end_id": 3,
- "direction": "NE"
+ "direction": "N"
},
{
"id": 4,
"text": "ヒアリング",
- "l": 342,
+ "l": 340,
"t": 97,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"id": 5,
"text": "非機能要件",
- "l": 210,
- "t": 225,
+ "l": 209,
+ "t": 226,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"id": 6,
"text": "機能要件",
- "l": 405,
+ "l": 402,
"t": 210,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
- "l": 191,
+ "l": 190,
"t": 120,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 3,
@@ -66,9 +72,9 @@
"direction": "NE"
},
{
- "l": 266,
+ "l": 264,
"t": 143,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 4,
@@ -76,9 +82,9 @@
"direction": "NE"
},
{
- "l": 398,
+ "l": 395,
"t": 143,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 4,
@@ -88,63 +94,71 @@
{
"id": 7,
"text": "プロトタイプ",
- "l": 381,
- "t": 291,
+ "l": 379,
+ "t": 292,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"id": 8,
"text": "実験検証",
- "l": 388,
+ "l": 385,
"t": 389,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"id": 9,
"text": "思考実験",
- "l": 82,
+ "l": 81,
"t": 325,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"id": 10,
"text": "再検証",
- "l": 182,
+ "l": 181,
"t": 426,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"id": 11,
"text": "まとめ",
- "l": 252,
- "t": 510,
+ "l": 251,
+ "t": 511,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"id": 12,
"text": "文書作成",
- "l": 296,
+ "l": 294,
"t": 589,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"id": 13,
"text": "契約管理",
- "l": 489,
+ "l": 486,
"t": 509,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
"id": 14,
"text": "締結",
- "l": 356,
- "t": 675,
+ "l": 353,
+ "t": 676,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
- "l": 144,
+ "l": 143,
"t": 271,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 5,
@@ -152,9 +166,9 @@
"direction": "NE"
},
{
- "l": 144,
+ "l": 143,
"t": 371,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 9,
@@ -162,9 +176,9 @@
"direction": "NE"
},
{
- "l": 244,
+ "l": 242,
"t": 471,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 10,
@@ -172,9 +186,9 @@
"direction": "NE"
},
{
- "l": 314,
+ "l": 312,
"t": 556,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 11,
@@ -182,9 +196,9 @@
"direction": "NE"
},
{
- "l": 376,
+ "l": 373,
"t": 531,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 11,
@@ -192,9 +206,9 @@
"direction": "E"
},
{
- "l": 357,
+ "l": 355,
"t": 635,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 12,
@@ -202,9 +216,9 @@
"direction": "NE"
},
{
- "l": 417,
+ "l": 414,
"t": 554,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 13,
@@ -212,9 +226,9 @@
"direction": "NE"
},
{
- "l": 479,
+ "l": 476,
"t": 698,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 14,
@@ -222,9 +236,9 @@
"direction": "E"
},
{
- "l": 443,
- "t": 255,
- "type": "AutoShape-Mixed",
+ "l": 440,
+ "t": 256,
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 6,
@@ -232,9 +246,9 @@
"direction": "NE"
},
{
- "l": 443,
+ "l": 440,
"t": 337,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 7,
@@ -242,9 +256,9 @@
"direction": "N"
},
{
- "l": 314,
+ "l": 312,
"t": 434,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 8,
@@ -252,10 +266,10 @@
"direction": "NE"
},
{
- "l": 194,
+ "l": 192,
"t": 298,
- "type": "AutoShape-Mixed",
"rotation": 90.0,
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 10,
@@ -263,9 +277,9 @@
"direction": "NE"
},
{
- "l": 511,
+ "l": 508,
"t": 308,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 8,
@@ -275,14 +289,15 @@
{
"id": 15,
"text": "機能追加",
- "l": 581,
+ "l": 577,
"t": 263,
+ "kind": "shape",
"type": "AutoShape-Rectangle"
},
{
- "l": 505,
+ "l": 501,
"t": 285,
- "type": "AutoShape-Mixed",
+ "kind": "arrow",
"begin_arrow_style": 1,
"end_arrow_style": 2,
"begin_id": 15,
@@ -292,4 +307,4 @@
]
}
}
-}
\ No newline at end of file
+}
diff --git a/sample/smartart/sample_smartart.json b/sample/smartart/sample_smartart.json
new file mode 100644
index 0000000..07f112a
--- /dev/null
+++ b/sample/smartart/sample_smartart.json
@@ -0,0 +1,93 @@
+{
+ "book_name": "sample_smartart.xlsx",
+ "sheets": {
+ "Sheet1": {
+ "shapes": [
+ {
+ "id": 1,
+ "l": 0,
+ "t": 28,
+ "kind": "smartart",
+ "layout": "基本の循環",
+ "nodes": [
+ { "text": "1", "kids": [{ "text": "要件定義" }] },
+ { "text": "2", "kids": [{ "text": "報連相" }, { "text": "開発" }] },
+ {
+ "text": "3",
+ "kids": [{ "text": "実装確認" }, { "text": "動作確認" }]
+ },
+ { "text": "4", "kids": [{ "text": "対策" }] },
+ { "text": "5", "kids": [{ "text": "最終確認" }] }
+ ]
+ },
+ {
+ "id": 2,
+ "l": 388,
+ "t": 32,
+ "kind": "smartart",
+ "layout": "開始点強調型プロセス",
+ "nodes": [
+ { "text": "企画" },
+ { "text": "執筆" },
+ { "text": "編集" },
+ { "text": "制作" },
+ { "text": "校正" }
+ ]
+ },
+ {
+ "id": 3,
+ "l": 46,
+ "t": 325,
+ "kind": "smartart",
+ "layout": "組織図",
+ "nodes": [
+ {
+ "text": "取締役会",
+ "kids": [
+ {
+ "text": "社長",
+ "kids": [
+ { "text": "企画管理部" },
+ {
+ "text": "営業部",
+ "kids": [
+ { "text": "第1営業課" },
+ { "text": "第2営業課" },
+ { "text": "第3営業課" },
+ { "text": "海外営業課" }
+ ]
+ },
+ {
+ "text": "開発部",
+ "kids": [{ "text": "第1開発課" }, { "text": "第2開発課" }]
+ },
+ {
+ "text": "技術部",
+ "kids": [{ "text": "第1技術課" }, { "text": "第2技術課" }]
+ },
+ {
+ "text": "生産部",
+ "kids": [
+ { "text": "愛知工場" },
+ { "text": "山形工場" },
+ { "text": "高知工場" }
+ ]
+ },
+ {
+ "text": "総務部",
+ "kids": [
+ { "text": "総務課" },
+ { "text": "人事課" },
+ { "text": "経理課" }
+ ]
+ }
+ ]
+ }
+ ]
+ }
+ ]
+ }
+ ]
+ }
+ }
+}
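To show how the nested `nodes` / `kids` structure in this sample can be consumed, here is a small Python sketch that walks the tree and renders an indented bullet outline. The function name `render_outline` is hypothetical, and the input is assumed to be the plain dict form shown above, where a leaf simply has no `kids` key.

```python
import json
from typing import Any


def render_outline(nodes: list[dict[str, Any]], depth: int = 0) -> list[str]:
    """Render SmartArt nodes as Markdown bullets, two spaces per nesting level."""
    lines: list[str] = []
    for node in nodes:
        lines.append("  " * depth + "- " + node["text"])
        # Leaves omit "kids" in the compact output, so fall back to an empty list.
        lines.extend(render_outline(node.get("kids", []), depth + 1))
    return lines


if __name__ == "__main__":
    # Excerpt of the first SmartArt entry in the sample above.
    excerpt = json.loads(
        '[{"text": "1", "kids": [{"text": "要件定義"}]},'
        ' {"text": "2", "kids": [{"text": "報連相"}, {"text": "開発"}]}]'
    )
    print("\n".join(render_outline(excerpt)))
```

Because every level has the same shape (`text` plus optional `kids`), a single recursive function covers flat process layouts and deep organization charts alike.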
diff --git a/sample/smartart/sample_smartart.xlsx b/sample/smartart/sample_smartart.xlsx
new file mode 100644
index 0000000..7812f7c
Binary files /dev/null and b/sample/smartart/sample_smartart.xlsx differ
diff --git a/sample/smartart/sample_smartart_for_llm.md b/sample/smartart/sample_smartart_for_llm.md
new file mode 100644
index 0000000..b453d06
--- /dev/null
+++ b/sample/smartart/sample_smartart_for_llm.md
@@ -0,0 +1,92 @@
+# 📘 sample_smartart.xlsx
+
+## 1. 基本の循環(SmartArt)
+
+- **1**
+ - 要件定義
+- **2**
+ - 報連相
+ - 開発
+- **3**
+ - 実装確認
+ - 動作確認
+- **4**
+ - 対策
+- **5**
+ - 最終確認
+
+---
+
+## 2. 開始点強調型プロセス(SmartArt)
+
+1. 企画
+2. 執筆
+3. 編集
+4. 制作
+5. 校正
+
+```mermaid
+flowchart LR
+ B1["企画"] --> B2["執筆"] --> B3["編集"] --> B4["制作"] --> B5["校正"]
+```
+
+---
+
+## 3. 組織図(SmartArt)
+
+- **取締役会**
+ - **社長**
+ - 企画管理部
+ - 営業部
+ - 第 1 営業課
+ - 第 2 営業課
+ - 第 3 営業課
+ - 海外営業課
+ - 開発部
+ - 第 1 開発課
+ - 第 2 開発課
+ - 技術部
+ - 第 1 技術課
+ - 第 2 技術課
+ - 生産部
+ - 愛知工場
+ - 山形工場
+ - 高知工場
+ - 総務部
+ - 総務課
+ - 人事課
+ - 経理課
+
+```mermaid
+flowchart TB
+ T["取締役会"]
+ P["社長"]
+
+ T --> P
+
+ P --> K1["企画管理部"]
+
+ P --> E["営業部"]
+ E --> E1["第1営業課"]
+ E --> E2["第2営業課"]
+ E --> E3["第3営業課"]
+ E --> E4["海外営業課"]
+
+ P --> D["開発部"]
+ D --> D1["第1開発課"]
+ D --> D2["第2開発課"]
+
+ P --> G["技術部"]
+ G --> G1["第1技術課"]
+ G --> G2["第2技術課"]
+
+ P --> S["生産部"]
+ S --> S1["愛知工場"]
+ S --> S2["山形工場"]
+ S --> S3["高知工場"]
+
+ P --> A["総務部"]
+ A --> A1["総務課"]
+ A --> A2["人事課"]
+ A --> A3["経理課"]
+```
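A Mermaid diagram like the organization chart above could be derived mechanically from the `nodes` tree. The following Python sketch is one possible approach, not how this sample file was produced: `to_mermaid` is a hypothetical helper, node ids are generated as `N1`, `N2`, ... rather than the hand-picked ids used above, only parent-to-child edges are drawn (so the sibling chain of the process layout is not reproduced), and double quotes inside node text are not escaped.

```python
from typing import Any, Optional


def to_mermaid(nodes: list[dict[str, Any]], direction: str = "TB") -> str:
    """Emit a Mermaid flowchart with one edge per parent/child pair."""
    lines = [f"flowchart {direction}"]
    counter = 0

    def visit(node: dict[str, Any], parent_id: Optional[str]) -> None:
        nonlocal counter
        counter += 1
        node_id = f"N{counter}"
        text = node["text"]
        # Declare the node, then link it to its parent if it has one.
        lines.append(f'    {node_id}["{text}"]')
        if parent_id is not None:
            lines.append(f"    {parent_id} --> {node_id}")
        for kid in node.get("kids", []):
            visit(kid, node_id)

    for root in nodes:
        visit(root, None)
    return "\n".join(lines)


if __name__ == "__main__":
    chart = [{"text": "取締役会", "kids": [{"text": "社長", "kids": [{"text": "営業部"}]}]}]
    print(to_mermaid(chart))
```

Generated ids avoid collisions when two nodes share the same text, at the cost of being less readable than mnemonic ids.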
diff --git a/schemas/arrow.json b/schemas/arrow.json
new file mode 100644
index 0000000..d2ef8f1
--- /dev/null
+++ b/schemas/arrow.json
@@ -0,0 +1,162 @@
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "description": "Connector shape metadata.",
+ "properties": {
+ "begin_arrow_style": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Arrow style enum for the start of a connector.",
+ "title": "Begin Arrow Style"
+ },
+ "begin_id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
+ "title": "Begin Id"
+ },
+ "direction": {
+ "anyOf": [
+ {
+ "enum": [
+ "E",
+ "SE",
+ "S",
+ "SW",
+ "W",
+ "NW",
+ "N",
+ "NE"
+ ],
+ "type": "string"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Connector direction (compass heading).",
+ "title": "Direction"
+ },
+ "end_arrow_style": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Arrow style enum for the end of a connector.",
+ "title": "End Arrow Style"
+ },
+ "end_id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
+ "title": "End Id"
+ },
+ "h": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape height (None if unknown).",
+ "title": "H"
+ },
+ "id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Sequential shape id within the sheet (if applicable).",
+ "title": "Id"
+ },
+ "kind": {
+ "const": "arrow",
+ "default": "arrow",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
+ "l": {
+ "description": "Left offset (Excel units).",
+ "title": "L",
+ "type": "integer"
+ },
+ "rotation": {
+ "anyOf": [
+ {
+ "type": "number"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Rotation angle in degrees.",
+ "title": "Rotation"
+ },
+ "t": {
+ "description": "Top offset (Excel units).",
+ "title": "T",
+ "type": "integer"
+ },
+ "text": {
+ "description": "Visible text content of the shape.",
+ "title": "Text",
+ "type": "string"
+ },
+ "w": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape width (None if unknown).",
+ "title": "W"
+ }
+ },
+ "required": [
+ "text",
+ "l",
+ "t"
+ ],
+ "title": "Arrow",
+ "type": "object"
+}
\ No newline at end of file
diff --git a/schemas/print_area_view.json b/schemas/print_area_view.json
index d718773..38b57e2 100644
--- a/schemas/print_area_view.json
+++ b/schemas/print_area_view.json
@@ -1,5 +1,166 @@
{
"$defs": {
+ "Arrow": {
+ "description": "Connector shape metadata.",
+ "properties": {
+ "begin_arrow_style": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Arrow style enum for the start of a connector.",
+ "title": "Begin Arrow Style"
+ },
+ "begin_id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
+ "title": "Begin Id"
+ },
+ "direction": {
+ "anyOf": [
+ {
+ "enum": [
+ "E",
+ "SE",
+ "S",
+ "SW",
+ "W",
+ "NW",
+ "N",
+ "NE"
+ ],
+ "type": "string"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Connector direction (compass heading).",
+ "title": "Direction"
+ },
+ "end_arrow_style": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Arrow style enum for the end of a connector.",
+ "title": "End Arrow Style"
+ },
+ "end_id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
+ "title": "End Id"
+ },
+ "h": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape height (None if unknown).",
+ "title": "H"
+ },
+ "id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Sequential shape id within the sheet (if applicable).",
+ "title": "Id"
+ },
+ "kind": {
+ "const": "arrow",
+ "default": "arrow",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
+ "l": {
+ "description": "Left offset (Excel units).",
+ "title": "L",
+ "type": "integer"
+ },
+ "rotation": {
+ "anyOf": [
+ {
+ "type": "number"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Rotation angle in degrees.",
+ "title": "Rotation"
+ },
+ "t": {
+ "description": "Top offset (Excel units).",
+ "title": "T",
+ "type": "integer"
+ },
+ "text": {
+ "description": "Visible text content of the shape.",
+ "title": "Text",
+ "type": "string"
+ },
+ "w": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape width (None if unknown).",
+ "title": "W"
+ }
+ },
+ "required": [
+ "text",
+ "l",
+ "t"
+ ],
+ "title": "Arrow",
+ "type": "object"
+ },
"CellRow": {
"description": "A single row of cells with optional hyperlinks.",
"properties": {
@@ -246,9 +407,9 @@
"type": "object"
},
"Shape": {
- "description": "Shape metadata (position, size, text, and styling).",
+ "description": "Normal shape metadata.",
"properties": {
- "begin_arrow_style": {
+ "h": {
"anyOf": [
{
"type": "integer"
@@ -258,10 +419,10 @@
}
],
"default": null,
- "description": "Arrow style enum for the start of a connector.",
- "title": "Begin Arrow Style"
+ "description": "Shape height (None if unknown).",
+ "title": "H"
},
- "begin_id": {
+ "id": {
"anyOf": [
{
"type": "integer"
@@ -271,46 +432,58 @@
}
],
"default": null,
- "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
- "title": "Begin Id"
+ "description": "Sequential shape id within the sheet (if applicable).",
+ "title": "Id"
},
- "direction": {
+ "kind": {
+ "const": "shape",
+ "default": "shape",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
+ "l": {
+ "description": "Left offset (Excel units).",
+ "title": "L",
+ "type": "integer"
+ },
+ "rotation": {
"anyOf": [
{
- "enum": [
- "E",
- "SE",
- "S",
- "SW",
- "W",
- "NW",
- "N",
- "NE"
- ],
- "type": "string"
+ "type": "number"
},
{
"type": "null"
}
],
"default": null,
- "description": "Connector direction (compass heading).",
- "title": "Direction"
+ "description": "Rotation angle in degrees.",
+ "title": "Rotation"
},
- "end_arrow_style": {
+ "t": {
+ "description": "Top offset (Excel units).",
+ "title": "T",
+ "type": "integer"
+ },
+ "text": {
+ "description": "Visible text content of the shape.",
+ "title": "Text",
+ "type": "string"
+ },
+ "type": {
"anyOf": [
{
- "type": "integer"
+ "type": "string"
},
{
"type": "null"
}
],
"default": null,
- "description": "Arrow style enum for the end of a connector.",
- "title": "End Arrow Style"
+ "description": "Excel shape type name.",
+ "title": "Type"
},
- "end_id": {
+ "w": {
"anyOf": [
{
"type": "integer"
@@ -320,9 +493,21 @@
}
],
"default": null,
- "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
- "title": "End Id"
- },
+ "description": "Shape width (None if unknown).",
+ "title": "W"
+ }
+ },
+ "required": [
+ "text",
+ "l",
+ "t"
+ ],
+ "title": "Shape",
+ "type": "object"
+ },
+ "SmartArt": {
+ "description": "SmartArt shape metadata with nested nodes.",
+ "properties": {
"h": {
"anyOf": [
{
@@ -349,11 +534,31 @@
"description": "Sequential shape id within the sheet (if applicable).",
"title": "Id"
},
+ "kind": {
+ "const": "smartart",
+ "default": "smartart",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
"l": {
"description": "Left offset (Excel units).",
"title": "L",
"type": "integer"
},
+ "layout": {
+ "description": "SmartArt layout name.",
+ "title": "Layout",
+ "type": "string"
+ },
+ "nodes": {
+ "description": "Root nodes of SmartArt tree.",
+ "items": {
+ "$ref": "#/$defs/SmartArtNode"
+ },
+ "title": "Nodes",
+ "type": "array"
+ },
"rotation": {
"anyOf": [
{
@@ -377,19 +582,6 @@
"title": "Text",
"type": "string"
},
- "type": {
- "anyOf": [
- {
- "type": "string"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Excel shape type name.",
- "title": "Type"
- },
"w": {
"anyOf": [
{
@@ -407,9 +599,33 @@
"required": [
"text",
"l",
- "t"
+ "t",
+ "layout"
],
- "title": "Shape",
+ "title": "SmartArt",
+ "type": "object"
+ },
+ "SmartArtNode": {
+ "description": "Node of SmartArt hierarchy.",
+ "properties": {
+ "kids": {
+ "description": "Child nodes.",
+ "items": {
+ "$ref": "#/$defs/SmartArtNode"
+ },
+ "title": "Kids",
+ "type": "array"
+ },
+ "text": {
+ "description": "Visible text for the node.",
+ "title": "Text",
+ "type": "string"
+ }
+ },
+ "required": [
+ "text"
+ ],
+ "title": "SmartArtNode",
"type": "object"
}
},
@@ -444,7 +660,17 @@
"shapes": {
"description": "Shapes overlapping the area.",
"items": {
- "$ref": "#/$defs/Shape"
+ "anyOf": [
+ {
+ "$ref": "#/$defs/Shape"
+ },
+ {
+ "$ref": "#/$defs/Arrow"
+ },
+ {
+ "$ref": "#/$defs/SmartArt"
+ }
+ ]
},
"title": "Shapes",
"type": "array"
diff --git a/schemas/shape.json b/schemas/shape.json
index dff32d0..8f76162 100644
--- a/schemas/shape.json
+++ b/schemas/shape.json
@@ -1,82 +1,7 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
- "description": "Shape metadata (position, size, text, and styling).",
+ "description": "Normal shape metadata.",
"properties": {
- "begin_arrow_style": {
- "anyOf": [
- {
- "type": "integer"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Arrow style enum for the start of a connector.",
- "title": "Begin Arrow Style"
- },
- "begin_id": {
- "anyOf": [
- {
- "type": "integer"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
- "title": "Begin Id"
- },
- "direction": {
- "anyOf": [
- {
- "enum": [
- "E",
- "SE",
- "S",
- "SW",
- "W",
- "NW",
- "N",
- "NE"
- ],
- "type": "string"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Connector direction (compass heading).",
- "title": "Direction"
- },
- "end_arrow_style": {
- "anyOf": [
- {
- "type": "integer"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Arrow style enum for the end of a connector.",
- "title": "End Arrow Style"
- },
- "end_id": {
- "anyOf": [
- {
- "type": "integer"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
- "title": "End Id"
- },
"h": {
"anyOf": [
{
@@ -103,6 +28,13 @@
"description": "Sequential shape id within the sheet (if applicable).",
"title": "Id"
},
+ "kind": {
+ "const": "shape",
+ "default": "shape",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
"l": {
"description": "Left offset (Excel units).",
"title": "L",
diff --git a/schemas/sheet.json b/schemas/sheet.json
index fff9dfc..9ea5497 100644
--- a/schemas/sheet.json
+++ b/schemas/sheet.json
@@ -1,5 +1,166 @@
{
"$defs": {
+ "Arrow": {
+ "description": "Connector shape metadata.",
+ "properties": {
+ "begin_arrow_style": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Arrow style enum for the start of a connector.",
+ "title": "Begin Arrow Style"
+ },
+ "begin_id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
+ "title": "Begin Id"
+ },
+ "direction": {
+ "anyOf": [
+ {
+ "enum": [
+ "E",
+ "SE",
+ "S",
+ "SW",
+ "W",
+ "NW",
+ "N",
+ "NE"
+ ],
+ "type": "string"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Connector direction (compass heading).",
+ "title": "Direction"
+ },
+ "end_arrow_style": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Arrow style enum for the end of a connector.",
+ "title": "End Arrow Style"
+ },
+ "end_id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
+ "title": "End Id"
+ },
+ "h": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape height (None if unknown).",
+ "title": "H"
+ },
+ "id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Sequential shape id within the sheet (if applicable).",
+ "title": "Id"
+ },
+ "kind": {
+ "const": "arrow",
+ "default": "arrow",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
+ "l": {
+ "description": "Left offset (Excel units).",
+ "title": "L",
+ "type": "integer"
+ },
+ "rotation": {
+ "anyOf": [
+ {
+ "type": "number"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Rotation angle in degrees.",
+ "title": "Rotation"
+ },
+ "t": {
+ "description": "Top offset (Excel units).",
+ "title": "T",
+ "type": "integer"
+ },
+ "text": {
+ "description": "Visible text content of the shape.",
+ "title": "Text",
+ "type": "string"
+ },
+ "w": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape width (None if unknown).",
+ "title": "W"
+ }
+ },
+ "required": [
+ "text",
+ "l",
+ "t"
+ ],
+ "title": "Arrow",
+ "type": "object"
+ },
"CellRow": {
"description": "A single row of cells with optional hyperlinks.",
"properties": {
@@ -246,9 +407,9 @@
"type": "object"
},
"Shape": {
- "description": "Shape metadata (position, size, text, and styling).",
+ "description": "Normal shape metadata.",
"properties": {
- "begin_arrow_style": {
+ "h": {
"anyOf": [
{
"type": "integer"
@@ -258,10 +419,10 @@
}
],
"default": null,
- "description": "Arrow style enum for the start of a connector.",
- "title": "Begin Arrow Style"
+ "description": "Shape height (None if unknown).",
+ "title": "H"
},
- "begin_id": {
+ "id": {
"anyOf": [
{
"type": "integer"
@@ -271,46 +432,58 @@
}
],
"default": null,
- "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
- "title": "Begin Id"
+ "description": "Sequential shape id within the sheet (if applicable).",
+ "title": "Id"
},
- "direction": {
+ "kind": {
+ "const": "shape",
+ "default": "shape",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
+ "l": {
+ "description": "Left offset (Excel units).",
+ "title": "L",
+ "type": "integer"
+ },
+ "rotation": {
"anyOf": [
{
- "enum": [
- "E",
- "SE",
- "S",
- "SW",
- "W",
- "NW",
- "N",
- "NE"
- ],
- "type": "string"
+ "type": "number"
},
{
"type": "null"
}
],
"default": null,
- "description": "Connector direction (compass heading).",
- "title": "Direction"
+ "description": "Rotation angle in degrees.",
+ "title": "Rotation"
},
- "end_arrow_style": {
+ "t": {
+ "description": "Top offset (Excel units).",
+ "title": "T",
+ "type": "integer"
+ },
+ "text": {
+ "description": "Visible text content of the shape.",
+ "title": "Text",
+ "type": "string"
+ },
+ "type": {
"anyOf": [
{
- "type": "integer"
+ "type": "string"
},
{
"type": "null"
}
],
"default": null,
- "description": "Arrow style enum for the end of a connector.",
- "title": "End Arrow Style"
+ "description": "Excel shape type name.",
+ "title": "Type"
},
- "end_id": {
+ "w": {
"anyOf": [
{
"type": "integer"
@@ -320,9 +493,21 @@
}
],
"default": null,
- "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
- "title": "End Id"
- },
+ "description": "Shape width (None if unknown).",
+ "title": "W"
+ }
+ },
+ "required": [
+ "text",
+ "l",
+ "t"
+ ],
+ "title": "Shape",
+ "type": "object"
+ },
+ "SmartArt": {
+ "description": "SmartArt shape metadata with nested nodes.",
+ "properties": {
"h": {
"anyOf": [
{
@@ -349,11 +534,31 @@
"description": "Sequential shape id within the sheet (if applicable).",
"title": "Id"
},
+ "kind": {
+ "const": "smartart",
+ "default": "smartart",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
"l": {
"description": "Left offset (Excel units).",
"title": "L",
"type": "integer"
},
+ "layout": {
+ "description": "SmartArt layout name.",
+ "title": "Layout",
+ "type": "string"
+ },
+ "nodes": {
+ "description": "Root nodes of SmartArt tree.",
+ "items": {
+ "$ref": "#/$defs/SmartArtNode"
+ },
+ "title": "Nodes",
+ "type": "array"
+ },
"rotation": {
"anyOf": [
{
@@ -377,19 +582,6 @@
"title": "Text",
"type": "string"
},
- "type": {
- "anyOf": [
- {
- "type": "string"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Excel shape type name.",
- "title": "Type"
- },
"w": {
"anyOf": [
{
@@ -407,9 +599,33 @@
"required": [
"text",
"l",
- "t"
+ "t",
+ "layout"
],
- "title": "Shape",
+ "title": "SmartArt",
+ "type": "object"
+ },
+ "SmartArtNode": {
+ "description": "Node of SmartArt hierarchy.",
+ "properties": {
+ "kids": {
+ "description": "Child nodes.",
+ "items": {
+ "$ref": "#/$defs/SmartArtNode"
+ },
+ "title": "Kids",
+ "type": "array"
+ },
+ "text": {
+ "description": "Visible text for the node.",
+ "title": "Text",
+ "type": "string"
+ }
+ },
+ "required": [
+ "text"
+ ],
+ "title": "SmartArtNode",
"type": "object"
}
},
@@ -472,7 +688,17 @@
"shapes": {
"description": "Shapes detected on the sheet.",
"items": {
- "$ref": "#/$defs/Shape"
+ "anyOf": [
+ {
+ "$ref": "#/$defs/Shape"
+ },
+ {
+ "$ref": "#/$defs/Arrow"
+ },
+ {
+ "$ref": "#/$defs/SmartArt"
+ }
+ ]
},
"title": "Shapes",
"type": "array"
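Note for consumers of the `shapes` array: since each item may now match `Shape`, `Arrow`, or `SmartArt`, the `kind` constant is the natural field to dispatch on. The following is a minimal sketch, not code from this change. The helper name `group_shapes_by_kind`, the `"unknown"` bucket, and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_shapes_by_kind(shapes: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group serialized shape dicts by their `kind` discriminator."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in shapes:
        # Assumption: items lacking `kind` are kept apart instead of being guessed.
        grouped[item.get("kind", "unknown")].append(item)
    return dict(grouped)


# Hypothetical sample data shaped like the three schema variants.
sample = [
    {"kind": "shape", "text": "Start", "l": 10, "t": 20, "type": "AutoShape"},
    {"kind": "arrow", "text": "", "l": 30, "t": 40, "direction": "E", "begin_id": 1, "end_id": 2},
    {"kind": "smartart", "text": "", "l": 50, "t": 60, "layout": "Basic Process", "nodes": [{"text": "A"}]},
]
grouped = group_shapes_by_kind(sample)
```

Grouping first keeps arrow-only fields such as `direction` and SmartArt-only fields such as `nodes` from being read on the wrong variant.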
diff --git a/schemas/smartart.json b/schemas/smartart.json
new file mode 100644
index 0000000..68d1cab
--- /dev/null
+++ b/schemas/smartart.json
@@ -0,0 +1,126 @@
+{
+ "$defs": {
+ "SmartArtNode": {
+ "description": "Node of SmartArt hierarchy.",
+ "properties": {
+ "kids": {
+ "description": "Child nodes.",
+ "items": {
+ "$ref": "#/$defs/SmartArtNode"
+ },
+ "title": "Kids",
+ "type": "array"
+ },
+ "text": {
+ "description": "Visible text for the node.",
+ "title": "Text",
+ "type": "string"
+ }
+ },
+ "required": [
+ "text"
+ ],
+ "title": "SmartArtNode",
+ "type": "object"
+ }
+ },
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "description": "SmartArt shape metadata with nested nodes.",
+ "properties": {
+ "h": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape height (None if unknown).",
+ "title": "H"
+ },
+ "id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Sequential shape id within the sheet (if applicable).",
+ "title": "Id"
+ },
+ "kind": {
+ "const": "smartart",
+ "default": "smartart",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
+ "l": {
+ "description": "Left offset (Excel units).",
+ "title": "L",
+ "type": "integer"
+ },
+ "layout": {
+ "description": "SmartArt layout name.",
+ "title": "Layout",
+ "type": "string"
+ },
+ "nodes": {
+ "description": "Root nodes of SmartArt tree.",
+ "items": {
+ "$ref": "#/$defs/SmartArtNode"
+ },
+ "title": "Nodes",
+ "type": "array"
+ },
+ "rotation": {
+ "anyOf": [
+ {
+ "type": "number"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Rotation angle in degrees.",
+ "title": "Rotation"
+ },
+ "t": {
+ "description": "Top offset (Excel units).",
+ "title": "T",
+ "type": "integer"
+ },
+ "text": {
+ "description": "Visible text content of the shape.",
+ "title": "Text",
+ "type": "string"
+ },
+ "w": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape width (None if unknown).",
+ "title": "W"
+ }
+ },
+ "required": [
+ "text",
+ "l",
+ "t",
+ "layout"
+ ],
+ "title": "SmartArt",
+ "type": "object"
+}
\ No newline at end of file
diff --git a/schemas/smartart_node.json b/schemas/smartart_node.json
new file mode 100644
index 0000000..109b7b7
--- /dev/null
+++ b/schemas/smartart_node.json
@@ -0,0 +1,29 @@
+{
+ "$defs": {
+ "SmartArtNode": {
+ "description": "Node of SmartArt hierarchy.",
+ "properties": {
+ "kids": {
+ "description": "Child nodes.",
+ "items": {
+ "$ref": "#/$defs/SmartArtNode"
+ },
+ "title": "Kids",
+ "type": "array"
+ },
+ "text": {
+ "description": "Visible text for the node.",
+ "title": "Text",
+ "type": "string"
+ }
+ },
+ "required": [
+ "text"
+ ],
+ "title": "SmartArtNode",
+ "type": "object"
+ }
+ },
+ "$ref": "#/$defs/SmartArtNode",
+ "$schema": "https://json-schema.org/draft/2020-12/schema"
+}
\ No newline at end of file
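Because `SmartArtNode` refers to itself through `kids`, a consumer will usually walk it recursively. The sketch below flattens a node list into an indented outline. It assumes only the `text` and `kids` fields from the schema above; the function name `outline` and the sample tree are hypothetical.

```python
from typing import Any


def outline(nodes: list[dict[str, Any]], depth: int = 0) -> list[str]:
    """Return one indented line per node, in depth-first order."""
    lines: list[str] = []
    for node in nodes:
        lines.append("  " * depth + node["text"])
        # `kids` is optional in the schema, so fall back to an empty list.
        lines.extend(outline(node.get("kids", []), depth + 1))
    return lines


# Hypothetical SmartArt node tree.
nodes = [
    {"text": "Plan", "kids": [{"text": "Design"}, {"text": "Build", "kids": [{"text": "Test"}]}]},
    {"text": "Ship"},
]
result = outline(nodes)
```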
diff --git a/schemas/workbook.json b/schemas/workbook.json
index 4fac8d1..12ab273 100644
--- a/schemas/workbook.json
+++ b/schemas/workbook.json
@@ -1,5 +1,166 @@
{
"$defs": {
+ "Arrow": {
+ "description": "Connector shape metadata.",
+ "properties": {
+ "begin_arrow_style": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Arrow style enum for the start of a connector.",
+ "title": "Begin Arrow Style"
+ },
+ "begin_id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
+ "title": "Begin Id"
+ },
+ "direction": {
+ "anyOf": [
+ {
+ "enum": [
+ "E",
+ "SE",
+ "S",
+ "SW",
+ "W",
+ "NW",
+ "N",
+ "NE"
+ ],
+ "type": "string"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Connector direction (compass heading).",
+ "title": "Direction"
+ },
+ "end_arrow_style": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Arrow style enum for the end of a connector.",
+ "title": "End Arrow Style"
+ },
+ "end_id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
+ "title": "End Id"
+ },
+ "h": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape height (None if unknown).",
+ "title": "H"
+ },
+ "id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Sequential shape id within the sheet (if applicable).",
+ "title": "Id"
+ },
+ "kind": {
+ "const": "arrow",
+ "default": "arrow",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
+ "l": {
+ "description": "Left offset (Excel units).",
+ "title": "L",
+ "type": "integer"
+ },
+ "rotation": {
+ "anyOf": [
+ {
+ "type": "number"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Rotation angle in degrees.",
+ "title": "Rotation"
+ },
+ "t": {
+ "description": "Top offset (Excel units).",
+ "title": "T",
+ "type": "integer"
+ },
+ "text": {
+ "description": "Visible text content of the shape.",
+ "title": "Text",
+ "type": "string"
+ },
+ "w": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape width (None if unknown).",
+ "title": "W"
+ }
+ },
+ "required": [
+ "text",
+ "l",
+ "t"
+ ],
+ "title": "Arrow",
+ "type": "object"
+ },
"CellRow": {
"description": "A single row of cells with optional hyperlinks.",
"properties": {
@@ -246,83 +407,8 @@
"type": "object"
},
"Shape": {
- "description": "Shape metadata (position, size, text, and styling).",
+ "description": "Normal shape metadata.",
"properties": {
- "begin_arrow_style": {
- "anyOf": [
- {
- "type": "integer"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Arrow style enum for the start of a connector.",
- "title": "Begin Arrow Style"
- },
- "begin_id": {
- "anyOf": [
- {
- "type": "integer"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
- "title": "Begin Id"
- },
- "direction": {
- "anyOf": [
- {
- "enum": [
- "E",
- "SE",
- "S",
- "SW",
- "W",
- "NW",
- "N",
- "NE"
- ],
- "type": "string"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Connector direction (compass heading).",
- "title": "Direction"
- },
- "end_arrow_style": {
- "anyOf": [
- {
- "type": "integer"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Arrow style enum for the end of a connector.",
- "title": "End Arrow Style"
- },
- "end_id": {
- "anyOf": [
- {
- "type": "integer"
- },
- {
- "type": "null"
- }
- ],
- "default": null,
- "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
- "title": "End Id"
- },
"h": {
"anyOf": [
{
@@ -349,6 +435,13 @@
"description": "Sequential shape id within the sheet (if applicable).",
"title": "Id"
},
+ "kind": {
+ "const": "shape",
+ "default": "shape",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
"l": {
"description": "Left offset (Excel units).",
"title": "L",
@@ -471,7 +564,17 @@
"shapes": {
"description": "Shapes detected on the sheet.",
"items": {
- "$ref": "#/$defs/Shape"
+ "anyOf": [
+ {
+ "$ref": "#/$defs/Shape"
+ },
+ {
+ "$ref": "#/$defs/Arrow"
+ },
+ {
+ "$ref": "#/$defs/SmartArt"
+ }
+ ]
},
"title": "Shapes",
"type": "array"
@@ -487,6 +590,129 @@
},
"title": "SheetData",
"type": "object"
+ },
+ "SmartArt": {
+ "description": "SmartArt shape metadata with nested nodes.",
+ "properties": {
+ "h": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape height (None if unknown).",
+ "title": "H"
+ },
+ "id": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Sequential shape id within the sheet (if applicable).",
+ "title": "Id"
+ },
+ "kind": {
+ "const": "smartart",
+ "default": "smartart",
+ "description": "Shape kind.",
+ "title": "Kind",
+ "type": "string"
+ },
+ "l": {
+ "description": "Left offset (Excel units).",
+ "title": "L",
+ "type": "integer"
+ },
+ "layout": {
+ "description": "SmartArt layout name.",
+ "title": "Layout",
+ "type": "string"
+ },
+ "nodes": {
+ "description": "Root nodes of SmartArt tree.",
+ "items": {
+ "$ref": "#/$defs/SmartArtNode"
+ },
+ "title": "Nodes",
+ "type": "array"
+ },
+ "rotation": {
+ "anyOf": [
+ {
+ "type": "number"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Rotation angle in degrees.",
+ "title": "Rotation"
+ },
+ "t": {
+ "description": "Top offset (Excel units).",
+ "title": "T",
+ "type": "integer"
+ },
+ "text": {
+ "description": "Visible text content of the shape.",
+ "title": "Text",
+ "type": "string"
+ },
+ "w": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "default": null,
+ "description": "Shape width (None if unknown).",
+ "title": "W"
+ }
+ },
+ "required": [
+ "text",
+ "l",
+ "t",
+ "layout"
+ ],
+ "title": "SmartArt",
+ "type": "object"
+ },
+ "SmartArtNode": {
+ "description": "Node of SmartArt hierarchy.",
+ "properties": {
+ "kids": {
+ "description": "Child nodes.",
+ "items": {
+ "$ref": "#/$defs/SmartArtNode"
+ },
+ "title": "Kids",
+ "type": "array"
+ },
+ "text": {
+ "description": "Visible text for the node.",
+ "title": "Text",
+ "type": "string"
+ }
+ },
+ "required": [
+ "text"
+ ],
+ "title": "SmartArtNode",
+ "type": "object"
}
},
"$schema": "https://json-schema.org/draft/2020-12/schema",
diff --git a/scripts/gen_json_schema.py b/scripts/gen_json_schema.py
index b230b05..a848939 100644
--- a/scripts/gen_json_schema.py
+++ b/scripts/gen_json_schema.py
@@ -6,6 +6,7 @@
from pydantic import BaseModel
 from exstruct.models import (
+    Arrow,
     CellRow,
     Chart,
     ChartSeries,
@@ -13,6 +14,8 @@
     PrintAreaView,
     Shape,
     SheetData,
+    SmartArt,
+    SmartArtNode,
     WorkbookData,
 )
)
@@ -44,6 +47,9 @@ def main() -> int:
         "sheet": SheetData,
         "cell_row": CellRow,
         "shape": Shape,
+        "arrow": Arrow,
+        "smartart": SmartArt,
+        "smartart_node": SmartArtNode,
         "chart": Chart,
         "chart_series": ChartSeries,
         "print_area": PrintArea,
diff --git a/src/exstruct/core/modeling.py b/src/exstruct/core/modeling.py
index 475e036..2b312e8 100644
--- a/src/exstruct/core/modeling.py
+++ b/src/exstruct/core/modeling.py
@@ -2,7 +2,16 @@
from dataclasses import dataclass
-from ..models import CellRow, Chart, PrintArea, Shape, SheetData, WorkbookData
+from ..models import (
+    Arrow,
+    CellRow,
+    Chart,
+    PrintArea,
+    Shape,
+    SheetData,
+    SmartArt,
+    WorkbookData,
+)
@dataclass(frozen=True)
@@ -20,7 +29,7 @@ class SheetRawData:
     """
     rows: list[CellRow]
-    shapes: list[Shape]
+    shapes: list[Shape | Arrow | SmartArt]
     charts: list[Chart]
     table_candidates: list[str]
     print_areas: list[PrintArea]
diff --git a/src/exstruct/core/pipeline.py b/src/exstruct/core/pipeline.py
index 3258da9..9dfcc04 100644
--- a/src/exstruct/core/pipeline.py
+++ b/src/exstruct/core/pipeline.py
@@ -10,7 +10,7 @@
import xlwings as xw
from ..errors import FallbackReason
-from ..models import CellRow, Chart, PrintArea, Shape, WorkbookData
+from ..models import Arrow, CellRow, Chart, PrintArea, Shape, SmartArt, WorkbookData
from .backends.com_backend import ComBackend
from .backends.openpyxl_backend import OpenpyxlBackend
from .cells import WorkbookColorsMap, detect_tables
@@ -23,7 +23,7 @@
ExtractionMode = Literal["light", "standard", "verbose"]
CellData = dict[str, list[CellRow]]
PrintAreaData = dict[str, list[PrintArea]]
-ShapeData = dict[str, list[Shape]]
+ShapeData = dict[str, list[Shape | Arrow | SmartArt]]
ChartData = dict[str, list[Chart]]
logger = logging.getLogger(__name__)
diff --git a/src/exstruct/core/shapes.py b/src/exstruct/core/shapes.py
index 02b2257..1831937 100644
--- a/src/exstruct/core/shapes.py
+++ b/src/exstruct/core/shapes.py
@@ -1,13 +1,13 @@
from __future__ import annotations
-from collections.abc import Iterator
+from collections.abc import Iterable, Iterator
import math
-from typing import SupportsInt, cast
+from typing import Literal, Protocol, SupportsInt, cast, runtime_checkable
import xlwings as xw
from xlwings import Book
-from ..models import Shape
+from ..models import Arrow, Shape, SmartArt, SmartArtNode
from ..models.maps import MSO_AUTO_SHAPE_TYPE_MAP, MSO_SHAPE_TYPE_MAP
@@ -16,11 +16,13 @@ def compute_line_angle_deg(w: float, h: float) -> float:
     return math.degrees(math.atan2(h, w)) % 360.0
-def angle_to_compass(angle: float) -> str:
+def angle_to_compass(
+    angle: float,
+) -> Literal["E", "SE", "S", "SW", "W", "NW", "N", "NE"]:
     """Convert angle to 8-point compass direction (0deg=E, 45deg=NE, 90deg=N, etc)."""
     dirs = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]
     idx = int(((angle + 22.5) % 360) // 45)
-    return dirs[idx]
+    return cast(Literal["E", "SE", "S", "SW", "W", "NW", "N", "NE"], dirs[idx])
def coord_to_cell_by_edges(
@@ -108,16 +110,129 @@ def _should_include_shape(
     return True
+@runtime_checkable
+class _TextRangeLike(Protocol):
+    """Text range interface for SmartArt nodes."""
+
+    Text: str | None
+
+
+@runtime_checkable
+class _TextFrameLike(Protocol):
+    """Text frame interface for SmartArt nodes."""
+
+    HasText: bool
+    TextRange: _TextRangeLike
+
+
+@runtime_checkable
+class _SmartArtNodeLike(Protocol):
+    """SmartArt node interface."""
+
+    Level: int
+    TextFrame2: _TextFrameLike
+
+
+@runtime_checkable
+class _SmartArtLike(Protocol):
+    """SmartArt interface."""
+
+    Layout: object
+    AllNodes: Iterable[_SmartArtNodeLike]
+
+
+def _shape_has_smartart(shp: xw.Shape) -> bool:
+    """Return True if the shape exposes SmartArt content."""
+    try:
+        api = shp.api
+    except Exception:
+        return False
+    try:
+        return bool(api.HasSmartArt)
+    except Exception:
+        return False
+
+
+def _get_smartart_layout_name(smartart: _SmartArtLike | None) -> str:
+    """Return SmartArt layout name or a fallback label."""
+    if smartart is None:
+        return "Unknown"
+    try:
+        layout = getattr(smartart, "Layout", None)
+        name = getattr(layout, "Name", None)
+        return str(name) if name is not None else "Unknown"
+    except Exception:
+        return "Unknown"
+
+
+def _collect_smartart_node_info(
+    smartart: _SmartArtLike | None,
+) -> list[tuple[int, str]]:
+    """Collect (level, text) pairs from SmartArt nodes."""
+    nodes_info: list[tuple[int, str]] = []
+    if smartart is None:
+        return nodes_info
+    try:
+        all_nodes = smartart.AllNodes
+    except Exception:
+        return nodes_info
+
+    for node in all_nodes:
+        level = _get_smartart_node_level(node)
+        if level is None:
+            continue
+        text = ""
+        try:
+            text_frame = node.TextFrame2
+            if text_frame.HasText:
+                text_value = text_frame.TextRange.Text
+                text = str(text_value) if text_value is not None else ""
+        except Exception:
+            text = ""
+        nodes_info.append((level, text))
+    return nodes_info
+
+
+def _get_smartart_node_level(node: _SmartArtNodeLike) -> int | None:
+    """Return SmartArt node level or None when unavailable."""
+    try:
+        return int(node.Level)
+    except Exception:
+        return None
+
+
+def _build_smartart_tree(nodes_info: list[tuple[int, str]]) -> list[SmartArtNode]:
+    """Build nested SmartArtNode roots from flat (level, text) tuples."""
+    roots: list[SmartArtNode] = []
+    stack: list[tuple[int, SmartArtNode]] = []
+    for level, text in nodes_info:
+        node = SmartArtNode(text=text, kids=[])
+        # Pop siblings and deeper nodes so the stack top is this node's parent.
+        while stack and stack[-1][0] >= level:
+            stack.pop()
+        if stack:
+            stack[-1][1].kids.append(node)
+        else:
+            # No shallower ancestor remains, so this node is a root.
+            roots.append(node)
+        stack.append((level, node))
+    return roots
+
+
+def _extract_smartart_nodes(smartart: _SmartArtLike | None) -> list[SmartArtNode]:
+    """Extract SmartArt nodes as nested roots."""
+    nodes_info = _collect_smartart_node_info(smartart)
+    return _build_smartart_tree(nodes_info)
+
+
 def get_shapes_with_position(  # noqa: C901
     workbook: Book, mode: str = "standard"
-) -> dict[str, list[Shape]]:
-    """Scan shapes in a workbook and return per-sheet Shape lists with position info."""
-    shape_data: dict[str, list[Shape]] = {}
+) -> dict[str, list[Shape | Arrow | SmartArt]]:
+    """Scan shapes in a workbook and return per-sheet shape lists with position info."""
+    shape_data: dict[str, list[Shape | Arrow | SmartArt]] = {}
     for sheet in workbook.sheets:
-        shapes: list[Shape] = []
+        shapes: list[Shape | Arrow | SmartArt] = []
         excel_names: list[tuple[str, int]] = []
         node_index = 0
-        pending_connections: list[tuple[Shape, str | None, str | None]] = []
+        pending_connections: list[tuple[Arrow, str | None, str | None]] = []
         for root in sheet.shapes:
             for shp in iter_shapes_recursive(root):
                 try:
@@ -148,7 +263,11 @@ def get_shapes_with_position( # noqa: C901
                 except Exception:
                     text = ""
-                if not _should_include_shape(
+                if mode == "light":
+                    continue
+
+                has_smartart = _shape_has_smartart(shp)
+                if not has_smartart and not _should_include_shape(
                     text=text,
                     shape_type_num=type_num,
                     shape_type_str=shape_type_str,
@@ -179,7 +298,8 @@ def get_shapes_with_position( # noqa: C901
                 ):
                     is_relationship_geom = True
                 if shape_type_str and (
-                    "Connector" in shape_type_str or shape_type_str in ("Line", "ConnectLine")
+                    "Connector" in shape_type_str
+                    or shape_type_str in ("Line", "ConnectLine")
                 ):
                     is_relationship_geom = True
                 if shape_name and ("Connector" in shape_name or "Line" in shape_name):
@@ -192,19 +312,54 @@ def get_shapes_with_position( # noqa: C901
                 excel_name = shape_name if isinstance(shape_name, str) else None
-                shape_obj = Shape(
-                    id=shape_id,
-                    text=text,
-                    l=int(shp.left),
-                    t=int(shp.top),
-                    w=int(shp.width)
-                    if mode == "verbose" or shape_type_str == "Group"
-                    else None,
-                    h=int(shp.height)
-                    if mode == "verbose" or shape_type_str == "Group"
-                    else None,
-                    type=type_label,
-                )
+                shape_obj: Shape | Arrow | SmartArt
+                if has_smartart:
+                    smartart_obj: _SmartArtLike | None = None
+                    try:
+                        smartart_obj = shp.api.SmartArt
+                    except Exception:
+                        smartart_obj = None
+                    shape_obj = SmartArt(
+                        id=shape_id,
+                        text=text,
+                        l=int(shp.left),
+                        t=int(shp.top),
+                        w=int(shp.width)
+                        if mode == "verbose" or shape_type_str == "Group"
+                        else None,
+                        h=int(shp.height)
+                        if mode == "verbose" or shape_type_str == "Group"
+                        else None,
+                        layout=_get_smartart_layout_name(smartart_obj),
+                        nodes=_extract_smartart_nodes(smartart_obj),
+                    )
+                elif is_relationship_geom:
+                    shape_obj = Arrow(
+                        id=shape_id,
+                        text=text,
+                        l=int(shp.left),
+                        t=int(shp.top),
+                        w=int(shp.width)
+                        if mode == "verbose" or shape_type_str == "Group"
+                        else None,
+                        h=int(shp.height)
+                        if mode == "verbose" or shape_type_str == "Group"
+                        else None,
+                    )
+                else:
+                    shape_obj = Shape(
+                        id=shape_id,
+                        text=text,
+                        l=int(shp.left),
+                        t=int(shp.top),
+                        w=int(shp.width)
+                        if mode == "verbose" or shape_type_str == "Group"
+                        else None,
+                        h=int(shp.height)
+                        if mode == "verbose" or shape_type_str == "Group"
+                        else None,
+                        type=type_label,
+                    )
                 if excel_name:
                     if shape_id is not None:
                         excel_names.append((excel_name, shape_id))
@@ -215,7 +370,8 @@ def get_shapes_with_position( # noqa: C901
angle = compute_line_angle_deg(
float(shp.width), float(shp.height)
)
- shape_obj.direction = angle_to_compass(angle) # type: ignore
+ if isinstance(shape_obj, Arrow):
+ shape_obj.direction = angle_to_compass(angle)
try:
rot = float(shp.api.Rotation)
if abs(rot) > 1e-6:
@@ -225,8 +381,9 @@ def get_shapes_with_position( # noqa: C901
try:
begin_style = int(shp.api.Line.BeginArrowheadStyle)
end_style = int(shp.api.Line.EndArrowheadStyle)
- shape_obj.begin_arrow_style = begin_style
- shape_obj.end_arrow_style = end_style
+ if isinstance(shape_obj, Arrow):
+ shape_obj.begin_arrow_style = begin_style
+ shape_obj.end_arrow_style = end_style
except Exception:
pass
# Connector begin/end connected shapes (if this shape is a connector).
@@ -262,7 +419,8 @@ def get_shapes_with_position( # noqa: C901
pass
except Exception:
pass
- pending_connections.append((shape_obj, begin_name, end_name))
+ if isinstance(shape_obj, Arrow):
+ pending_connections.append((shape_obj, begin_name, end_name))
shapes.append(shape_obj)
if pending_connections:
name_to_id = {name: sid for name, sid in excel_names}
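For reviewers, the stack-based nesting that `_build_smartart_tree` performs on flat `(level, text)` pairs can be tried without Excel or COM. The sketch below follows the same algorithm but swaps in a plain dataclass named `Node` for the pydantic `SmartArtNode`. `Node`, `build_tree`, and the sample levels are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Illustrative stand-in for SmartArtNode (text plus child nodes)."""

    text: str
    kids: list[Node] = field(default_factory=list)


def build_tree(pairs: list[tuple[int, str]]) -> list[Node]:
    """Nest a flat, document-ordered list of (level, text) pairs."""
    roots: list[Node] = []
    stack: list[tuple[int, Node]] = []
    for level, text in pairs:
        node = Node(text)
        # Drop entries at the same or a deeper level; what is left is the parent chain.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1].kids.append(node)
        else:
            roots.append(node)
        stack.append((level, node))
    return roots


# Hypothetical levels, in the order a SmartArt would list its nodes.
pairs = [(1, "Plan"), (2, "Design"), (2, "Build"), (3, "Test"), (1, "Ship")]
roots = build_tree(pairs)
```

Each node is pushed and popped at most once, so the pass is linear in the number of nodes. A level that skips a step (for example 1 followed directly by 3) attaches the node to the nearest shallower ancestor.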
diff --git a/src/exstruct/io/__init__.py b/src/exstruct/io/__init__.py
index c8c1201..e2ad37a 100644
--- a/src/exstruct/io/__init__.py
+++ b/src/exstruct/io/__init__.py
@@ -7,7 +7,16 @@
from ..core.ranges import RangeBounds, parse_range_zero_based
from ..errors import OutputError, SerializationError
-from ..models import CellRow, Chart, PrintArea, PrintAreaView, Shape, WorkbookData
+from ..models import (
+    Arrow,
+    CellRow,
+    Chart,
+    PrintArea,
+    PrintAreaView,
+    Shape,
+    SmartArt,
+    WorkbookData,
+)
from ..models.types import JsonStructure
from .serialize import (
_FORMAT_HINTS,
@@ -34,7 +43,14 @@ def dict_without_empty_values(obj: object) -> JsonStructure:
]
     if isinstance(
         obj,
-        WorkbookData | CellRow | Chart | PrintArea | PrintAreaView | Shape,
+        WorkbookData
+        | CellRow
+        | Chart
+        | PrintArea
+        | PrintAreaView
+        | Shape
+        | Arrow
+        | SmartArt,
     ):
         return dict_without_empty_values(obj.model_dump(exclude_none=True))
     return cast(JsonStructure, obj)
@@ -161,9 +177,11 @@ def _rects_overlap(a: tuple[int, int, int, int], b: tuple[int, int, int, int]) -
     return not (a[2] <= b[0] or a[0] >= b[2] or a[3] <= b[1] or a[1] >= b[3])
-def _filter_shapes_to_area(shapes: list[Shape], area: PrintArea) -> list[Shape]:
+def _filter_shapes_to_area(
+    shapes: list[Shape | Arrow | SmartArt], area: PrintArea
+) -> list[Shape | Arrow | SmartArt]:
     area_rect = _area_to_px_rect(area)
-    filtered: list[Shape] = []
+    filtered: list[Shape | Arrow | SmartArt] = []
     for shp in shapes:
         if shp.w is None or shp.h is None:
             # Fallback: treat shape as a point if size is unknown (standard mode).
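The filter applies unchanged to all three shape variants because it reads only `l`, `t`, `w`, and `h`, which every variant inherits. The rest of `_filter_shapes_to_area` is not shown in this hunk, so the following is only a sketch of how the overlap test and the point fallback might combine. The `(left, top, right, bottom)` tuple order is inferred from `_rects_overlap`. The helpers `point_in_rect` and `shape_hits_area`, and the inclusive edge handling for points, are assumptions.

```python
from __future__ import annotations

Rect = tuple[int, int, int, int]  # assumed order: (left, top, right, bottom)


def rects_overlap(a: Rect, b: Rect) -> bool:
    """Same test as `_rects_overlap`: rectangles that only touch do not overlap."""
    return not (a[2] <= b[0] or a[0] >= b[2] or a[3] <= b[1] or a[1] >= b[3])


def point_in_rect(x: int, y: int, rect: Rect) -> bool:
    """Assumption: a point on the boundary counts as inside."""
    return rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3]


def shape_hits_area(l: int, t: int, w: int | None, h: int | None, area: Rect) -> bool:
    """Use the rectangle test when the size is known, else treat the shape as a point."""
    if w is None or h is None:
        return point_in_rect(l, t, area)
    return rects_overlap((l, t, l + w, t + h), area)


area: Rect = (0, 0, 100, 50)
inside = shape_hits_area(10, 10, 20, 20, area)
touching = shape_hits_area(100, 0, 10, 10, area)
point_inside = shape_hits_area(50, 25, None, None, area)
point_outside = shape_hits_area(150, 25, None, None, area)
```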
diff --git a/src/exstruct/models/__init__.py b/src/exstruct/models/__init__.py
index 65cfb30..bd40d0b 100644
--- a/src/exstruct/models/__init__.py
+++ b/src/exstruct/models/__init__.py
@@ -8,8 +8,8 @@
from pydantic import BaseModel, Field
-class Shape(BaseModel):
- """Shape metadata (position, size, text, and styling)."""
+class BaseShape(BaseModel):
+ """Common shape metadata (position, size, text, and styling)."""
id: int | None = Field(
default=None,
@@ -20,10 +20,22 @@ class Shape(BaseModel):
t: int = Field(description="Top offset (Excel units).")
w: int | None = Field(default=None, description="Shape width (None if unknown).")
h: int | None = Field(default=None, description="Shape height (None if unknown).")
- type: str | None = Field(default=None, description="Excel shape type name.")
rotation: float | None = Field(
default=None, description="Rotation angle in degrees."
)
+
+
+class Shape(BaseShape):
+ """Regular (non-connector, non-SmartArt) shape metadata."""
+
+ kind: Literal["shape"] = Field(default="shape", description="Shape kind.")
+ type: str | None = Field(default=None, description="Excel shape type name.")
+
+
+class Arrow(BaseShape):
+ """Connector shape metadata."""
+
+ kind: Literal["arrow"] = Field(default="arrow", description="Shape kind.")
begin_arrow_style: int | None = Field(
default=None, description="Arrow style enum for the start of a connector."
)
@@ -47,6 +59,23 @@ class Shape(BaseModel):
)
+class SmartArtNode(BaseModel):
+ """Node of SmartArt hierarchy."""
+
+ text: str = Field(description="Visible text for the node.")
+ kids: list[SmartArtNode] = Field(default_factory=list, description="Child nodes.")
+
+
+class SmartArt(BaseShape):
+ """SmartArt shape metadata with nested nodes."""
+
+ kind: Literal["smartart"] = Field(default="smartart", description="Shape kind.")
+ layout: str = Field(description="SmartArt layout name.")
+ nodes: list[SmartArtNode] = Field(
+ default_factory=list, description="Root nodes of SmartArt tree."
+ )
+
+
class CellRow(BaseModel):
"""A single row of cells with optional hyperlinks."""
@@ -109,7 +138,7 @@ class SheetData(BaseModel):
rows: list[CellRow] = Field(
default_factory=list, description="Extracted rows with cell values and links."
)
- shapes: list[Shape] = Field(
+ shapes: list[Shape | Arrow | SmartArt] = Field(
default_factory=list, description="Shapes detected on the sheet."
)
charts: list[Chart] = Field(
@@ -267,7 +296,7 @@ class PrintAreaView(BaseModel):
book_name: str = Field(description="Workbook name owning the area.")
sheet_name: str = Field(description="Sheet name owning the area.")
area: PrintArea = Field(description="Print area bounds.")
- shapes: list[Shape] = Field(
+ shapes: list[Shape | Arrow | SmartArt] = Field(
default_factory=list, description="Shapes overlapping the area."
)
charts: list[Chart] = Field(
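
Review note: with `kind` now present on every entry of `shapes`, a consumer of the JSON output can branch on it instead of probing for arrow-only fields. The following is a rough sketch under stated assumptions: `describe` and `count_nodes` are hypothetical helpers, the dictionaries mimic what a dumped sheet might contain, and keys such as `text` or `kids` are read with `.get` on the assumption that empty values may be omitted from the output.

```python
from __future__ import annotations

from typing import Any


def count_nodes(nodes: list[dict[str, Any]]) -> int:
    """Count SmartArt nodes recursively; a missing 'kids' key means no children."""
    return sum(1 + count_nodes(node.get("kids", [])) for node in nodes)


def describe(shape: dict[str, Any]) -> str:
    """Summarize one entry of a sheet's 'shapes' list based on its 'kind' tag."""
    kind = shape.get("kind")
    if kind == "arrow":
        # begin_id / end_id exist only on arrows in the new model split.
        return f"arrow {shape.get('begin_id')} -> {shape.get('end_id')}"
    if kind == "smartart":
        total = count_nodes(shape.get("nodes", []))
        return f"smartart {shape.get('layout')} with {total} node(s)"
    return f"shape {shape.get('text', '')!r}"
```

The same branching can be done on the model objects with `isinstance(s, Arrow)` and friends, as the updated tests in this PR do.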
diff --git a/tests/com/test_shapes_extraction.py b/tests/com/test_shapes_extraction.py
index 47347e2..be31e2d 100644
--- a/tests/com/test_shapes_extraction.py
+++ b/tests/com/test_shapes_extraction.py
@@ -4,6 +4,7 @@
import xlwings as xw
from exstruct.core.integrate import extract_workbook
+from exstruct.models import Arrow, Shape
pytestmark = pytest.mark.com
@@ -70,28 +71,22 @@ def test_図形の種別とテキストが抽出される(tmp_path: Path) -> Non
wb_data = extract_workbook(path)
shapes = wb_data.sheets["Sheet1"].shapes
- rect = next(s for s in shapes if s.text == "rect")
+ rect = next(s for s in shapes if isinstance(s, Shape) and s.text == "rect")
assert "AutoShape" in (rect.type or "")
assert rect.l >= 0 and rect.t >= 0
- assert rect.id > 0
+ assert rect.id is not None and rect.id > 0
- inner = next(s for s in shapes if s.text == "inner")
+ inner = next(s for s in shapes if isinstance(s, Shape) and s.text == "inner")
assert "Group" not in (inner.type or "") # flattened child
- assert not any((s.type or "") == "Group" for s in shapes)
- assert inner.id > 0
+ assert not any(isinstance(s, Shape) and (s.type or "") == "Group" for s in shapes)
+ assert inner.id is not None and inner.id > 0
ids = [s.id for s in shapes if s.id is not None]
assert len(ids) == len(set(ids))
# Standard mode should not emit non-relationship AutoShapes without text.
assert not any(
- (s.text == "" or s.text is None)
+ isinstance(s, Shape)
+ and (s.text == "" or s.text is None)
and (s.type or "").startswith("AutoShape")
- and not (
- s.direction
- or s.begin_arrow_style is not None
- or s.end_arrow_style is not None
- or s.begin_id is not None
- or s.end_id is not None
- )
for s in shapes
)
@@ -107,7 +102,8 @@ def test_線図形の方向と矢印情報が抽出される(tmp_path: Path) ->
line = next(
s
for s in shapes
- if s.begin_arrow_style is not None or s.end_arrow_style is not None
+ if isinstance(s, Arrow)
+ and (s.begin_arrow_style is not None or s.end_arrow_style is not None)
)
assert line.direction == "E"
@@ -121,20 +117,20 @@ def test_コネクターの接続元と接続先が抽出される(tmp_path: Pat
shapes = wb_data.sheets["Sheet1"].shapes
connectors = [
- s
- for s in shapes
- if s.begin_id is not None or s.end_id is not None
+ s for s in shapes if isinstance(s, Arrow) and (s.begin_id or s.end_id)
]
# If the environment could not wire connectors, skip the test instead of failing.
if not connectors:
- pytest.skip("Excel failed to populate ConnectorFormat.ConnectedShape properties.")
+ pytest.skip(
+ "Excel failed to populate ConnectorFormat.ConnectedShape properties."
+ )
conn = connectors[0]
assert conn.begin_id is not None
assert conn.end_id is not None
assert conn.begin_id != conn.end_id
# Connected shape ids should match the ids of emitted shapes.
- shape_ids = {s.id for s in shapes}
+ shape_ids = {s.id for s in shapes if s.id is not None}
assert conn.begin_id in shape_ids
assert conn.end_id in shape_ids
diff --git a/tests/core/test_mode_output.py b/tests/core/test_mode_output.py
index 4d900f8..06e5930 100644
--- a/tests/core/test_mode_output.py
+++ b/tests/core/test_mode_output.py
@@ -10,6 +10,7 @@
import xlwings as xw
from exstruct import extract, process_excel
+from exstruct.models import Arrow
def _make_basic_book(path: Path) -> None:
@@ -78,8 +79,8 @@ def test_standardモードはテキストなし図形を除外する(tmp_path: P
for s in shapes:
if s.text != "":
continue
- assert s.type is not None
- assert ("Line" in s.type) or ("Connector" in s.type) or ("Arrow" in s.type)
+ assert isinstance(s, Arrow)
+ assert s.direction is not None or s.begin_arrow_style is not None
def test_verboseモードでは全図形と幅高さが出力される(tmp_path: Path) -> None:
@@ -108,11 +109,11 @@ def test_invalidモードはエラーになる(tmp_path: Path) -> None:
path = tmp_path / "book.xlsx"
_make_basic_book(path)
with pytest.raises(ValueError):
- extract(path, mode="invalid")
+ extract(path, mode="invalid") # type: ignore[arg-type]
out = tmp_path / "out.json"
with pytest.raises(ValueError):
- process_excel(path, out, mode="invalid")
+ process_excel(path, out, mode="invalid") # type: ignore[arg-type]
def test_CLIのmode引数バリデーション(tmp_path: Path) -> None:
diff --git a/tests/core/test_shapes_positions_dummy.py b/tests/core/test_shapes_positions_dummy.py
index 13e228f..999e70b 100644
--- a/tests/core/test_shapes_positions_dummy.py
+++ b/tests/core/test_shapes_positions_dummy.py
@@ -1,6 +1,7 @@
from dataclasses import dataclass
from exstruct.core.shapes import get_shapes_with_position
+from exstruct.models import Arrow
@dataclass(frozen=True)
@@ -45,6 +46,27 @@ def Rotation(self) -> float:
return self.rotation
+@dataclass(frozen=True)
+class _DummyApiSmartArt:
+ shape_type: int
+
+ @property
+ def Type(self) -> int:
+ return self.shape_type
+
+ @property
+ def AutoShapeType(self) -> int:
+ raise RuntimeError("AutoShapeType unavailable")
+
+ @property
+ def HasSmartArt(self) -> bool:
+ return True
+
+ @property
+ def SmartArt(self) -> object:
+ return object()
+
+
@dataclass(frozen=True)
class _DummyShape:
name: str
@@ -53,7 +75,7 @@ class _DummyShape:
top: float
width: float
height: float
- api: _DummyApi
+ api: object
@dataclass(frozen=True)
@@ -107,7 +129,8 @@ def test_get_shapes_with_position_standard_filters_textless_non_relation() -> No
assert len(shapes) == 2
assert {s.text for s in shapes} == {"Hello", ""}
line_entries = [s for s in shapes if s.text == ""]
- assert line_entries[0].type == "Line"
+ assert isinstance(line_entries[0], Arrow)
+ assert line_entries[0].direction == "E"
text_entries = [s for s in shapes if s.text == "Hello"]
assert text_entries[0].id == 1
@@ -151,3 +174,19 @@ def test_get_shapes_with_position_verbose_includes_all_and_sizes() -> None:
assert len(shapes) == 3
assert all(s.w is not None and s.h is not None for s in shapes)
+
+
+def test_get_shapes_with_position_light_skips_smartart() -> None:
+ smartart_shape = _DummyShape(
+ name="SmartArt1",
+ text="sa",
+ left=10.0,
+ top=20.0,
+ width=100.0,
+ height=50.0,
+ api=_DummyApiSmartArt(shape_type=24),
+ )
+ book = _DummyBook(sheets=[_DummySheet(name="Sheet1", shapes=[smartart_shape])])
+
+ result = get_shapes_with_position(book, mode="light")
+ assert result["Sheet1"] == []
diff --git a/tests/core/test_shapes_smartart_utils.py b/tests/core/test_shapes_smartart_utils.py
new file mode 100644
index 0000000..e8c49b0
--- /dev/null
+++ b/tests/core/test_shapes_smartart_utils.py
@@ -0,0 +1,147 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import cast
+
+import xlwings as xw
+
+from exstruct.core import shapes as shapes_mod
+
+
+@dataclass
+class _DummyTextRange:
+ Text: str | None # noqa: N815
+
+
+@dataclass
+class _DummyTextFrame:
+ HasText: bool # noqa: N815
+ TextRange: _DummyTextRange # noqa: N815
+
+
+@dataclass
+class _DummyNode:
+ Level: int # noqa: N815
+ TextFrame2: _DummyTextFrame # noqa: N815
+
+
+@dataclass
+class _DummyLayout:
+ Name: str | None # noqa: N815
+
+
+@dataclass
+class _DummySmartArt:
+ AllNodes: list[_DummyNode] # noqa: N815
+ Layout: object # noqa: N815
+
+
+@dataclass(frozen=True)
+class _DummyApi:
+ HasSmartArt: bool # noqa: N815
+ SmartArt: _DummySmartArt | None # noqa: N815
+
+
+@dataclass(frozen=True)
+class _DummyApiRaises:
+ @property
+ def HasSmartArt(self) -> bool: # noqa: N802
+ raise RuntimeError("HasSmartArt unavailable")
+
+
+@dataclass(frozen=True)
+class _DummyShape:
+ api_obj: object
+
+ @property
+ def api(self) -> object:
+ return self.api_obj
+
+
+@dataclass(frozen=True)
+class _DummyShapeRaisesApi:
+ @property
+ def api(self) -> object:
+ raise RuntimeError("api unavailable")
+
+
+def test_shape_has_smartart_true_false() -> None:
+ smartart = _DummySmartArt(AllNodes=[], Layout=_DummyLayout(Name="L"))
+ has = shapes_mod._shape_has_smartart(
+ cast(
+ xw.Shape,
+ _DummyShape(api_obj=_DummyApi(HasSmartArt=True, SmartArt=smartart)),
+ )
+ )
+ assert has is True
+
+ has_false = shapes_mod._shape_has_smartart(
+ cast(xw.Shape, _DummyShape(api_obj=_DummyApi(HasSmartArt=False, SmartArt=None)))
+ )
+ assert has_false is False
+
+
+def test_shape_has_smartart_handles_exceptions() -> None:
+ has = shapes_mod._shape_has_smartart(
+ cast(xw.Shape, _DummyShape(api_obj=_DummyApiRaises()))
+ )
+ assert has is False
+
+ has_api_error = shapes_mod._shape_has_smartart(
+ cast(xw.Shape, _DummyShapeRaisesApi())
+ )
+ assert has_api_error is False
+
+
+def test_get_smartart_layout_name() -> None:
+ assert shapes_mod._get_smartart_layout_name(None) == "Unknown"
+ smartart = _DummySmartArt(AllNodes=[], Layout=_DummyLayout(Name="Layout"))
+ assert (
+ shapes_mod._get_smartart_layout_name(cast(shapes_mod._SmartArtLike, smartart))
+ == "Layout"
+ )
+ smartart_no_name = _DummySmartArt(AllNodes=[], Layout=_DummyLayout(Name=None))
+ assert (
+ shapes_mod._get_smartart_layout_name(
+ cast(shapes_mod._SmartArtLike, smartart_no_name)
+ )
+ == "Unknown"
+ )
+
+
+def test_collect_smartart_node_info_and_tree() -> None:
+ nodes = [
+ _DummyNode(
+ Level=1,
+ TextFrame2=_DummyTextFrame(
+ HasText=True, TextRange=_DummyTextRange(Text="root")
+ ),
+ ),
+ _DummyNode(
+ Level=2,
+ TextFrame2=_DummyTextFrame(
+ HasText=True, TextRange=_DummyTextRange(Text="child")
+ ),
+ ),
+ _DummyNode(
+ Level=1,
+ TextFrame2=_DummyTextFrame(
+ HasText=False, TextRange=_DummyTextRange(Text=None)
+ ),
+ ),
+ ]
+ smartart = _DummySmartArt(AllNodes=nodes, Layout=_DummyLayout(Name="L"))
+ info = shapes_mod._collect_smartart_node_info(
+ cast(shapes_mod._SmartArtLike, smartart)
+ )
+ assert info == [(1, "root"), (2, "child"), (1, "")]
+
+ roots = shapes_mod._extract_smartart_nodes(cast(shapes_mod._SmartArtLike, smartart))
+ assert len(roots) == 2
+ assert roots[0].text == "root"
+ assert roots[0].kids[0].text == "child"
+ assert roots[1].text == ""
+
+
+def test_collect_smartart_node_info_none() -> None:
+ assert shapes_mod._collect_smartart_node_info(None) == []
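
Review note: the tests above pin down the expected behavior of turning a flat list of `(level, text)` pairs into a nested tree. One common way to do this is a stack of ancestors, sketched below. This is not necessarily how `_extract_smartart_nodes` is implemented; `Node` and `build_tree` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for SmartArtNode (text plus child nodes)."""

    text: str
    kids: list[Node] = field(default_factory=list)


def build_tree(info: list[tuple[int, str]]) -> list[Node]:
    """Nest nodes by level: each node becomes a child of the nearest
    preceding node with a strictly smaller level."""
    roots: list[Node] = []
    stack: list[tuple[int, Node]] = []  # path from a root to the latest node
    for level, text in info:
        node = Node(text=text)
        # Drop siblings and deeper nodes until the top is a strict ancestor.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1].kids.append(node)
        else:
            roots.append(node)
        stack.append((level, node))
    return roots
```

Fed the same pairs as `test_collect_smartart_node_info_and_tree`, this sketch yields two roots, the first with a single child, which matches the assertions in that test.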
diff --git a/tests/io/test_print_area_views.py b/tests/io/test_print_area_views.py
index a7c1a1e..8e61e90 100644
--- a/tests/io/test_print_area_views.py
+++ b/tests/io/test_print_area_views.py
@@ -2,12 +2,33 @@
from pathlib import Path
from exstruct.io import save_print_area_views
-from exstruct.models import CellRow, Chart, PrintArea, Shape, SheetData, WorkbookData
+from exstruct.models import (
+ Arrow,
+ CellRow,
+ Chart,
+ PrintArea,
+ Shape,
+ SheetData,
+ SmartArt,
+ SmartArtNode,
+ WorkbookData,
+)
def _workbook_with_print_area() -> WorkbookData:
shape_inside = Shape(id=1, text="inside", l=10, t=5, w=20, h=10, type="Rect")
shape_outside = Shape(id=2, text="outside", l=200, t=200, w=30, h=30, type="Rect")
+ smartart_inside = SmartArt(
+ id=3,
+ text="sa",
+ l=15,
+ t=8,
+ w=20,
+ h=10,
+ layout="Layout",
+ nodes=[SmartArtNode(text="root", kids=[])],
+ )
+ arrow_inside = Arrow(id=None, text="", l=5, t=5, w=20, h=2)
chart_inside = Chart(
name="c1",
chart_type="Line",
@@ -40,7 +61,7 @@ def _workbook_with_print_area() -> WorkbookData:
CellRow(r=2, c={"1": "B"}),
CellRow(r=3, c={"1": "C"}),
],
- shapes=[shape_inside, shape_outside],
+ shapes=[shape_inside, smartart_inside, arrow_inside, shape_outside],
charts=[chart_inside, chart_outside],
table_candidates=["A1:B2", "C1:C1"],
print_areas=[PrintArea(r1=1, c1=0, r2=2, c2=1)],
@@ -61,7 +82,8 @@ def test_save_print_area_views_filters_rows_and_tables(tmp_path: Path) -> None:
# Only table candidates fully contained in the print area remain.
assert data["table_candidates"] == ["A1:B2"]
# Shapes/Charts filtered by overlap; outside or size-less charts are dropped.
- assert len(data["shapes"]) == 1 and data["shapes"][0]["text"] == "inside"
+ kinds = {shape["kind"] for shape in data["shapes"]}
+ assert kinds == {"shape", "smartart", "arrow"}
assert len(data["charts"]) == 1 and data["charts"][0]["name"] == "c1"
diff --git a/tests/models/test_models_export.py b/tests/models/test_models_export.py
index 38080dd..9d110e5 100644
--- a/tests/models/test_models_export.py
+++ b/tests/models/test_models_export.py
@@ -1,10 +1,11 @@
from importlib import util
+import json
from pathlib import Path
import pytest
from exstruct.errors import MissingDependencyError
-from exstruct.models import CellRow, SheetData, WorkbookData
+from exstruct.models import CellRow, SheetData, SmartArt, SmartArtNode, WorkbookData
HAS_PYYAML = util.find_spec("yaml") is not None
HAS_TOON = util.find_spec("toon") is not None
@@ -95,3 +96,31 @@ def test_workbook_iter_and_getitem() -> None:
assert pairs[0][1] is first
with pytest.raises(KeyError):
_ = wb["Nope"]
+
+
+def test_sheet_json_includes_smartart_nodes() -> None:
+ smartart = SmartArt(
+ id=1,
+ text="sa",
+ l=0,
+ t=0,
+ w=10,
+ h=10,
+ layout="Layout",
+ nodes=[
+ SmartArtNode(
+ text="root",
+ kids=[SmartArtNode(text="child", kids=[])],
+ )
+ ],
+ )
+ sheet = SheetData(
+ rows=[],
+ shapes=[smartart],
+ charts=[],
+ table_candidates=[],
+ )
+ data = json.loads(sheet.to_json())
+ assert data["shapes"][0]["kind"] == "smartart"
+ assert data["shapes"][0]["nodes"][0]["text"] == "root"
+ assert data["shapes"][0]["nodes"][0]["kids"][0]["text"] == "child"
diff --git a/tests/models/test_models_validation.py b/tests/models/test_models_validation.py
index 1b57bf7..3bb45dd 100644
--- a/tests/models/test_models_validation.py
+++ b/tests/models/test_models_validation.py
@@ -2,11 +2,14 @@
import pytest
from exstruct.models import (
+ Arrow,
CellRow,
Chart,
ChartSeries,
Shape,
SheetData,
+ SmartArt,
+ SmartArtNode,
WorkbookData,
)
@@ -14,7 +17,25 @@
def test_モデルのデフォルトとオプション値() -> None:
shape = Shape(id=1, text="t", l=1, t=2, w=None, h=None)
assert shape.rotation is None
- assert shape.direction is None
+ assert shape.kind == "shape"
+
+ arrow = Arrow(id=None, text="a", l=1, t=1, w=10, h=1)
+ assert arrow.begin_arrow_style is None
+ assert arrow.end_arrow_style is None
+ assert arrow.kind == "arrow"
+
+ smartart = SmartArt(
+ id=3,
+ text="sa",
+ l=5,
+ t=6,
+ w=50,
+ h=40,
+ layout="Layout",
+ nodes=[SmartArtNode(text="root", kids=[])],
+ )
+ assert smartart.layout == "Layout"
+ assert smartart.nodes[0].text == "root"
cell = CellRow(r=1, c={"0": "v"})
assert cell.c["0"] == "v"
@@ -48,7 +69,7 @@ def test_モデルのデフォルトとオプション値() -> None:
def test_directionのリテラル検証() -> None:
with pytest.raises(ValidationError):
- Shape(id=1, text="bad", l=0, t=0, w=None, h=None, direction="X")
+ Arrow(id=1, text="bad", l=0, t=0, w=None, h=None, direction="X")
def test_cellrowの数値正規化() -> None:
@@ -56,3 +77,21 @@ def test_cellrowの数値正規化() -> None:
assert isinstance(cell.c["0"], int)
assert isinstance(cell.c["1"], float)
assert cell.c["2"] == "text"
+
+
+def test_arrow_only_fields_are_not_on_shape() -> None:
+ arrow = Arrow(
+ id=None,
+ text="a",
+ l=1,
+ t=1,
+ w=10,
+ h=2,
+ begin_id=1,
+ end_id=2,
+ )
+ shape = Shape(id=1, text="s", l=0, t=0, w=None, h=None)
+ assert arrow.begin_id == 1
+ assert arrow.end_id == 2
+ assert not hasattr(shape, "begin_id")
+ assert not hasattr(shape, "end_id")