Commit 48269d9

Add DocxodusEngine as a second comparison engine
Introduce Docxodus (a modernized .NET 8.0 fork of Open-XML-PowerTools with better move detection) as an alternative engine alongside XmlPowerToolsEngine.

- Extract BaseEngine class with shared binary extraction and subprocess logic
- XmlPowerToolsEngine and DocxodusEngine are thin subclasses setting 3 constants
- Add Docxodus as a git submodule at docxodus/
- Refactor build_differ.py into reusable build_engine() function (also fixes missing win-arm64 compression)
- Update CI workflow for submodules and .NET SDK
- Add integration tests and parametrized contract tests for both engines

1 parent 254e742 commit 48269d9
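
The commit message describes both engines as thin subclasses that set three constants on a shared base class. The following is a minimal sketch of that pattern, not the committed `engines.py`: the class names, attribute names (`DIST_DIR`, `BIN_DIR`, `BINARY_NAME`), the `build_command` helper, and the flat `package_root / BIN_DIR / BINARY_NAME` layout are all assumptions made for illustration. Only the six constant values and the argument order `<author_tag> <original.docx> <modified.docx> <output.docx>` are taken from the CLAUDE.md added in this commit.

```python
from pathlib import PurePosixPath
from typing import List


class BaseEngineSketch:
    """Illustrative stand-in for BaseEngine; all attribute names are assumptions."""

    DIST_DIR = ""     # directory holding the compressed binaries (assumed name)
    BIN_DIR = ""      # directory the binary is extracted into (assumed name)
    BINARY_NAME = ""  # base name of the executable (assumed name)

    def build_command(
        self,
        package_root: PurePosixPath,
        author_tag: str,
        original: str,
        modified: str,
        output: str,
    ) -> List[str]:
        # Both engines take the same four positional arguments, so the
        # argument list can live entirely in the base class.
        binary = package_root / self.BIN_DIR / self.BINARY_NAME
        return [str(binary), author_tag, original, modified, output]


class XmlPowerToolsEngineSketch(BaseEngineSketch):
    DIST_DIR, BIN_DIR, BINARY_NAME = "dist", "bin", "redlines"


class DocxodusEngineSketch(BaseEngineSketch):
    DIST_DIR, BIN_DIR, BINARY_NAME = "dist_docxodus", "bin_docxodus", "redline"


if __name__ == "__main__":
    root = PurePosixPath("/pkg")  # hypothetical install location
    for engine in (XmlPowerToolsEngineSketch(), DocxodusEngineSketch()):
        print(engine.build_command(root, "Reviewer", "original.docx", "modified.docx", "out.docx"))
```

Keeping the per-engine differences in class attributes means a third engine with the same CLI contract would need only another three-line subclass.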

13 files changed: +253 -59 lines changed


.github/workflows/python-publish.yml

Lines changed: 7 additions & 1 deletion
@@ -14,10 +14,16 @@ jobs:
 
     steps:
     - uses: actions/checkout@v3
+      with:
+        submodules: recursive
     - name: Set up Python
       uses: actions/setup-python@v3
       with:
         python-version: '3.x'
+    - name: Setup .NET
+      uses: actions/setup-dotnet@v3
+      with:
+        dotnet-version: '8.0.x'
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
@@ -26,4 +32,4 @@ jobs:
       run: hatch build
     - name: Publish package
       run: |
-        hatch publish -u "__token__" -a ${{ secrets.PYPI_API_TOKEN }}
+        hatch publish -u "__token__" -a ${{ secrets.PYPI_API_TOKEN }}

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ __pycache__/
 # C# Build Dirs
 csproj/bin/*
 csproj/obj/*
+docxodus/**/bin/*
+docxodus/**/obj/*
 
 # C extensions
 *.so

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[submodule "docxodus"]
+	path = docxodus
+	url = https://github.com/JSv4/Docxodus.git

CLAUDE.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Python-Redlines is a Python wrapper around compiled C# binaries that generate `.docx` redline/tracked-changes documents by comparing two Word files. The Python layer handles platform detection, binary extraction, temp file management, and subprocess execution.
+
+Two comparison engines are available:
+- **XmlPowerToolsEngine** — wraps Open-XML-PowerTools WmlComparer (original engine)
+- **DocxodusEngine** — wraps Docxodus, a modernized .NET 8.0 fork with better move detection
+
+## Commands
+
+```bash
+# Run tests
+hatch run test
+
+# Run a single test
+hatch run test tests/test_openxml_differ.py::test_run_redlines_with_real_files
+
+# Run tests with coverage
+hatch run cov
+
+# Type checking
+hatch run types:check
+
+# Build C# binaries for all platforms (requires .NET 8.0 SDK)
+hatch run build
+
+# Build Python package (triggers C# build via custom hook)
+hatch build
+
+# Initialize Docxodus submodule (required before building)
+git submodule update --init --recursive
+```
+
+## Architecture
+
+The system uses a two-layer wrapper pattern with a shared base class:
+
+1. **Python layer** (`src/python_redlines/engines.py`):
+   - `BaseEngine` — shared logic for binary extraction, subprocess invocation, and temp file management
+   - `XmlPowerToolsEngine(BaseEngine)` — sets constants for the Open-XML-PowerTools binary (`dist/`, `bin/`, `redlines`)
+   - `DocxodusEngine(BaseEngine)` — sets constants for the Docxodus binary (`dist_docxodus/`, `bin_docxodus/`, `redline`)
+
+   Both engines share the same CLI argument format: `<author_tag> <original.docx> <modified.docx> <output.docx>`
+
+2. **C# binaries**:
+   - `csproj/Program.cs` — Open-XML-PowerTools CLI tool
+   - `docxodus/tools/redline/Program.cs` — Docxodus CLI tool (git submodule)
+
+Pre-compiled binaries for 6 platform targets (linux/win/osx x x64/arm64) are stored as archives in `src/python_redlines/dist/` and `src/python_redlines/dist_docxodus/`, included in the wheel. The build script `build_differ.py` compiles both engines using `dotnet publish`.
+
+## Key Files
+
+- `src/python_redlines/engines.py` — BaseEngine, XmlPowerToolsEngine, and DocxodusEngine classes
+- `src/python_redlines/__init__.py` — Exports all engine classes
+- `src/python_redlines/__about__.py` — Single source of truth for package version
+- `csproj/Program.cs` — Open-XML-PowerTools C# comparison utility
+- `docxodus/` — Docxodus git submodule (tools/redline/ contains the CLI)
+- `build_differ.py` — Cross-platform C# build orchestration for both engines
+- `hatch_run_build_hook.py` — Hatch build hook that triggers C# compilation
+- `tests/fixtures/` — Test `.docx` files (original, modified, expected_redline)
+
+## Testing Notes
+
+Tests must be run from the project root (fixtures use relative paths like `tests/fixtures/original.docx`). The XmlPowerToolsEngine integration test validates that comparing the fixture documents produces exactly 9 revisions. Docxodus uses a different stdout format (`"revision(s) found"` vs `"Revisions found: 9"`).
+
+## Stdout Format Differences
+
+- **XmlPowerToolsEngine**: `"Revisions found: 9"`
+- **DocxodusEngine**: `"Redline complete: 9 revision(s) found"`
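
Because the two engines report their revision counts in different stdout formats, any caller that wants the number has to recognise both. The sketch below shows one way this could be done with regular expressions. The function name `parse_revision_count` and the pattern list are hypothetical and do not appear in this commit; the two sample strings are the ones quoted in CLAUDE.md.

```python
import re
from typing import Optional

# One pattern per engine, matching the stdout examples quoted in CLAUDE.md.
_REVISION_PATTERNS = (
    re.compile(r"Revisions found:\s*(\d+)"),                         # XmlPowerToolsEngine
    re.compile(r"Redline complete:\s*(\d+)\s+revision\(s\) found"),  # DocxodusEngine
)


def parse_revision_count(stdout: str) -> Optional[int]:
    """Return the revision count reported by either engine, or None if absent.

    Hypothetical helper: not part of the python_redlines package.
    """
    for pattern in _REVISION_PATTERNS:
        match = pattern.search(stdout)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    print(parse_revision_count("Revisions found: 9"))
    print(parse_revision_count("Redline complete: 9 revision(s) found"))
```

A parametrized test over both engines could then assert on the parsed integer instead of on the raw stdout string.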

build_differ.py

Lines changed: 45 additions & 45 deletions
@@ -4,6 +4,16 @@
 import zipfile
 
 
+RIDS = [
+    ("linux-x64", ".tar.gz"),
+    ("linux-arm64", ".tar.gz"),
+    ("win-x64", ".zip"),
+    ("win-arm64", ".zip"),
+    ("osx-x64", ".tar.gz"),
+    ("osx-arm64", ".tar.gz"),
+]
+
+
 def get_version():
     """
     Extracts the version from the specified __about__.py file.
@@ -50,57 +60,47 @@ def cleanup_old_builds(dist_dir, current_version):
             print(f"Deleted old build file: {file}")
 
 
-def main():
-    version = get_version()
-    print(f"Version: {version}")
-
-    dist_dir = "./src/python_redlines/dist/"
-
-    # Build for Linux x64
-    print("Building for Linux x64...")
-    run_command('dotnet publish ./csproj -c Release -r linux-x64 --self-contained')
-
-    # Build for Linux ARM64
-    print("Building for Linux ARM64...")
-    run_command('dotnet publish ./csproj -c Release -r linux-arm64 --self-contained')
-
-    # Build for Windows x64
-    print("Building for Windows x64...")
-    run_command('dotnet publish ./csproj -c Release -r win-x64 --self-contained')
-
-    # Build for Windows ARM64
-    print("Building for Windows ARM64...")
-    run_command('dotnet publish ./csproj -c Release -r win-arm64 --self-contained')
-
-    # Build for macOS x64
-    print("Building for macOS x64...")
-    run_command('dotnet publish ./csproj -c Release -r osx-x64 --self-contained')
-
-    # Build for macOS ARM64
-    print("Building for macOS ARM64...")
-    run_command('dotnet publish ./csproj -c Release -r osx-arm64 --self-contained')
+def build_engine(csproj_path, dist_dir, version):
+    """
+    Builds a C# engine for all platform targets, compresses the output, and cleans up old builds.
 
-    # Compress the Linux x64 build
-    linux_x64_build_dir = './csproj/bin/Release/net8.0/linux-x64'
-    compress_files(linux_x64_build_dir, f"{dist_dir}/linux-x64-{version}.tar.gz")
+    :param csproj_path: Path to the .csproj directory (e.g. './csproj' or './docxodus/tools/redline')
+    :param dist_dir: Path to the distribution directory for compressed binaries
+    :param version: Version string for archive naming
+    """
+    # Build for each RID
+    for rid, _ in RIDS:
+        print(f"Building {csproj_path} for {rid}...")
+        run_command(f'dotnet publish {csproj_path} -c Release -r {rid} --self-contained')
+
+    # Determine the build output base directory
+    # dotnet publish outputs to <csproj_path>/bin/Release/net8.0/<rid>
+    build_base = os.path.join(csproj_path, 'bin', 'Release', 'net8.0')
+
+    # Compress each build
+    for rid, ext in RIDS:
+        build_dir = os.path.join(build_base, rid)
+        archive_path = os.path.join(dist_dir, f"{rid}-{version}{ext}")
+        print(f"Compressing {rid} to {archive_path}...")
+        compress_files(build_dir, archive_path)
 
-    # Compress the Linux ARM64 build
-    linux_arm64_build_dir = './csproj/bin/Release/net8.0/linux-arm64'
-    compress_files(linux_arm64_build_dir, f"{dist_dir}/linux-arm64-{version}.tar.gz")
+    cleanup_old_builds(dist_dir, version)
 
-    # Compress the Windows x64 build
-    windows_build_dir = './csproj/bin/Release/net8.0/win-x64'
-    compress_files(windows_build_dir, f"{dist_dir}/win-x64-{version}.zip")
 
-    # Compress the macOS x64 build
-    macos_x64_build_dir = './csproj/bin/Release/net8.0/osx-x64'
-    compress_files(macos_x64_build_dir, f"{dist_dir}/osx-x64-{version}.tar.gz")
+def main():
+    version = get_version()
+    print(f"Version: {version}")
 
-    # Compress the macOS ARM64 build
-    macos_arm64_build_dir = './csproj/bin/Release/net8.0/osx-arm64'
-    compress_files(macos_arm64_build_dir, f"{dist_dir}/osx-arm64-{version}.tar.gz")
+    # Build the XmlPowerTools engine (original)
+    build_engine('./csproj', './src/python_redlines/dist/', version)
 
-    cleanup_old_builds(dist_dir, version)
+    # Build the Docxodus engine (if submodule is available)
+    docxodus_csproj = './docxodus/tools/redline'
+    if os.path.exists(os.path.join(docxodus_csproj, 'redline.csproj')):
+        build_engine(docxodus_csproj, './src/python_redlines/dist_docxodus/', version)
+    else:
+        print("WARNING: Docxodus submodule not found at docxodus/tools/redline/redline.csproj — skipping Docxodus build.")
+        print("Run 'git submodule update --init --recursive' to initialize the submodule.")
 
     print("Build and compression complete.")
 
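
The old `main()` built six targets but compressed only five of them by hand, which is how the win-arm64 archive went missing. Driving both loops from the single `RIDS` table removes that class of error. The sketch below is a dry-run illustration of the idea: `plan_build` is a hypothetical helper that is not in this commit and only computes the commands and paths that `build_engine()` would use, without invoking `dotnet`. It uses `posixpath` instead of `os.path` purely so the output is identical on every platform, and the version string `1.0.0` is a placeholder.

```python
import posixpath
from typing import List, Tuple

# Same table as in build_differ.py: (runtime identifier, archive extension).
RIDS = [
    ("linux-x64", ".tar.gz"),
    ("linux-arm64", ".tar.gz"),
    ("win-x64", ".zip"),
    ("win-arm64", ".zip"),
    ("osx-x64", ".tar.gz"),
    ("osx-arm64", ".tar.gz"),
]


def plan_build(csproj_path: str, dist_dir: str, version: str) -> List[Tuple[str, str, str]]:
    """Return (publish command, build directory, archive path) for every RID.

    Hypothetical dry-run helper; nothing is executed or written to disk.
    """
    build_base = posixpath.join(csproj_path, "bin", "Release", "net8.0")
    plan = []
    for rid, ext in RIDS:
        command = f"dotnet publish {csproj_path} -c Release -r {rid} --self-contained"
        build_dir = posixpath.join(build_base, rid)
        archive_path = posixpath.join(dist_dir, f"{rid}-{version}{ext}")
        plan.append((command, build_dir, archive_path))
    return plan


if __name__ == "__main__":
    for command, build_dir, archive_path in plan_build("./csproj", "./dist", "1.0.0"):
        print(f"{command}\n  {build_dir} -> {archive_path}")
```

Because every entry in `RIDS` yields both a publish command and an archive path, a target cannot be built without also being scheduled for compression.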

docxodus

Submodule docxodus added at 0a0b8c6

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -13,7 +13,9 @@ artifacts = [
 [tool.hatch.build.targets.sdist]
 include = [
   "python_redlines/dist",
+  "python_redlines/dist_docxodus",
   "python_redlines/bin",
+  "python_redlines/bin_docxodus",
 ]
 
 # Build hook to build the binaries for distribution...

src/python_redlines/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
 # SPDX-FileCopyrightText: 2024-present U.N. Owen <void@some.where>
 #
 # SPDX-License-Identifier: MIT
+
+from .engines import XmlPowerToolsEngine, DocxodusEngine, BaseEngine
+
+__all__ = ["XmlPowerToolsEngine", "DocxodusEngine", "BaseEngine"]

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+*
+!.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+*
+!.gitignore
