Commit 06d3ec6

Expose Docxodus comparison settings via Python kwargs

Thread WmlComparerSettings options from Python kwargs through CLI flags to the Docxodus C# binary. Supports detail_threshold, case_insensitive, detect_moves, simplify_move_markup, move_similarity_threshold, move_minimum_word_count, detect_format_changes, conflate_spaces, and date_time.

- Extract _build_command() in BaseEngine, override in DocxodusEngine
- Add input validation for thresholds and word count
- Update Docxodus CLI to parse --flags (backward compat with legacy format)
- Rebuild all platform binaries with new flag support
- Add 13 new tests (integration, validation, unit)
- Update README with Comparison Settings section

1 parent 0cbc2fa commit 06d3ec6

5 files changed, +268 -4 lines changed

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ The system uses a two-layer wrapper pattern with a shared base class:
 - `XmlPowerToolsEngine(BaseEngine)` — sets constants for the Open-XML-PowerTools binary (`dist/`, `bin/`, `redlines`)
 - `DocxodusEngine(BaseEngine)` — sets constants for the Docxodus binary (`dist_docxodus/`, `bin_docxodus/`, `redline`)

-Both engines share the same CLI argument format: `<author_tag> <original.docx> <modified.docx> <output.docx>`
+Both engines expose `run_redline(author_tag, original, modified, **kwargs)`. `DocxodusEngine` overrides `_build_command()` to translate kwargs (e.g. `detect_moves`, `detail_threshold`) into CLI flags for the Docxodus binary. `XmlPowerToolsEngine` uses the legacy 4-positional-arg format and ignores kwargs.

 2. **C# binaries**:
 - `csproj/Program.cs` — Open-XML-PowerTools CLI tool

README.md

Lines changed: 36 additions & 0 deletions
@@ -92,6 +92,42 @@ with open("redline.docx", "wb") as f:
 print(stdout) # e.g. "Redline complete: 9 revision(s) found"
 ```

+## Comparison Settings (DocxodusEngine only)
+
+`DocxodusEngine` supports fine-grained control over the comparison via keyword arguments to `run_redline()`:
+
+```python
+from python_redlines import DocxodusEngine
+
+engine = DocxodusEngine()
+redline_bytes, stdout, stderr = engine.run_redline(
+    "Reviewer", original, modified,
+    detect_moves=True,
+    simplify_move_markup=True,
+    detail_threshold=0.3,
+    case_insensitive=True,
+)
+```
+
+| Setting | Type | Default | Description |
+|---|---|---|---|
+| `detail_threshold` | float | 0.0 | Comparison granularity (0.0–1.0, lower = more detailed) |
+| `case_insensitive` | bool | False | Ignore case differences |
+| `detect_moves` | bool | False | Enable move detection |
+| `simplify_move_markup` | bool | False | Convert moves to del/ins for Word compatibility |
+| `move_similarity_threshold` | float | 0.8 | Jaccard threshold for move matching (0.0–1.0) |
+| `move_minimum_word_count` | int | 3 | Minimum words for move detection |
+| `detect_format_changes` | bool | True | Detect formatting-only changes |
+| `conflate_spaces` | bool | True | Treat breaking/non-breaking spaces the same |
+| `date_time` | str | now | Custom ISO 8601 timestamp for revisions |
+
+> **Warning:** Move detection can cause Word to display "unreadable content" warnings due to a known
+> ID collision bug. When using `detect_moves=True`, always set `simplify_move_markup=True` as well.
+> This converts move markup to regular del/ins (loses green move styling but ensures Word compatibility).
+
+> **Note:** These settings are only available on `DocxodusEngine`. `XmlPowerToolsEngine` ignores
+> extra keyword arguments.
+
 ## Architecture Overview

 Both engines follow the same pattern: a Python wrapper class invokes a self-contained C# binary via subprocess.
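
The warning in the README section added above pairs `detect_moves=True` with `simplify_move_markup=True`. A minimal sketch of a caller-side guard that enforces that pairing before the kwargs reach `run_redline()`. The helper name `with_safe_move_settings` is hypothetical and is not part of `python_redlines`; it also overrides an explicit `simplify_move_markup=False`, which is an assumption about what a caller would want.

```python
def with_safe_move_settings(**settings):
    """Return a copy of the settings with simplify_move_markup forced on
    whenever detect_moves is enabled (hypothetical helper, not library code)."""
    safe = dict(settings)
    if safe.get("detect_moves"):
        # Follows the README advice: never emit raw move markup.
        safe["simplify_move_markup"] = True
    return safe


kwargs = with_safe_move_settings(detect_moves=True, detail_threshold=0.3)
print(kwargs)
```

The resulting dict could then be unpacked into the call, e.g. `engine.run_redline("Reviewer", original, modified, **kwargs)`.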

docxodus

src/python_redlines/engines.py

Lines changed: 72 additions & 2 deletions
@@ -100,11 +100,23 @@ def _get_binaries_info(self):

         return binary_name, zip_name

-    def run_redline(self, author_tag: str, original: Union[bytes, Path], modified: Union[bytes, Path]) \
+    def _build_command(self, author_tag: str, original_path, modified_path, target_path, **kwargs):
+        """
+        Build the command list for subprocess execution.
+        Subclasses can override to customize argument format.
+        """
+        return [self.extracted_binaries_path, author_tag, original_path, modified_path, target_path]
+
+    def run_redline(self, author_tag: str, original: Union[bytes, Path], modified: Union[bytes, Path], **kwargs) \
             -> Tuple[bytes, Optional[str], Optional[str]]:
         """
         Runs the redline binary. The 'original' and 'modified' arguments can be either bytes or file paths.
         Returns the redline output as bytes.
+
+        Additional keyword arguments are passed to _build_command() for engine-specific options.
+        DocxodusEngine supports: detail_threshold, case_insensitive, detect_moves,
+        simplify_move_markup, move_similarity_threshold, move_minimum_word_count,
+        detect_format_changes, conflate_spaces, date_time.
         """
         temp_files = []
         try:
@@ -114,7 +126,7 @@ def run_redline(self, author_tag: str, original: Union[bytes, Path], modified: U
             modified_path = self._write_to_temp_file(modified) if isinstance(modified, bytes) else modified
             temp_files.extend([target_path, original_path, modified_path])

-            command = [self.extracted_binaries_path, author_tag, original_path, modified_path, target_path]
+            command = self._build_command(author_tag, original_path, modified_path, target_path, **kwargs)

             # Capture stdout and stderr
             result = subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
@@ -156,3 +168,61 @@ class DocxodusEngine(BaseEngine):
     DIST_DIR_NAME = 'dist_docxodus'
     BIN_DIR_NAME = 'bin_docxodus'
     BINARY_BASE_NAME = 'redline'
+
+    # Boolean flags (default False — presence enables)
+    _BOOL_FLAGS = [
+        ('case_insensitive', '--case-insensitive'),
+        ('detect_moves', '--detect-moves'),
+        ('simplify_move_markup', '--simplify-move-markup'),
+    ]
+
+    # Negatable flags (default True — --no- prefix disables)
+    _NEG_FLAGS = [
+        ('detect_format_changes', '--no-detect-format-changes'),
+        ('conflate_spaces', '--no-conflate-spaces'),
+    ]
+
+    # Value flags
+    _VALUE_FLAGS = [
+        ('detail_threshold', '--detail-threshold'),
+        ('move_similarity_threshold', '--move-similarity-threshold'),
+        ('move_minimum_word_count', '--move-minimum-word-count'),
+        ('date_time', '--date-time'),
+    ]
+
+    @staticmethod
+    def _validate_kwargs(kwargs):
+        if 'detail_threshold' in kwargs:
+            val = kwargs['detail_threshold']
+            if not isinstance(val, (int, float)) or val < 0.0 or val > 1.0:
+                raise ValueError(f"detail_threshold must be a float between 0.0 and 1.0, got {val!r}")
+
+        if 'move_similarity_threshold' in kwargs:
+            val = kwargs['move_similarity_threshold']
+            if not isinstance(val, (int, float)) or val < 0.0 or val > 1.0:
+                raise ValueError(f"move_similarity_threshold must be a float between 0.0 and 1.0, got {val!r}")
+
+        if 'move_minimum_word_count' in kwargs:
+            val = kwargs['move_minimum_word_count']
+            if not isinstance(val, int) or val < 1:
+                raise ValueError(f"move_minimum_word_count must be a positive integer, got {val!r}")
+
+    def _build_command(self, author_tag, original_path, modified_path, target_path, **kwargs):
+        self._validate_kwargs(kwargs)
+
+        cmd = [self.extracted_binaries_path, original_path, modified_path, target_path,
+               f'--author={author_tag}']
+
+        for kwarg, flag in self._BOOL_FLAGS:
+            if kwargs.get(kwarg):
+                cmd.append(flag)
+
+        for kwarg, neg_flag in self._NEG_FLAGS:
+            if kwarg in kwargs and not kwargs[kwarg]:
+                cmd.append(neg_flag)
+
+        for kwarg, flag in self._VALUE_FLAGS:
+            if kwarg in kwargs:
+                cmd.append(f'{flag}={kwargs[kwarg]}')
+
+        return cmd
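
The three flag tables in `DocxodusEngine` follow different rules: boolean flags are emitted only when truthy, negatable flags only when explicitly falsy, and value flags whenever the key is present. A standalone sketch that mirrors that logic outside the class, so the resulting argument list can be inspected without the binary. The function name `build_flags` is illustrative only; it reproduces just the flag portion of `_build_command()` and skips validation.

```python
# Tables copied from the diff above; everything else is an illustrative sketch.
BOOL_FLAGS = [
    ("case_insensitive", "--case-insensitive"),
    ("detect_moves", "--detect-moves"),
    ("simplify_move_markup", "--simplify-move-markup"),
]
NEG_FLAGS = [
    ("detect_format_changes", "--no-detect-format-changes"),
    ("conflate_spaces", "--no-conflate-spaces"),
]
VALUE_FLAGS = [
    ("detail_threshold", "--detail-threshold"),
    ("move_similarity_threshold", "--move-similarity-threshold"),
    ("move_minimum_word_count", "--move-minimum-word-count"),
    ("date_time", "--date-time"),
]


def build_flags(**kwargs):
    """Translate keyword settings into CLI flags (hypothetical helper)."""
    flags = []
    for name, flag in BOOL_FLAGS:
        if kwargs.get(name):  # emitted only when truthy
            flags.append(flag)
    for name, neg_flag in NEG_FLAGS:
        if name in kwargs and not kwargs[name]:  # emitted only when explicitly falsy
            flags.append(neg_flag)
    for name, flag in VALUE_FLAGS:
        if name in kwargs:  # emitted whenever the key is present
            flags.append(f"{flag}={kwargs[name]}")
    return flags


print(build_flags(detect_moves=True, conflate_spaces=False, detail_threshold=0.3))
# ['--detect-moves', '--no-conflate-spaces', '--detail-threshold=0.3']
```

One consequence of this table-driven design is that a kwarg not listed in any table, such as a misspelled `detect_move=True`, produces no flag and raises no error.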

tests/test_docxodus_engine.py

Lines changed: 158 additions & 0 deletions
@@ -30,3 +30,161 @@ def test_run_docxodus_with_real_files(original_docx, modified_docx):
     assert len(redline_output) > 0
     assert stderr is None
     assert "revision(s) found" in stdout
+
+
+# --- Integration tests for comparison settings ---
+
+def test_docxodus_with_detect_moves(original_docx, modified_docx):
+    engine = DocxodusEngine()
+    redline_output, stdout, stderr = engine.run_redline(
+        "TestAuthor", original_docx, modified_docx,
+        detect_moves=True, simplify_move_markup=True,
+    )
+    assert redline_output is not None
+    assert len(redline_output) > 0
+    assert stderr is None
+    assert "revision(s) found" in stdout
+
+
+def test_docxodus_with_detail_threshold(original_docx, modified_docx):
+    engine = DocxodusEngine()
+    redline_output, stdout, stderr = engine.run_redline(
+        "TestAuthor", original_docx, modified_docx,
+        detail_threshold=0.5,
+    )
+    assert redline_output is not None
+    assert len(redline_output) > 0
+    assert stderr is None
+    assert "revision(s) found" in stdout
+
+
+def test_docxodus_with_case_insensitive(original_docx, modified_docx):
+    engine = DocxodusEngine()
+    redline_output, stdout, stderr = engine.run_redline(
+        "TestAuthor", original_docx, modified_docx,
+        case_insensitive=True,
+    )
+    assert redline_output is not None
+    assert len(redline_output) > 0
+    assert stderr is None
+    assert "revision(s) found" in stdout
+
+
+def test_docxodus_with_no_format_changes(original_docx, modified_docx):
+    engine = DocxodusEngine()
+    redline_output, stdout, stderr = engine.run_redline(
+        "TestAuthor", original_docx, modified_docx,
+        detect_format_changes=False,
+    )
+    assert redline_output is not None
+    assert len(redline_output) > 0
+    assert stderr is None
+    assert "revision(s) found" in stdout
+
+
+def test_docxodus_with_all_options(original_docx, modified_docx):
+    engine = DocxodusEngine()
+    redline_output, stdout, stderr = engine.run_redline(
+        "TestAuthor", original_docx, modified_docx,
+        detail_threshold=0.3,
+        case_insensitive=True,
+        detect_moves=True,
+        simplify_move_markup=True,
+        move_similarity_threshold=0.7,
+        move_minimum_word_count=2,
+        detect_format_changes=False,
+        conflate_spaces=False,
+        date_time="2025-01-01T00:00:00Z",
+    )
+    assert redline_output is not None
+    assert len(redline_output) > 0
+    assert stderr is None
+    assert "revision(s) found" in stdout
+
+
+# --- Validation tests ---
+
+def test_docxodus_invalid_detail_threshold():
+    engine = DocxodusEngine()
+    with pytest.raises(ValueError, match="detail_threshold must be a float between 0.0 and 1.0"):
+        engine._build_command("Author", "orig", "mod", "out", detail_threshold=1.5)
+
+
+def test_docxodus_invalid_move_similarity_threshold():
+    engine = DocxodusEngine()
+    with pytest.raises(ValueError, match="move_similarity_threshold must be a float between 0.0 and 1.0"):
+        engine._build_command("Author", "orig", "mod", "out", move_similarity_threshold=-0.1)
+
+
+def test_docxodus_invalid_move_minimum_word_count():
+    engine = DocxodusEngine()
+    with pytest.raises(ValueError, match="move_minimum_word_count must be a positive integer"):
+        engine._build_command("Author", "orig", "mod", "out", move_minimum_word_count=0)
+
+
+def test_docxodus_invalid_move_minimum_word_count_type():
+    engine = DocxodusEngine()
+    with pytest.raises(ValueError, match="move_minimum_word_count must be a positive integer"):
+        engine._build_command("Author", "orig", "mod", "out", move_minimum_word_count=2.5)
+
+
+# --- Unit test for _build_command flag construction ---
+
+def test_build_command_default():
+    engine = DocxodusEngine()
+    cmd = engine._build_command("Author", "/tmp/orig.docx", "/tmp/mod.docx", "/tmp/out.docx")
+    assert cmd[1] == "/tmp/orig.docx"
+    assert cmd[2] == "/tmp/mod.docx"
+    assert cmd[3] == "/tmp/out.docx"
+    assert "--author=Author" in cmd
+    assert len(cmd) == 5 # binary + 3 positional + --author
+
+
+def test_build_command_with_all_flags():
+    engine = DocxodusEngine()
+    cmd = engine._build_command(
+        "Author", "/tmp/orig.docx", "/tmp/mod.docx", "/tmp/out.docx",
+        detail_threshold=0.5,
+        case_insensitive=True,
+        detect_moves=True,
+        simplify_move_markup=True,
+        move_similarity_threshold=0.7,
+        move_minimum_word_count=2,
+        detect_format_changes=False,
+        conflate_spaces=False,
+        date_time="2025-01-01T00:00:00Z",
+    )
+    assert "--author=Author" in cmd
+    assert "--case-insensitive" in cmd
+    assert "--detect-moves" in cmd
+    assert "--simplify-move-markup" in cmd
+    assert "--no-detect-format-changes" in cmd
+    assert "--no-conflate-spaces" in cmd
+    assert "--detail-threshold=0.5" in cmd
+    assert "--move-similarity-threshold=0.7" in cmd
+    assert "--move-minimum-word-count=2" in cmd
+    assert "--date-time=2025-01-01T00:00:00Z" in cmd
+
+
+def test_build_command_false_bools_not_added():
+    """Boolean flags that are False should not be added to the command."""
+    engine = DocxodusEngine()
+    cmd = engine._build_command(
+        "Author", "/tmp/orig.docx", "/tmp/mod.docx", "/tmp/out.docx",
+        detect_moves=False,
+        case_insensitive=False,
+    )
+    assert "--detect-moves" not in cmd
+    assert "--case-insensitive" not in cmd
+
+
+def test_build_command_negatable_true_not_added():
+    """Negatable flags that are True (default) should not add --no- flags."""
+    engine = DocxodusEngine()
+    cmd = engine._build_command(
+        "Author", "/tmp/orig.docx", "/tmp/mod.docx", "/tmp/out.docx",
+        detect_format_changes=True,
+        conflate_spaces=True,
+    )
+    assert "--no-detect-format-changes" not in cmd
+    assert "--no-conflate-spaces" not in cmd
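
One edge the validation tests above do not exercise: in Python, `bool` is a subclass of `int`, so `isinstance(True, int)` is true and a check of the form used in `_validate_kwargs` would accept `move_minimum_word_count=True`. A sketch of a stricter check, offered as a possible hardening rather than as part of this commit; `check_word_count` is a hypothetical name.

```python
def check_word_count(val):
    """Accept only genuine positive integers; reject bools explicitly.

    Hypothetical stricter variant of the move_minimum_word_count check.
    """
    if isinstance(val, bool) or not isinstance(val, int) or val < 1:
        raise ValueError(f"move_minimum_word_count must be a positive integer, got {val!r}")
    return val


# The plain isinstance check used in the commit lets True through:
print(isinstance(True, int) and not (True < 1))  # True
```

The same reasoning would apply to the two threshold checks, which use `isinstance(val, (int, float))`.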
