feat(calc): formula evaluation engine with recursive expression parser#1
feat(calc): formula evaluation engine with recursive expression parser#1
Conversation
…arser Build wolfxl.calc subpackage for evaluating Excel formulas in-memory: - Protocol layer: CalcEngine Protocol, CellDelta/RecalcResult dataclasses - Parser: regex-based reference extraction, range expansion, optional formulas-lib compilation - Graph: DependencyGraph with Kahn's topological sort, cycle detection, affected-cells BFS - Functions: 39-function whitelist with 15 builtin implementations (SUM, IF, ROUND, ABS, IFERROR, MIN/MAX, AVERAGE, etc.) - Evaluator: recursive descent expression parser handling operator precedence, balanced parentheses, and arbitrarily nested formulas like =ROUND(SUM(A1:A5)*IF(B1>0,1.1,1.0),2) - Workbook convenience methods: calculate() and recalculate() with evaluator caching for efficient repeated perturbation rounds - Pyright configuration for maturin python-source layout 142 new tests across 5 test files + 4 golden .xlsx fixtures. All 219 tests pass (142 new + 77 existing). Designed for LRBench-Agent perturbation scoring: perturb input cells, check if downstream formula cells propagate changes. Formulas score ~1.0, hardcoded values score 0.0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 89a2ccc1de
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
python/wolfxl/calc/_evaluator.py
Outdated
| try: | ||
| lf = float(left) if not isinstance(left, (int, float)) else left | ||
| rf = float(right) if not isinstance(right, (int, float)) else right | ||
| except (ValueError, TypeError): | ||
| return False |
There was a problem hiding this comment.
Handle text operands in comparison evaluation
The comparison helper currently coerces both sides to float and returns False on conversion errors, so any text comparison like A1="OK" or "x"="x" evaluates as false. This breaks common formulas such as IF(A1="OK",...) by always taking the false branch when string values are involved, producing incorrect workbook results even though string literals are otherwise parsed.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 668fd6e. _compare() now handles text operands with case-insensitive string comparison (matching Excel behavior). Added 4 tests in TestTextComparison.
python/wolfxl/calc/_evaluator.py
Outdated
| elif ch == ',' and depth == 0: | ||
| args.append(self._resolve_arg(current.strip(), sheet)) |
There was a problem hiding this comment.
Respect quoted commas when splitting function args
Function arguments are split on every top-level comma without tracking whether the parser is inside a quoted string, so formulas like =IF(TRUE,"a,b","c") are tokenized into extra arguments. In this path IF fails arity validation and _eval_function returns None, causing valid formulas with comma-containing text literals to evaluate incorrectly.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 668fd6e. _parse_function_args() now tracks in_string state and handles Excel escaped quotes. Commas inside string literals no longer split arguments. Added 2 tests in TestQuotedCommasInArgs.
There was a problem hiding this comment.
Pull request overview
Adds a new wolfxl.calc subpackage to evaluate Excel formulas in-memory (parser + dependency graph + evaluator) and integrates it with Workbook.calculate() / Workbook.recalculate() to support efficient perturbation-based rescoring workflows.
Changes:
- Introduces
wolfxl.calcwith reference extraction, dependency graph (topo sort + affected-cells), function registry, and a recursive expression evaluator. - Adds
Workbook.calculate()/Workbook.recalculate()convenience methods with evaluator caching. - Adds extensive unit + integration tests and commits golden
.xlsxfixtures; addswolfxl[calc]optional dependency.
Reviewed changes
Copilot reviewed 13 out of 17 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_calc_parser.py | Tests for reference/range/function extraction and range expansion. |
| tests/test_calc_integration.py | End-to-end workbook tests + caching + determinism/perf assertions. |
| tests/test_calc_graph.py | Tests for dependency graph ordering, cycles, affected-cells, max depth. |
| tests/test_calc_functions.py | Tests for function whitelist/registry and builtin implementations. |
| tests/test_calc_evaluator.py | Tests for evaluator behavior, precedence, nesting, edge cases. |
| tests/fixtures/calc/sum_chain.xlsx | Golden workbook fixture used by integration tests. |
| tests/fixtures/calc/mixed.xlsx | Golden workbook fixture used by integration tests. |
| tests/fixtures/calc/hardcoded.xlsx | Golden workbook fixture used by integration tests. |
| tests/fixtures/calc/cross_sheet.xlsx | Golden workbook fixture used by integration tests. |
| python/wolfxl/calc/_protocol.py | Protocol + dataclasses for perturbation deltas/results. |
| python/wolfxl/calc/_parser.py | Regex-based ref extraction + range expansion + optional formulas probing. |
| python/wolfxl/calc/_graph.py | Dependency graph + topo sort + affected-cells + max-depth. |
| python/wolfxl/calc/_functions.py | Function whitelist and builtin implementations + registry. |
| python/wolfxl/calc/_evaluator.py | Recursive expression parser/evaluator + recalc delta computation. |
| python/wolfxl/calc/init.py | Public re-exports for wolfxl.calc. |
| python/wolfxl/_workbook.py | Adds calculate() / recalculate() methods with caching. |
| pyproject.toml | Adds wolfxl[calc] extra and Pyright configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
python/wolfxl/calc/_evaluator.py
Outdated
| return float(expr) if '.' in expr else int(expr) | ||
| except ValueError: | ||
| pass |
There was a problem hiding this comment.
Numeric literal parsing only handles plain ints and decimals via int(expr) / float(expr) gated by '.' in expr. This fails for valid Excel numeric formats like scientific notation (1E3, -2.5e-4) and can cause such tokens to be treated as cell refs. Consider using a regex (or float() first) to support E/e notation and then decide int-vs-float if needed.
| return float(expr) if '.' in expr else int(expr) | |
| except ValueError: | |
| pass | |
| # Use float() first to support integers, decimals, and scientific notation (e.g. 1E3, -2.5e-4). | |
| num = float(expr) | |
| except ValueError: | |
| pass | |
| else: | |
| # Preserve existing behavior: return int for plain integer literals, float otherwise. | |
| if re.fullmatch(r'[+-]?\d+', expr): | |
| return int(expr) | |
| return num |
There was a problem hiding this comment.
Fixed in 668fd6e. Numeric literal parsing now uses float() first to support scientific notation (1E3, -2.5e-4), then checks for plain int. Also added scientific notation guard in _find_top_level_split() so e.g. 2.5e-1 isn't split as subtraction. Added 2 tests.
| # Strings in formulas (to skip refs inside string literals) | ||
| _STRING_RE = re.compile(r'"[^"]*"') | ||
|
|
||
|
|
||
| def _strip_strings(formula: str) -> str: | ||
| """Remove string literals so refs inside quotes aren't matched.""" | ||
| return _STRING_RE.sub("", formula) |
There was a problem hiding this comment.
_STRING_RE = re.compile(r'"[^"]*"') doesn't handle Excel-escaped quotes inside string literals (Excel uses doubled quotes ""), so reference/function extraction can become incorrect for formulas containing embedded quotes. Consider a pattern/state machine that understands "" escapes (and potentially unterminated strings) to make _strip_strings() safe.
There was a problem hiding this comment.
Acknowledged. The current regex handles the common case. Excel doubled-quote escaping ("") is an edge case we haven't hit in practice. Filed as a known limitation - will address if real workbooks surface this.
| class TestPerformance: | ||
| def test_500_formula_cells_under_200ms(self) -> None: | ||
| """calculate() on a 500-formula workbook must complete in <200ms.""" | ||
| wb = _build_income_statement(num_rows=250) # 250*2 + 3 = 503 formulas | ||
| ev = WorkbookEvaluator() | ||
| ev.load(wb) | ||
|
|
||
| start = time.perf_counter() | ||
| ev.calculate() | ||
| elapsed = time.perf_counter() - start | ||
|
|
||
| assert elapsed < 0.200, f"calculate() took {elapsed:.3f}s (>200ms)" | ||
|
|
There was a problem hiding this comment.
This test enforces a hard wall-clock threshold (<200ms) which is prone to flakiness across CI runners, CPU contention, and debug builds. Consider converting this to a benchmark-style test (optional/marked), loosening the threshold substantially, or asserting relative performance characteristics without absolute timing.
There was a problem hiding this comment.
Fixed in 668fd6e. Threshold loosened from 200ms to 2s and test marked with @pytest.mark.slow. Local runs typically complete in <100ms but CI runners vary widely.
| # Fixture generation (saved to disk once) | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| @pytest.fixture(scope="session", autouse=True) | ||
| def golden_fixtures() -> None: | ||
| """Generate golden .xlsx fixtures for other tests.""" | ||
| os.makedirs(FIXTURE_DIR, exist_ok=True) | ||
|
|
||
| builders = { | ||
| "sum_chain.xlsx": _build_sum_chain, | ||
| "cross_sheet.xlsx": _build_cross_sheet, | ||
| "hardcoded.xlsx": _build_hardcoded, | ||
| "mixed.xlsx": _build_mixed, | ||
| } | ||
|
|
||
| for name, builder in builders.items(): | ||
| path = os.path.join(FIXTURE_DIR, name) | ||
| if not os.path.exists(path): | ||
| wb = builder() | ||
| wb.save(path) | ||
|
|
||
|
|
||
| # --------------------------------------------------------------------------- |
There was a problem hiding this comment.
The autouse session fixture writes golden .xlsx files into tests/fixtures/calc when missing. Tests ideally shouldn't mutate the source tree; this can also fail in read-only checkouts. Since the fixtures are committed in this PR, consider removing runtime generation (or generate into a temp directory instead).
| # Fixture generation (saved to disk once) | |
| # --------------------------------------------------------------------------- | |
| @pytest.fixture(scope="session", autouse=True) | |
| def golden_fixtures() -> None: | |
| """Generate golden .xlsx fixtures for other tests.""" | |
| os.makedirs(FIXTURE_DIR, exist_ok=True) | |
| builders = { | |
| "sum_chain.xlsx": _build_sum_chain, | |
| "cross_sheet.xlsx": _build_cross_sheet, | |
| "hardcoded.xlsx": _build_hardcoded, | |
| "mixed.xlsx": _build_mixed, | |
| } | |
| for name, builder in builders.items(): | |
| path = os.path.join(FIXTURE_DIR, name) | |
| if not os.path.exists(path): | |
| wb = builder() | |
| wb.save(path) | |
| # --------------------------------------------------------------------------- |
There was a problem hiding this comment.
The fixtures are committed in the PR, and the autouse fixture only generates them if missing. This is a convenience for local dev - in CI the committed fixtures are used. Keeping as-is since it's harmless (no-op in read-only checkouts where fixtures already exist in the commit).
| from wolfxl.calc._evaluator import WorkbookEvaluator | ||
|
|
||
| ev = WorkbookEvaluator() | ||
| ev.load(self) | ||
| result = ev.calculate() | ||
| self._evaluator = ev # cache for recalculate() | ||
| return result |
There was a problem hiding this comment.
Workbook.calculate() assigns self._evaluator, but _evaluator is never declared/initialized on the class. With [tool.mypy] strict = true, this typically triggers an "attribute-defined" error, and it also means reader/patcher constructors won't have the attribute. Initialize self._evaluator (e.g., to None) in __init__ and also set it on instances created via _from_reader/_from_patcher.
There was a problem hiding this comment.
Fixed in 668fd6e. _evaluator is now initialized to None in init, _from_reader, and _from_patcher. Strict mypy should be happy.
| def recalculate( | ||
| self, | ||
| perturbations: dict[str, float | int], | ||
| tolerance: float = 1e-10, | ||
| ) -> Any: | ||
| """Perturb input cells and recompute affected formulas. |
There was a problem hiding this comment.
The return type of Workbook.recalculate() is annotated as Any, but the docstring says it returns RecalcResult, and callers/tests treat it as such. Please tighten the annotation to RecalcResult (using TYPE_CHECKING import to avoid hard runtime dependency if needed) so type checkers and IDEs get the correct contract.
There was a problem hiding this comment.
Fixed in 668fd6e. Return type tightened to RecalcResult using TYPE_CHECKING import to avoid hard runtime dependency on wolfxl.calc.
| formula_cells = set(self.formulas.keys()) | ||
| if not formula_cells: | ||
| return [] | ||
|
|
||
| # Compute in-degrees within formula cells only | ||
| in_degree: dict[str, int] = {} | ||
| for cell in formula_cells: | ||
| deps = self.dependencies.get(cell, set()) | ||
| # Only count deps that are themselves formula cells | ||
| in_degree[cell] = len(deps & formula_cells) | ||
|
|
||
| # Start with formula cells that have no formula-cell dependencies | ||
| queue: deque[str] = deque() | ||
| for cell in formula_cells: | ||
| if in_degree[cell] == 0: | ||
| queue.append(cell) | ||
|
|
||
| order: list[str] = [] | ||
| while queue: | ||
| cell = queue.popleft() | ||
| order.append(cell) | ||
| # Reduce in-degree for dependent formula cells | ||
| for dep in self.dependents.get(cell, set()): | ||
| if dep in formula_cells: | ||
| in_degree[dep] -= 1 | ||
| if in_degree[dep] == 0: | ||
| queue.append(dep) | ||
|
|
There was a problem hiding this comment.
topological_order() iterates over set(self.formulas.keys()) and set/dict/set iteration order can vary across processes (hash randomization), making evaluation order and any derived outputs (e.g., delta ordering) non-deterministic. If determinism is a goal, consider ordering formula_cells deterministically (e.g., sorted(...)) and iterating dependents in a stable order as well.
There was a problem hiding this comment.
Fixed in 668fd6e. topological_order() now uses sorted() on initial queue and when adding newly-unblocked dependents. Evaluation order is deterministic regardless of Python hash randomization.
|
|
||
| for ch in args_str: | ||
| if ch == '(': | ||
| depth += 1 | ||
| current += ch | ||
| elif ch == ')': | ||
| depth -= 1 | ||
| current += ch | ||
| elif ch == ',' and depth == 0: | ||
| args.append(self._resolve_arg(current.strip(), sheet)) | ||
| current = "" | ||
| else: | ||
| current += ch | ||
|
|
There was a problem hiding this comment.
_parse_function_args() splits on commas by tracking only parenthesis depth, but it doesn't track whether it's inside a string literal. This will incorrectly split arguments for formulas like CONCATENATE("a,b", "c") or IF(A1="x,y",...). Track in_string (and Excel escaped quotes "") so commas inside quoted strings don't split arguments.
| for ch in args_str: | |
| if ch == '(': | |
| depth += 1 | |
| current += ch | |
| elif ch == ')': | |
| depth -= 1 | |
| current += ch | |
| elif ch == ',' and depth == 0: | |
| args.append(self._resolve_arg(current.strip(), sheet)) | |
| current = "" | |
| else: | |
| current += ch | |
| in_string = False | |
| i = 0 | |
| length = len(args_str) | |
| while i < length: | |
| ch = args_str[i] | |
| if ch == '"': | |
| if in_string: | |
| # Inside a string: handle Excel escaped quote ("") | |
| if i + 1 < length and args_str[i + 1] == '"': | |
| # Escaped quote, stays inside string | |
| current += '"' | |
| i += 2 | |
| continue | |
| else: | |
| # End of string | |
| in_string = False | |
| current += ch | |
| i += 1 | |
| continue | |
| else: | |
| # Start of string | |
| in_string = True | |
| current += ch | |
| i += 1 | |
| continue | |
| if not in_string: | |
| if ch == '(': | |
| depth += 1 | |
| current += ch | |
| elif ch == ')': | |
| depth -= 1 | |
| current += ch | |
| elif ch == ',' and depth == 0: | |
| args.append(self._resolve_arg(current.strip(), sheet)) | |
| current = "" | |
| else: | |
| current += ch | |
| else: | |
| # Inside a string, all characters (including commas/parentheses) are literal | |
| current += ch | |
| i += 1 |
There was a problem hiding this comment.
Fixed in 668fd6e (same as the Codex comment). _parse_function_args() now tracks in_string state with Excel escaped quote handling.
CI fixes: - Add `from __future__ import annotations` to __init__.py (Python 3.9 compat) - Use `--no-index` in CI pip install to prevent PyPI fallback - Fix import sorting in all 5 test files (ruff I001) - Move ruff select to [tool.ruff.lint] section (deprecation warning) - Register pytest `slow` mark in pyproject.toml PR review fixes (Codex + Copilot): - Handle text operands in comparison evaluation (case-insensitive) - Respect quoted commas when splitting function args - Support scientific notation in numeric literals (e.g. 2.5e-1) - Skip +/- inside scientific notation during operator scanning - Initialize _evaluator attribute on Workbook class + all constructors - Tighten recalculate() return type from Any to RecalcResult - Deterministic topological ordering via sorted() on formula cells - Loosen perf test threshold (200ms -> 2s) to prevent CI flakiness 8 new tests: string comparison (4), quoted commas (2), scientific notation (2). 227 total tests pass (150 calc + 77 existing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- POWER: return #NUM! for negative base with fractional exponent instead of leaking complex numbers (Copilot #2, Codex #11) - LEFT: return #VALUE! for negative num_chars (Copilot #1) - MID: return #VALUE! for start < 1 or num_chars < 0 (Copilot #8) - range_shape: use min/max instead of abs() for consistency with expand_range (Copilot #3) - Add @_requires_formulas skip marker to formulas-dependent test classes so CI passes without wolfxl[calc] extra (Codex #10) - Add 4 edge case tests: POWER(-1,0.5), LEFT(-1), MID(0,..), MID(.,-1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
wolfxl.calcsubpackage for evaluating Excel formulas in-memory, designed for LRBench-Agent perturbation scoring=ROUND(SUM(A1:A5)*IF(B1>0,1.1,1.0),2)calculate(),recalculate()) with evaluator caching for efficient repeated perturbation roundsformulaslibrary integration viawolfxl[calc]extra (EUPL-licensed, doesn't affect MIT license)Architecture
Key design decisions
_match_function_call()only matches when the close-paren is the last character, soSUM(A1:A5)*2is correctly parsed as a binary expression, not a function call.Workbook.calculate()caches the evaluator on the instance;recalculate()reuses it, avoiding O(scan+calculate) per perturbation round.formulaslibrary is an optional extra. All 15 builtin functions work without it.Test plan
=ROUND(SUM*IF,2),=SUM()+SUM(),=(A+B)*C,=IF(SUM>N,...)*2=A1+A2*A3correctly evaluates asA1+(A2*A3)recalculate()reuses cached evaluator aftercalculate()🤖 Generated with Claude Code