
feat(calc): formula evaluation engine with recursive expression parser #1

Merged

wolfiesch merged 2 commits into main from feat/calc-engine on Feb 20, 2026
Conversation

@wolfiesch
Collaborator

Summary

  • New wolfxl.calc subpackage for evaluating Excel formulas in-memory, designed for LRBench-Agent perturbation scoring
  • Recursive descent expression parser replacing fragile regex dispatch - handles operator precedence, balanced parentheses, and arbitrarily nested formulas like =ROUND(SUM(A1:A5)*IF(B1>0,1.1,1.0),2)
  • 39-function whitelist with 15 builtin Python implementations (SUM, IF, ROUND, ABS, IFERROR, MIN/MAX, AVERAGE, AND/OR/NOT, COUNT/COUNTA)
  • DependencyGraph with Kahn's topological sort, cycle detection, and BFS affected-cells propagation
  • Workbook convenience methods (calculate(), recalculate()) with evaluator caching for efficient repeated perturbation rounds
  • Optional formulas library integration via wolfxl[calc] extra (EUPL-licensed, doesn't affect MIT license)

Architecture

wolfxl.calc (new - pure Python, no Rust changes)
├── _protocol.py      CalcEngine Protocol + CellDelta, RecalcResult
├── _parser.py        Regex ref extraction + range expansion
├── _graph.py         DependencyGraph (Kahn's topological sort)
├── _functions.py     39-function whitelist + 15 builtin implementations
├── _evaluator.py     WorkbookEvaluator (recursive descent parser)
└── __init__.py       Re-exports

Key design decisions

  • Recursive descent over regex: The expression parser uses paren-depth-aware operator splitting with three precedence passes (comparison < additive < multiplicative). Right-to-left scanning gives correct left-to-right associativity.
  • Balanced paren matching: _match_function_call() only matches when the close-paren is the last character, so SUM(A1:A5)*2 is correctly parsed as a binary expression, not a function call.
  • Evaluator caching: Workbook.calculate() caches the evaluator on the instance; recalculate() reuses it, avoiding O(scan+calculate) per perturbation round.
  • Optional dependency: formulas library is an optional extra. All 15 builtin functions work without it.
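The split-then-recurse scheme above can be illustrated with a minimal, self-contained sketch. All names here are illustrative, not the PR's actual code; it shows two of the three precedence passes and a simplified string-literal guard, and omits the comparison pass and the scientific-notation guard the real parser has:

```python
from __future__ import annotations


def split_top_level(expr: str, ops: tuple[str, ...]) -> tuple[str, str, str] | None:
    """Find the rightmost top-level operator in `ops`; return (left, op, right).

    Scanning right-to-left puts the leftmost operations deepest in the
    recursion, which yields left-to-right associativity for - and /.
    """
    depth = 0
    in_string = False
    # i > 0: a leading +/- is a sign, not a binary operator
    for i in range(len(expr) - 1, 0, -1):
        ch = expr[i]
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == ')':
                depth += 1
            elif ch == '(':
                depth -= 1
            elif depth == 0 and ch in ops:
                return expr[:i], ch, expr[i + 1:]
    return None


def evaluate(expr: str) -> float:
    """Precedence passes from loosest to tightest binding."""
    for ops in (("+", "-"), ("*", "/")):
        split = split_top_level(expr, ops)
        if split is not None:
            left, op, right = split
            l, r = evaluate(left), evaluate(right)
            return {"+": l + r, "-": l - r, "*": l * r, "/": l / r}[op]
    # No top-level operator: either a parenthesized group or a literal
    if expr.startswith("(") and expr.endswith(")"):
        return evaluate(expr[1:-1])
    return float(expr)
```

Because the additive pass runs before the multiplicative pass, a top-level "+" splits the expression before any "*" does, so multiplication ends up deeper in the recursion and binds tighter.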

Test plan

  • 142 new tests across 5 test files (functions, parser, graph, evaluator, integration)
  • All 219 tests pass (142 new + 77 existing)
  • Complex nested formulas: =ROUND(SUM*IF,2), =SUM()+SUM(), =(A+B)*C, =IF(SUM>N,...)*2
  • Operator precedence: =A1+A2*A3 correctly evaluates as A1+(A2*A3)
  • Perturbation discrimination: formulas propagate (ratio=1.0), hardcoded values don't (ratio=0.0)
  • 100-round bit-exact determinism
  • 500-formula workbook evaluates in <200ms
  • Evaluator caching: recalculate() reuses cached evaluator after calculate()
  • Golden .xlsx fixtures: sum_chain, cross_sheet, hardcoded, mixed
  • Zero regressions in existing 77 wolfxl tests

🤖 Generated with Claude Code

…arser

Build wolfxl.calc subpackage for evaluating Excel formulas in-memory:

- Protocol layer: CalcEngine Protocol, CellDelta/RecalcResult dataclasses
- Parser: regex-based reference extraction, range expansion, optional
  formulas-lib compilation
- Graph: DependencyGraph with Kahn's topological sort, cycle detection,
  affected-cells BFS
- Functions: 39-function whitelist with 15 builtin implementations
  (SUM, IF, ROUND, ABS, IFERROR, MIN/MAX, AVERAGE, etc.)
- Evaluator: recursive descent expression parser handling operator
  precedence, balanced parentheses, and arbitrarily nested formulas
  like =ROUND(SUM(A1:A5)*IF(B1>0,1.1,1.0),2)
- Workbook convenience methods: calculate() and recalculate() with
  evaluator caching for efficient repeated perturbation rounds
- Pyright configuration for maturin python-source layout

142 new tests across 5 test files + 4 golden .xlsx fixtures.
All 219 tests pass (142 new + 77 existing).

Designed for LRBench-Agent perturbation scoring: perturb input cells,
check if downstream formula cells propagate changes. Formulas score ~1.0,
hardcoded values score 0.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
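The scoring rule described above (perturb inputs, then measure how many downstream formula cells move) reduces to a simple ratio. A minimal sketch, independent of the wolfxl API; all names are illustrative:

```python
from __future__ import annotations


def propagation_ratio(baseline: dict[str, float],
                      recomputed: dict[str, float],
                      formula_cells: set[str],
                      tol: float = 1e-10) -> float:
    """Fraction of formula cells whose value changed after a perturbation.

    Live formulas downstream of a perturbed input score ~1.0; workbooks
    where those cells hold hardcoded values score 0.0.
    """
    if not formula_cells:
        return 0.0
    changed = sum(
        1 for cell in formula_cells
        if abs(recomputed[cell] - baseline[cell]) > tol
    )
    return changed / len(formula_cells)
```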
Copilot AI review requested due to automatic review settings February 20, 2026 01:06

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 89a2ccc1de


Comment on lines 182 to 186
try:
    lf = float(left) if not isinstance(left, (int, float)) else left
    rf = float(right) if not isinstance(right, (int, float)) else right
except (ValueError, TypeError):
    return False


P1: Handle text operands in comparison evaluation

The comparison helper currently coerces both sides to float and returns False on conversion errors, so any text comparison like A1="OK" or "x"="x" evaluates as false. This breaks common formulas such as IF(A1="OK",...) by always taking the false branch when string values are involved, producing incorrect workbook results even though string literals are otherwise parsed.


Collaborator Author

Fixed in 668fd6e. _compare() now handles text operands with case-insensitive string comparison (matching Excel behavior). Added 4 tests in TestTextComparison.
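The described behavior can be sketched as follows (illustrative only; the PR's actual `_compare()` may differ in detail, and ordering of mixed text/number operands is omitted):

```python
def compare(left: object, right: object, op: str) -> bool:
    """Excel-style comparison sketch: case-insensitive for text operands.

    Assumption: when either side is a string, both sides are lowered and
    compared for (in)equality; otherwise both are coerced to float.
    """
    if isinstance(left, str) or isinstance(right, str):
        l, r = str(left).lower(), str(right).lower()
        return {"=": l == r, "<>": l != r}[op]
    lf, rf = float(left), float(right)  # type: ignore[arg-type]
    return {
        "=": lf == rf, "<>": lf != rf,
        "<": lf < rf, ">": lf > rf,
        "<=": lf <= rf, ">=": lf >= rf,
    }[op]
```

With this shape, IF(A1="OK",...) takes the true branch when A1 holds "ok", "OK", or "Ok".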

Comment on lines 466 to 467
elif ch == ',' and depth == 0:
    args.append(self._resolve_arg(current.strip(), sheet))


P1: Respect quoted commas when splitting function args

Function arguments are split on every top-level comma without tracking whether the parser is inside a quoted string, so formulas like =IF(TRUE,"a,b","c") are tokenized into extra arguments. In this path IF fails arity validation and _eval_function returns None, causing valid formulas with comma-containing text literals to evaluate incorrectly.


Collaborator Author

Fixed in 668fd6e. _parse_function_args() now tracks in_string state and handles Excel escaped quotes. Commas inside string literals no longer split arguments. Added 2 tests in TestQuotedCommasInArgs.

Copilot AI left a comment

Pull request overview

Adds a new wolfxl.calc subpackage to evaluate Excel formulas in-memory (parser + dependency graph + evaluator) and integrates it with Workbook.calculate() / Workbook.recalculate() to support efficient perturbation-based rescoring workflows.

Changes:

  • Introduces wolfxl.calc with reference extraction, dependency graph (topo sort + affected-cells), function registry, and a recursive expression evaluator.
  • Adds Workbook.calculate() / Workbook.recalculate() convenience methods with evaluator caching.
  • Adds extensive unit + integration tests and commits golden .xlsx fixtures; adds wolfxl[calc] optional dependency.

Reviewed changes

Copilot reviewed 13 out of 17 changed files in this pull request and generated 8 comments.

File Description
tests/test_calc_parser.py Tests for reference/range/function extraction and range expansion.
tests/test_calc_integration.py End-to-end workbook tests + caching + determinism/perf assertions.
tests/test_calc_graph.py Tests for dependency graph ordering, cycles, affected-cells, max depth.
tests/test_calc_functions.py Tests for function whitelist/registry and builtin implementations.
tests/test_calc_evaluator.py Tests for evaluator behavior, precedence, nesting, edge cases.
tests/fixtures/calc/sum_chain.xlsx Golden workbook fixture used by integration tests.
tests/fixtures/calc/mixed.xlsx Golden workbook fixture used by integration tests.
tests/fixtures/calc/hardcoded.xlsx Golden workbook fixture used by integration tests.
tests/fixtures/calc/cross_sheet.xlsx Golden workbook fixture used by integration tests.
python/wolfxl/calc/_protocol.py Protocol + dataclasses for perturbation deltas/results.
python/wolfxl/calc/_parser.py Regex-based ref extraction + range expansion + optional formulas probing.
python/wolfxl/calc/_graph.py Dependency graph + topo sort + affected-cells + max-depth.
python/wolfxl/calc/_functions.py Function whitelist and builtin implementations + registry.
python/wolfxl/calc/_evaluator.py Recursive expression parser/evaluator + recalc delta computation.
python/wolfxl/calc/__init__.py Public re-exports for wolfxl.calc.
python/wolfxl/_workbook.py Adds calculate() / recalculate() methods with caching.
pyproject.toml Adds wolfxl[calc] extra and Pyright configuration.


Comment on lines 391 to 393
    return float(expr) if '.' in expr else int(expr)
except ValueError:
    pass
Copilot AI Feb 20, 2026

Numeric literal parsing only handles plain ints and decimals via int(expr) / float(expr) gated by '.' in expr. This fails for valid Excel numeric formats like scientific notation (1E3, -2.5e-4) and can cause such tokens to be treated as cell refs. Consider using a regex (or float() first) to support E/e notation and then decide int-vs-float if needed.

Suggested change
-    return float(expr) if '.' in expr else int(expr)
-except ValueError:
-    pass
+    # Use float() first to support integers, decimals, and scientific notation (e.g. 1E3, -2.5e-4).
+    num = float(expr)
+except ValueError:
+    pass
+else:
+    # Preserve existing behavior: return int for plain integer literals, float otherwise.
+    if re.fullmatch(r'[+-]?\d+', expr):
+        return int(expr)
+    return num

Collaborator Author

Fixed in 668fd6e. Numeric literal parsing now uses float() first to support scientific notation (1E3, -2.5e-4), then checks for plain int. Also added scientific notation guard in _find_top_level_split() so e.g. 2.5e-1 isn't split as subtraction. Added 2 tests.
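The operator-scanning guard mentioned here can be sketched like so (hypothetical helper; `is_exponent_sign` is not the PR's actual function name):

```python
import re

# A '+' or '-' immediately preceded by an 'e'/'E' that follows a digit is
# part of a scientific-notation literal (e.g. 2.5e-1) and must not be
# treated as a binary operator during top-level splitting.
_EXP_SIGN = re.compile(r'\d[eE][+-]$')


def is_exponent_sign(expr: str, i: int) -> bool:
    """True if expr[i] is the sign inside a scientific-notation literal."""
    return expr[i] in "+-" and bool(_EXP_SIGN.search(expr[: i + 1]))
```

A splitter that skips positions where this returns True leaves 2.5e-1 intact while still splitting A1-2 on the minus.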

Comment on lines +31 to +37
# Strings in formulas (to skip refs inside string literals)
_STRING_RE = re.compile(r'"[^"]*"')


def _strip_strings(formula: str) -> str:
    """Remove string literals so refs inside quotes aren't matched."""
    return _STRING_RE.sub("", formula)
Copilot AI Feb 20, 2026

_STRING_RE = re.compile(r'"[^"]*"') doesn't handle Excel-escaped quotes inside string literals (Excel uses doubled quotes ""), so reference/function extraction can become incorrect for formulas containing embedded quotes. Consider a pattern/state machine that understands "" escapes (and potentially unterminated strings) to make _strip_strings() safe.

Collaborator Author

Acknowledged. The current regex handles the common case. Excel doubled-quote escaping ("") is an edge case we haven't hit in practice. Filed as a known limitation - will address if real workbooks surface this.

Comment on lines 286 to 298
class TestPerformance:
    def test_500_formula_cells_under_200ms(self) -> None:
        """calculate() on a 500-formula workbook must complete in <200ms."""
        wb = _build_income_statement(num_rows=250)  # 250*2 + 3 = 503 formulas
        ev = WorkbookEvaluator()
        ev.load(wb)

        start = time.perf_counter()
        ev.calculate()
        elapsed = time.perf_counter() - start

        assert elapsed < 0.200, f"calculate() took {elapsed:.3f}s (>200ms)"

Copilot AI Feb 20, 2026

This test enforces a hard wall-clock threshold (<200ms) which is prone to flakiness across CI runners, CPU contention, and debug builds. Consider converting this to a benchmark-style test (optional/marked), loosening the threshold substantially, or asserting relative performance characteristics without absolute timing.

Collaborator Author

Fixed in 668fd6e. Threshold loosened from 200ms to 2s and test marked with @pytest.mark.slow. Local runs typically complete in <100ms but CI runners vary widely.

Comment on lines +113 to +136
# Fixture generation (saved to disk once)
# ---------------------------------------------------------------------------


@pytest.fixture(scope="session", autouse=True)
def golden_fixtures() -> None:
    """Generate golden .xlsx fixtures for other tests."""
    os.makedirs(FIXTURE_DIR, exist_ok=True)

    builders = {
        "sum_chain.xlsx": _build_sum_chain,
        "cross_sheet.xlsx": _build_cross_sheet,
        "hardcoded.xlsx": _build_hardcoded,
        "mixed.xlsx": _build_mixed,
    }

    for name, builder in builders.items():
        path = os.path.join(FIXTURE_DIR, name)
        if not os.path.exists(path):
            wb = builder()
            wb.save(path)


# ---------------------------------------------------------------------------
Copilot AI Feb 20, 2026

The autouse session fixture writes golden .xlsx files into tests/fixtures/calc when missing. Tests ideally shouldn't mutate the source tree; this can also fail in read-only checkouts. Since the fixtures are committed in this PR, consider removing runtime generation (or generate into a temp directory instead).


Collaborator Author

The fixtures are committed in the PR, and the autouse fixture only generates them if missing. This is a convenience for local dev - in CI the committed fixtures are used. Keeping as-is since it's harmless (no-op in read-only checkouts where fixtures already exist in the commit).

Comment on lines +134 to +140
from wolfxl.calc._evaluator import WorkbookEvaluator

ev = WorkbookEvaluator()
ev.load(self)
result = ev.calculate()
self._evaluator = ev # cache for recalculate()
return result
Copilot AI Feb 20, 2026

Workbook.calculate() assigns self._evaluator, but _evaluator is never declared/initialized on the class. With [tool.mypy] strict = true, this typically triggers an "attribute-defined" error, and it also means reader/patcher constructors won't have the attribute. Initialize self._evaluator (e.g., to None) in __init__ and also set it on instances created via _from_reader/_from_patcher.

Collaborator Author

Fixed in 668fd6e. _evaluator is now initialized to None in __init__, _from_reader, and _from_patcher. Strict mypy should be happy.

Comment on lines 142 to 147
def recalculate(
    self,
    perturbations: dict[str, float | int],
    tolerance: float = 1e-10,
) -> Any:
    """Perturb input cells and recompute affected formulas.
Copilot AI Feb 20, 2026

The return type of Workbook.recalculate() is annotated as Any, but the docstring says it returns RecalcResult, and callers/tests treat it as such. Please tighten the annotation to RecalcResult (using TYPE_CHECKING import to avoid hard runtime dependency if needed) so type checkers and IDEs get the correct contract.

Collaborator Author

Fixed in 668fd6e. Return type tightened to RecalcResult using TYPE_CHECKING import to avoid hard runtime dependency on wolfxl.calc.
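The pattern looks roughly like this (a sketch; the exact `wolfxl.calc` import path and the method body are assumptions, not the PR's code):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checkers: no runtime dependency on the
    # optional wolfxl[calc] extra. (Import path assumed for illustration.)
    from wolfxl.calc import RecalcResult


class Workbook:
    def recalculate(
        self,
        perturbations: dict[str, float | int],
        tolerance: float = 1e-10,
    ) -> RecalcResult:
        """Perturb input cells and recompute affected formulas."""
        ...
```

With `from __future__ import annotations`, the return annotation is never evaluated at runtime, so the conditional import is safe even when the extra isn't installed.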

Comment on lines 48 to 75
formula_cells = set(self.formulas.keys())
if not formula_cells:
    return []

# Compute in-degrees within formula cells only
in_degree: dict[str, int] = {}
for cell in formula_cells:
    deps = self.dependencies.get(cell, set())
    # Only count deps that are themselves formula cells
    in_degree[cell] = len(deps & formula_cells)

# Start with formula cells that have no formula-cell dependencies
queue: deque[str] = deque()
for cell in formula_cells:
    if in_degree[cell] == 0:
        queue.append(cell)

order: list[str] = []
while queue:
    cell = queue.popleft()
    order.append(cell)
    # Reduce in-degree for dependent formula cells
    for dep in self.dependents.get(cell, set()):
        if dep in formula_cells:
            in_degree[dep] -= 1
            if in_degree[dep] == 0:
                queue.append(dep)

Copilot AI Feb 20, 2026

topological_order() iterates over set(self.formulas.keys()) and set/dict/set iteration order can vary across processes (hash randomization), making evaluation order and any derived outputs (e.g., delta ordering) non-deterministic. If determinism is a goal, consider ordering formula_cells deterministically (e.g., sorted(...)) and iterating dependents in a stable order as well.

Collaborator Author

Fixed in 668fd6e. topological_order() now uses sorted() on initial queue and when adding newly-unblocked dependents. Evaluation order is deterministic regardless of Python hash randomization.
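The fix described can be sketched as a deterministic Kahn's sort (illustrative, not the PR's exact code; `deps` maps each node to its set of prerequisite nodes):

```python
from __future__ import annotations

from collections import deque


def topo_order(deps: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm with sorted() tie-breaking.

    Seeding the queue from sorted() and sorting newly-unblocked dependents
    makes the output independent of hash randomization.
    """
    nodes = sorted(deps)
    in_degree = {n: len(deps[n]) for n in nodes}
    dependents: dict[str, list[str]] = {n: [] for n in nodes}
    for n in nodes:
        for d in deps[n]:
            dependents[d].append(n)

    queue = deque(n for n in nodes if in_degree[n] == 0)
    order: list[str] = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in sorted(dependents[n]):
            in_degree[m] -= 1
            if in_degree[m] == 0:
                queue.append(m)
    return order
```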

Comment on lines 458 to 471

for ch in args_str:
    if ch == '(':
        depth += 1
        current += ch
    elif ch == ')':
        depth -= 1
        current += ch
    elif ch == ',' and depth == 0:
        args.append(self._resolve_arg(current.strip(), sheet))
        current = ""
    else:
        current += ch

Copilot AI Feb 20, 2026

_parse_function_args() splits on commas by tracking only parenthesis depth, but it doesn't track whether it's inside a string literal. This will incorrectly split arguments for formulas like CONCATENATE("a,b", "c") or IF(A1="x,y",...). Track in_string (and Excel escaped quotes "") so commas inside quoted strings don't split arguments.

Suggested change (replace the paren-depth loop quoted above with a string-aware scan):

in_string = False
i = 0
length = len(args_str)
while i < length:
    ch = args_str[i]
    if ch == '"':
        if in_string:
            # Inside a string: handle Excel escaped quote ("")
            if i + 1 < length and args_str[i + 1] == '"':
                current += '"'  # escaped quote, stays inside string
                i += 2
                continue
            in_string = False  # end of string
        else:
            in_string = True  # start of string
        current += ch
        i += 1
        continue
    if not in_string:
        if ch == '(':
            depth += 1
            current += ch
        elif ch == ')':
            depth -= 1
            current += ch
        elif ch == ',' and depth == 0:
            args.append(self._resolve_arg(current.strip(), sheet))
            current = ""
        else:
            current += ch
    else:
        # Inside a string, commas and parentheses are literal
        current += ch
    i += 1

Collaborator Author

Fixed in 668fd6e (same as the Codex comment). _parse_function_args() now tracks in_string state with Excel escaped quote handling.

CI fixes:
- Add `from __future__ import annotations` to __init__.py (Python 3.9 compat)
- Use `--no-index` in CI pip install to prevent PyPI fallback
- Fix import sorting in all 5 test files (ruff I001)
- Move ruff select to [tool.ruff.lint] section (deprecation warning)
- Register pytest `slow` mark in pyproject.toml

PR review fixes (Codex + Copilot):
- Handle text operands in comparison evaluation (case-insensitive)
- Respect quoted commas when splitting function args
- Support scientific notation in numeric literals (e.g. 2.5e-1)
- Skip +/- inside scientific notation during operator scanning
- Initialize _evaluator attribute on Workbook class + all constructors
- Tighten recalculate() return type from Any to RecalcResult
- Deterministic topological ordering via sorted() on formula cells
- Loosen perf test threshold (200ms -> 2s) to prevent CI flakiness

8 new tests: string comparison (4), quoted commas (2), scientific notation (2).
227 total tests pass (150 calc + 77 existing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wolfiesch wolfiesch merged commit d0f4806 into main Feb 20, 2026
7 checks passed
@wolfiesch wolfiesch deleted the feat/calc-engine branch February 20, 2026 03:16
@wolfiesch wolfiesch mentioned this pull request Feb 20, 2026
3 tasks
wolfiesch added a commit that referenced this pull request Feb 20, 2026
- POWER: return #NUM! for negative base with fractional exponent
  instead of leaking complex numbers (Copilot #2, Codex #11)
- LEFT: return #VALUE! for negative num_chars (Copilot #1)
- MID: return #VALUE! for start < 1 or num_chars < 0 (Copilot #8)
- range_shape: use min/max instead of abs() for consistency with
  expand_range (Copilot #3)
- Add @_requires_formulas skip marker to formulas-dependent test
  classes so CI passes without wolfxl[calc] extra (Codex #10)
- Add 4 edge case tests: POWER(-1,0.5), LEFT(-1), MID(0,..), MID(.,-1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>