# design-v2: resolve architecture inconsistencies before implementation #19
## Summary
A cross-document review of `docs/design-v2/` found several design inconsistencies that should be resolved before implementation proceeds. The mathematical examples look internally consistent, but the architecture docs still disagree on crate boundaries, identity/caching rules, backend IR boundaries, and a few primitive-lowering details.
## Blocking inconsistencies
- `computegraph-rs` is described as AD-agnostic, but the core graph identity includes `OpMode::Primal` / `OpMode::Linear { active_mask }`. This makes AD-specific semantics part of the supposedly AD-agnostic layer.
  - Relevant docs: `docs/design-v2/README.md`, `docs/design-v2/computegraph-design.md`, `docs/design-v2/ad-architecture.md`
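One way to resolve this is to keep structural identity in the core layer mode-free and attach `OpMode` as metadata in the AD layer above. The following is a minimal sketch of that layering, not the project's actual types; `CoreNode`, `core_id`, and the hashing scheme are all illustrative assumptions:

```rust
// Illustrative sketch only: all names here are hypothetical stand-ins for
// whatever computegraph-rs actually defines.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// AD-agnostic core node: structural identity is (op, inputs) only.
#[derive(Hash, PartialEq, Eq)]
struct CoreNode {
    op: &'static str,
    inputs: Vec<u64>,
}

/// AD mode metadata lives in a layer above the core graph, so it never
/// feeds the structural hash.
enum OpMode {
    Primal,
    Linear { active_mask: u64 },
}

fn core_id(node: &CoreNode) -> u64 {
    let mut h = DefaultHasher::new();
    node.hash(&mut h);
    h.finish()
}

fn main() {
    let n = CoreNode { op: "add", inputs: vec![1, 2] };
    // Annotate the same core node with two different AD modes:
    let primal = (core_id(&n), OpMode::Primal);
    let linear = (core_id(&n), OpMode::Linear { active_mask: 0b01 });
    // Structural identity is unchanged by the AD mode:
    assert_eq!(primal.0, linear.0);
    if let OpMode::Linear { active_mask } = linear.1 {
        println!("linear mode, active_mask = {active_mask:#b}");
    }
    let _ = primal.1;
}
```

Under this layering, `OpMode` would move out of the graph identity entirely; whether that is acceptable depends on whether any core-level pass needs to distinguish primal from linear nodes.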
- The cache/identity story is not compatible with the current `InputKey` design.
  - `GlobalValKey::Input(InputKey)` participates in structural identity.
  - `differentiate` generates fresh tangent keys via unique `DiffPassId` values.
  - The API docs still claim "same graph structure -> cache hit", but without a stable input-key normalization rule this is not well-defined.
  - Relevant docs: `docs/design-v2/computegraph-design.md`, `docs/design-v2/chainrules-design.md`, `docs/design-v2/tidu-design.md`, `docs/design-v2/tensor-api-pseudocode.md`
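One candidate normalization rule, sketched under the assumption that input keys are opaque integers: rewrite each distinct key to a dense index in first-use order before hashing, so two graphs that differ only in the raw key values (e.g. tangent keys minted by different `DiffPassId`s) still produce the same cache key. `normalize` and `RawKey` are hypothetical names:

```rust
// Hypothetical input-key normalization for cache lookup; not the project's API.
use std::collections::HashMap;

/// Raw input keys as generated (e.g. fresh tangent keys per differentiate pass).
type RawKey = u64;

/// Map each distinct key to its first-use index, so structurally identical
/// graphs normalize to identical key sequences regardless of raw key values.
fn normalize(inputs_in_use_order: &[RawKey]) -> Vec<u32> {
    let mut seen: HashMap<RawKey, u32> = HashMap::new();
    inputs_in_use_order
        .iter()
        .map(|k| {
            let next = seen.len() as u32;
            *seen.entry(*k).or_insert(next)
        })
        .collect()
}

fn main() {
    // Two differentiate passes mint different raw tangent keys...
    let pass_a = [10, 77, 10]; // primal x, tangent from one DiffPassId, x again
    let pass_b = [10, 91, 10]; // same structure, different DiffPassId
    // ...but normalize to the same canonical key sequence:
    assert_eq!(normalize(&pass_a), normalize(&pass_b));
    println!("{:?}", normalize(&pass_a)); // [0, 1, 0]
}
```

Whatever rule is chosen, the point is that the docs must state it explicitly, otherwise "same graph structure -> cache hit" has no defined meaning.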
- The public AD API is underspecified relative to the lower-level transform contract.
  - `differentiate` creates tangent `InputKey`s inside the returned fragment.
  - The user-facing API shows `y.jvp(&x, &t_x)` but does not explain how `t_x` is bound to those generated tangent keys.
  - `grad()` / VJP seed semantics for non-scalar outputs are also left implicit.
  - Relevant docs: `docs/design-v2/ad-architecture.md`, `docs/design-v2/tidu-design.md`, `docs/design-v2/tensor-api-pseudocode.md`
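One possible contract that would close this gap: `differentiate` returns, alongside the fragment, a map from each primal input key to the tangent key it minted, and `jvp` resolves the binding through that map. The sketch below is an assumption about shape, not the actual API; `Fragment`, `bind_tangent`, and the key-minting scheme are all invented for illustration:

```rust
// Hypothetical binding contract; every name here is illustrative.
use std::collections::HashMap;

type InputKey = u64;

struct Fragment {
    /// Map from primal input key -> the fresh tangent key minted for it.
    /// (The fragment's graph body is elided.)
    tangent_keys: HashMap<InputKey, InputKey>,
}

fn differentiate(primal_inputs: &[InputKey], pass_id: u64) -> Fragment {
    let tangent_keys = primal_inputs
        .iter()
        .map(|&k| (k, (pass_id << 32) | k)) // illustrative key-minting scheme
        .collect();
    Fragment { tangent_keys }
}

/// What y.jvp(&x, &t_x) would consult: which generated key does t_x feed?
fn bind_tangent(frag: &Fragment, x: InputKey) -> Option<InputKey> {
    frag.tangent_keys.get(&x).copied()
}

fn main() {
    let x: InputKey = 10;
    let frag = differentiate(&[x], 7);
    let key_for_t_x = bind_tangent(&frag, x).expect("x is a primal input");
    println!("t_x binds to generated key {key_for_t_x}");
}
```

A returned mapping like this would also give `grad()` / VJP a natural place to document seed semantics for non-scalar outputs, since the seed is just another bound input.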
- Backend IR boundaries are inconsistent.
  - The overview says all three standard backends accept StableHLO.
  - Later sections say faer/custom GPU interpret `CompiledProgram` directly.
  - Another section describes faer as a StableHLO interpreter.
  - This changes crate boundaries, lowering responsibilities, and cache layering.
  - Relevant doc: `docs/design-v2/backend-architecture.md`
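One way to force the docs (and later the code) to tell a single story is to make the backend input a closed type, so each backend's consumed IR is written down exactly once. This is a sketch of the idea only; `BackendInput`, `input_kind`, and the `CompiledProgram` stand-in are hypothetical:

```rust
// Hypothetical sketch: a closed type for "what does this backend consume?".
struct CompiledProgram; // stand-in for the real type

enum BackendInput {
    /// Backend consumes a serialized StableHLO module (e.g. XLA).
    StableHlo(String),
    /// Backend interprets CompiledProgram directly (one reading of faer/GPU).
    Direct(CompiledProgram),
}

fn input_kind(i: &BackendInput) -> &'static str {
    match i {
        BackendInput::StableHlo(_) => "StableHLO",
        BackendInput::Direct(_) => "CompiledProgram",
    }
}

fn main() {
    let xla = BackendInput::StableHlo("module { ... }".to_string());
    let faer = BackendInput::Direct(CompiledProgram); // or StableHlo: pick ONE
    println!("xla consumes {}", input_kind(&xla));
    println!("faer consumes {}", input_kind(&faer));
}
```

The enum itself is not the point; the point is that whichever pipeline is chosen per backend determines where lowering lives and at which IR level the compilation cache keys.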
- `Dup` lowering is inconsistent with the stated 1:1 StableHLO lowering rule.
  - `Dup` is defined as a multi-output primitive.
  - The backend doc maps it to `stablehlo.broadcast_in_dim`, which is not a multi-output duplication op.
  - Relevant docs: `docs/design-v2/primitive-catalog.md`, `docs/design-v2/backend-architecture.md`
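If `Dup` turns out to be transform-time only, one resolution is to erase it before lowering by rewiring consumers straight to the duplicated value, so no multi-output StableHLO op is ever needed. A toy sketch of that rewrite, with an invented `Op` IR that is not the project's real representation:

```rust
// Toy IR to illustrate erasing a transform-time Dup; names are invented.
#[derive(Clone, Debug, PartialEq)]
enum Op {
    Input(&'static str),
    Dup { src: usize, n_outputs: usize },
    Use { src: usize },
}

/// Rewire every consumer of a Dup output to the Dup's source, leaving the
/// Dup node dead (removable by ordinary DCE before lowering).
fn rewire_past_dups(ops: &[Op]) -> Vec<Op> {
    ops.iter()
        .map(|op| match op {
            Op::Use { src } => match &ops[*src] {
                Op::Dup { src: orig, .. } => Op::Use { src: *orig },
                _ => op.clone(),
            },
            _ => op.clone(),
        })
        .collect()
}

fn main() {
    let ops = vec![
        Op::Input("x"),                   // 0
        Op::Dup { src: 0, n_outputs: 2 }, // 1: fan x out to two consumers
        Op::Use { src: 1 },               // 2
        Op::Use { src: 1 },               // 3
    ];
    let rewired = rewire_past_dups(&ops);
    // Both consumers now read x directly; nothing reaches the lowering
    // stage that would need a multi-output duplication op.
    assert_eq!(rewired[2], Op::Use { src: 0 });
    assert_eq!(rewired[3], Op::Use { src: 0 });
    println!("{rewired:?}");
}
```

Note that `stablehlo.broadcast_in_dim` replicates data across dimensions of a single result; it does not model a one-input, n-output fan-out, which is why the current mapping cannot be right under the 1:1 rule.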
## Important but secondary inconsistencies
- The roadmap phases do not match the stated einsum decomposition requirements.
  - `einsum` decomposition depends on `Reshape`, `Transpose`, and `BroadcastInDim`.
  - The backend roadmap still places several of those in Phase 2 while claiming Phase 1 einsum support.
  - Relevant docs: `docs/design-v2/tensor-design.md`, `docs/design-v2/backend-architecture.md`
- The linalg-to-StableHLO boundary is not fixed consistently.
  - `backend-architecture.md` treats linalg ops broadly as `custom_call`.
  - `stablehlo-primitives.md` lists a direct `cholesky` op.
  - `jax-stablehlo-primitives-needed-for-tenferro.md` uses a different decomposition story again.
  - Relevant docs: `docs/design-v2/backend-architecture.md`, `docs/design-v2/stablehlo-primitives.md`, `docs/design-v2/jax-stablehlo-primitives-needed-for-tenferro.md`
## Minor doc issues
- `PrimitiveOp`'s `InputKey: ADKey` bound is documented inconsistently.
- `Tensor.strides` uses both `Vec<isize>` and `Vec<usize>` across docs.
- The SVD example in `tensor-api-pseudocode.md` uses `diag(&s)` even though `tensor-design.md` explicitly argues for hyper-edge reconstruction using `s` directly.
- `computegraph-design.md` has a stray code fence in the `GraphOp` section.
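On the strides type: if the design ever wants views with reversed axes, `Vec<isize>` is the forcing argument, since a reversed view needs a negative stride that `usize` cannot express. A minimal illustration (the array contents are arbitrary):

```rust
// Why strides may need to be signed: a reversed 1-D view over `data`
// starts at the last element and steps by -1.
fn main() {
    let data = [1.0f64, 2.0, 3.0, 4.0];
    let (start, stride): (isize, isize) = (3, -1);
    let reversed: Vec<f64> = (0..4isize)
        .map(|i| data[(start + i * stride) as usize])
        .collect();
    assert_eq!(reversed, vec![4.0, 3.0, 2.0, 1.0]);
    println!("{reversed:?}");
}
```

If reversed views are out of scope, `Vec<usize>` is simpler; either way the docs should commit to one type and state the reason.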
## Suggested resolution
Before implementation, pick and document one coherent answer for each of the following:
- Is `OpMode` part of `computegraph-rs`, or does AD-specific mode metadata live above `computegraph-rs`?
- What is the canonical cache key for compiled programs?
- How are user-facing `TracedTensor` inputs mapped onto stable `InputKey`s?
- What is the exact IR pipeline for faer, custom GPU, and XLA?
- Is `Dup` a real persistent primitive, or only a transform-time/internal construct?
- Which linalg ops lower directly to StableHLO ops, and which always lower to `custom_call`?
## Acceptance criteria
- Update the architecture docs so they tell one consistent story about graph identity, AD layering, caching, and backend lowering.
- Make the public API examples consistent with the lower-level transform contracts.
- Reconcile the primitive catalog, backend architecture, and StableHLO planning docs.