
design-v2: resolve architecture inconsistencies before implementation #19

@shinaoka

Description


Summary

A cross-document review of docs/design-v2/ found several design inconsistencies that should be resolved before implementation proceeds.

The mathematical examples look internally consistent, but the architecture docs still disagree on crate boundaries, identity/caching rules, backend IR boundaries, and a few primitive-lowering details.

Blocking inconsistencies

  1. computegraph-rs is described as AD-agnostic, but the core graph identity includes OpMode::Primal / OpMode::Linear { active_mask }.

    • This makes AD-specific semantics part of the supposedly AD-agnostic layer.
    • Relevant docs:
      • docs/design-v2/README.md
      • docs/design-v2/computegraph-design.md
      • docs/design-v2/ad-architecture.md
  2. The cache/identity story is not compatible with the current InputKey design.

    • GlobalValKey::Input(InputKey) participates in structural identity.
    • differentiate generates fresh tangent keys via unique DiffPassId values.
    • The API docs still claim “same graph structure -> cache hit”, but without a stable input-key normalization rule this is not well-defined.
    • Relevant docs:
      • docs/design-v2/computegraph-design.md
      • docs/design-v2/chainrules-design.md
      • docs/design-v2/tidu-design.md
      • docs/design-v2/tensor-api-pseudocode.md
  3. The public AD API is underspecified relative to the lower-level transform contract.

    • differentiate creates tangent InputKeys inside the returned fragment.
    • The user-facing API shows y.jvp(&x, &t_x) but does not explain how t_x is bound to those generated tangent keys.
    • grad() / VJP seed semantics for non-scalar outputs are also left implicit.
    • Relevant docs:
      • docs/design-v2/ad-architecture.md
      • docs/design-v2/tidu-design.md
      • docs/design-v2/tensor-api-pseudocode.md
  4. Backend IR boundaries are inconsistent.

    • The overview says all three standard backends accept StableHLO.
    • Later sections say faer/custom GPU interpret CompiledProgram directly.
    • Another section describes faer as a StableHLO interpreter.
    • This changes crate boundaries, lowering responsibilities, and cache layering.
    • Relevant doc:
      • docs/design-v2/backend-architecture.md
  5. Dup lowering is inconsistent with the stated 1:1 StableHLO lowering rule.

    • Dup is defined as a multi-output primitive.
    • The backend doc maps it to stablehlo.broadcast_in_dim, which is not a multi-output duplication op.
    • Relevant docs:
      • docs/design-v2/primitive-catalog.md
      • docs/design-v2/backend-architecture.md

Important but secondary inconsistencies

  1. The roadmap phases do not match the stated einsum decomposition requirements.

    • einsum decomposition depends on Reshape, Transpose, and BroadcastInDim.
    • The backend roadmap still places several of those in Phase 2 while claiming Phase 1 einsum support.
    • Relevant docs:
      • docs/design-v2/tensor-design.md
      • docs/design-v2/backend-architecture.md
  2. The linalg-to-StableHLO boundary is not fixed consistently.

    • backend-architecture.md treats linalg ops broadly as custom_call.
    • stablehlo-primitives.md lists a direct cholesky op.
    • jax-stablehlo-primitives-needed-for-tenferro.md uses a different decomposition story again.
    • Relevant docs:
      • docs/design-v2/backend-architecture.md
      • docs/design-v2/stablehlo-primitives.md
      • docs/design-v2/jax-stablehlo-primitives-needed-for-tenferro.md

Minor doc issues

  • PrimitiveOp's InputKey: ADKey bound is documented inconsistently.
  • Tensor.strides uses both Vec<isize> and Vec<usize> across docs.
  • The SVD example in tensor-api-pseudocode.md uses diag(&s) even though tensor-design.md explicitly argues for hyper-edge reconstruction using s directly.
  • computegraph-design.md has a stray code fence in the GraphOp section.

Suggested resolution

Before implementation, pick and document one coherent answer for each of the following:

  1. Is OpMode part of computegraph-rs, or does AD-specific mode metadata live above computegraph-rs?
  2. What is the canonical cache key for compiled programs?
  3. How are user-facing TracedTensor inputs mapped onto stable InputKeys?
  4. What is the exact IR pipeline for faer, custom GPU, and XLA?
  5. Is Dup a real persistent primitive, or only a transform-time/internal construct?
  6. Which linalg ops lower directly to StableHLO ops, and which always lower to custom_call?

Acceptance criteria

  • Update the architecture docs so they tell one consistent story about graph identity, AD layering, caching, and backend lowering.
  • Make the public API examples consistent with the lower-level transform contracts.
  • Reconcile the primitive catalog, backend architecture, and StableHLO planning docs.
