Add STDE deferred items: const-generic diagonal, dense STDE, GPU Taylor #27

Merged: gvonness-apolitical merged 1 commit into main from feature/stde-deferred-items on Feb 25, 2026

@gvonness-apolitical (Contributor)

Summary

  • Add diagonal_kth_order_const<ORDER> — stack-allocated const-generic variant of diagonal_kth_order for compile-time-known derivative order, avoiding TaylorDyn arena overhead
  • Add dense_stde_2nd — dense STDE for positive-definite 2nd-order operators using Cholesky-transformed Gaussian directions (tr(C · H) estimation)
  • Add GPU-accelerated 2nd-order Taylor forward propagation via new WGSL shader (taylor_forward_2nd.wgsl) and CUDA kernel (taylor_eval.cu), propagating (c0, c1, c2) triples through the tape
  • Integrate taylor_forward_2nd_batch into both WgpuContext and CudaContext
  • Add high-level GPU STDE wrappers: laplacian_gpu, hessian_diagonal_gpu, laplacian_with_control_gpu
  • Add comprehensive tests: const-generic diagonal, dense STDE, per-opcode GPU Taylor verification, end-to-end GPU STDE cross-validation against CPU
  • Add benchmarks for const-generic vs dynamic diagonal and GPU vs CPU STDE
  • Update documentation across algorithms.md, README.md, and module docs
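
To illustrate the stack-allocated idea behind diagonal_kth_order_const<ORDER>, here is a minimal sketch, not the crate's implementation. The names `Jet` and `diagonal_sketch` are hypothetical, and the const parameter `N` is assumed to be the number of Taylor coefficients (derivative order + 1, so N >= 1) rather than the order itself. Each jet is a fixed-size array, so no arena or heap storage is needed for the coefficients.

```rust
/// Truncated Taylor series c[0] + c[1] t + ... + c[N-1] t^(N-1), stored inline.
/// Hypothetical type for illustration; N = derivative order + 1.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Jet<const N: usize>(pub [f64; N]);

impl<const N: usize> Jet<N> {
    /// A constant input: only the zeroth coefficient is set.
    pub fn constant(value: f64) -> Self {
        let mut c = [0.0; N];
        c[0] = value;
        Jet(c)
    }

    /// The input being differentiated: value + 1 * t.
    pub fn variable(value: f64) -> Self {
        let mut c = [0.0; N];
        c[0] = value;
        if N > 1 {
            c[1] = 1.0;
        }
        Jet(c)
    }

    /// Truncated Cauchy product of two series.
    pub fn mul(self, rhs: Self) -> Self {
        let mut out = [0.0; N];
        for i in 0..N {
            for j in 0..(N - i) {
                out[i + j] += self.0[i] * rhs.0[j];
            }
        }
        Jet(out)
    }
}

/// For each coordinate i, returns d^k f / dx_i^k with k = N - 1,
/// recovered as k! times the k-th Taylor coefficient along e_i.
pub fn diagonal_sketch<const N: usize>(
    f: impl Fn(&[Jet<N>]) -> Jet<N>,
    x: &[f64],
) -> Vec<f64> {
    let k = N - 1;
    let k_factorial: f64 = (1..=k).map(|m| m as f64).product();
    (0..x.len())
        .map(|i| {
            let inputs: Vec<Jet<N>> = x
                .iter()
                .enumerate()
                .map(|(j, &xj)| if i == j { Jet::variable(xj) } else { Jet::constant(xj) })
                .collect();
            k_factorial * f(&inputs).0[k]
        })
        .collect()
}
```

For f(x0, x1) = x0^3 * x1 with N = 4, the sketch returns the third-order diagonal [6 * x1, 0].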

Test plan

  • cargo test --features "stde,diffop,gpu-wgpu" — 640 tests pass, 0 failures
  • cargo build --features "stde,diffop,gpu-wgpu" — clean build
  • Const-generic diagonal matches dynamic variant for k=2,3,4
  • Dense STDE with L=I matches Hutchinson Laplacian
  • GPU Taylor c0/c1/c2 match CPU taylor_jet_2nd for all opcode categories
  • GPU laplacian_gpu matches CPU laplacian (tolerance ~1e-4)
  • GPU hessian_diagonal_gpu matches CPU hessian_diagonal (tolerance ~1e-4)
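
The "L=I matches Hutchinson" check rests on a standard identity: if E[z zᵀ] = I and v = L z, then E[vᵀ H v] = tr(Lᵀ H L) = tr(C · H) with C = L Lᵀ. The sketch below assumes that reading of C and L (the PR text does not spell it out) and uses an explicit Hessian matrix instead of Taylor jets. All function names are hypothetical. To stay deterministic it replaces Gaussian samples with scaled basis directions √n · e_i, whose empirical second moment is exactly the identity, so the average reproduces the trace exactly rather than in expectation.

```rust
/// Multiply a lower-triangular matrix `l` (stored as full rows) by `z`.
fn apply_lower(l: &[Vec<f64>], z: &[f64]) -> Vec<f64> {
    l.iter()
        .enumerate()
        .map(|(i, row)| (0..=i).map(|j| row[j] * z[j]).sum::<f64>())
        .collect()
}

/// Quadratic form v^T H v.
fn quadratic_form(h: &[Vec<f64>], v: &[f64]) -> f64 {
    h.iter()
        .zip(v)
        .map(|(row, &vi)| vi * row.iter().zip(v).map(|(&hij, &vj)| hij * vj).sum::<f64>())
        .sum()
}

/// Average of (L z)^T H (L z) over the supplied directions.
/// Estimates tr(C H), C = L L^T, when the directions have identity second moment.
fn dense_trace_estimate(h: &[Vec<f64>], l: &[Vec<f64>], directions: &[Vec<f64>]) -> f64 {
    let total: f64 = directions
        .iter()
        .map(|z| quadratic_form(h, &apply_lower(l, z)))
        .sum();
    total / directions.len() as f64
}

/// Deterministic stand-in for Gaussian samples: sqrt(n) * e_i for i = 0..n.
fn scaled_basis(n: usize) -> Vec<Vec<f64>> {
    let scale = (n as f64).sqrt();
    (0..n)
        .map(|i| (0..n).map(|j| if i == j { scale } else { 0.0 }).collect())
        .collect()
}
```

With H = [[2, 1], [1, 3]] and L = [[1, 0], [2, 1]], C = L Lᵀ = [[1, 2], [2, 5]] and the estimate equals tr(C · H) = 21. With L = I it reduces to tr(H) = 5, the Laplacian.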

@gvonness-apolitical force-pushed the feature/stde-deferred-items branch from 300a3a9 to a758cd4 on February 25, 2026 23:31
Implements the deferred items from the STDE plan:

- Const-generic diagonal_kth_order_const<ORDER> for stack-allocated
  fast path when derivative order is known at compile time
- Dense STDE for positive-definite 2nd-order operators (dense_stde_2nd)
  using Cholesky-transformed Gaussian directions
- GPU-accelerated 2nd-order Taylor forward propagation via new WGSL
  shader and CUDA kernel, propagating (c0, c1, c2) triples through
  the tape for batched STDE evaluation
- High-level GPU STDE wrappers: laplacian_gpu, hessian_diagonal_gpu,
  laplacian_with_control_gpu
- Comprehensive tests (const-generic, dense STDE, per-opcode GPU,
  end-to-end GPU STDE) and benchmarks
- Documentation updates across algorithms.md, README, and module docs
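
As a CPU-side illustration of what propagating (c0, c1, c2) triples means, here is a sketch of the second-order Taylor rules for three opcodes. `Jet2`, `hessian_diagonal_sketch`, and `laplacian_sketch` are hypothetical names, and this is not the contents of the WGSL shader or CUDA kernel. The convention assumed is f(x + t v) = c0 + c1 t + c2 t² + ..., so the second directional derivative is 2 · c2.

```rust
/// Second-order Taylor triple: f(x + t v) = c0 + c1 t + c2 t^2 + O(t^3).
#[derive(Clone, Copy, Debug)]
pub struct Jet2 {
    pub c0: f64,
    pub c1: f64,
    pub c2: f64,
}

impl Jet2 {
    /// Seed an input with its value and its direction component.
    pub fn seed(value: f64, direction: f64) -> Self {
        Jet2 { c0: value, c1: direction, c2: 0.0 }
    }

    /// Product rule: truncated Cauchy product.
    pub fn mul(self, b: Jet2) -> Jet2 {
        Jet2 {
            c0: self.c0 * b.c0,
            c1: self.c0 * b.c1 + self.c1 * b.c0,
            c2: self.c0 * b.c2 + self.c1 * b.c1 + self.c2 * b.c0,
        }
    }

    /// exp rule: e^{a0} * (1, a1, a2 + a1^2 / 2).
    pub fn exp(self) -> Jet2 {
        let e = self.c0.exp();
        Jet2 { c0: e, c1: e * self.c1, c2: e * (self.c2 + 0.5 * self.c1 * self.c1) }
    }

    /// sin rule: (sin a0, cos a0 * a1, cos a0 * a2 - sin a0 * a1^2 / 2).
    pub fn sin(self) -> Jet2 {
        let (s, c) = self.c0.sin_cos();
        Jet2 { c0: s, c1: c * self.c1, c2: c * self.c2 - 0.5 * s * self.c1 * self.c1 }
    }
}

/// Hessian diagonal: one forward pass per basis direction e_i, reading 2 * c2.
pub fn hessian_diagonal_sketch(f: impl Fn(&[Jet2]) -> Jet2, x: &[f64]) -> Vec<f64> {
    (0..x.len())
        .map(|i| {
            let inputs: Vec<Jet2> = x
                .iter()
                .enumerate()
                .map(|(j, &xj)| Jet2::seed(xj, if i == j { 1.0 } else { 0.0 }))
                .collect();
            2.0 * f(&inputs).c2
        })
        .collect()
}

/// Exact Laplacian as the sum of the Hessian diagonal.
pub fn laplacian_sketch(f: impl Fn(&[Jet2]) -> Jet2, x: &[f64]) -> f64 {
    hessian_diagonal_sketch(f, x).iter().sum()
}
```

For f(x, y) = sin(x) · exp(y) the two diagonal entries are −sin(x)·exp(y) and +sin(x)·exp(y), so the Laplacian is zero. A batched GPU version would presumably run one such pass per direction in parallel.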
@gvonness-apolitical force-pushed the feature/stde-deferred-items branch from a758cd4 to 54f98d7 on February 25, 2026 23:33
@gvonness-apolitical merged commit cbfda05 into main on Feb 25, 2026
6 checks passed
@gvonness-apolitical deleted the feature/stde-deferred-items branch on February 25, 2026 23:34