[New Op] Linear Attention in Gated DeltaNet #173

@superAngGao

Description

Operator Description

The Gated DeltaNet (GDN)[1] is an emerging linear attention[2] operator that has been adopted in state-of-the-art large language models such as Qwen3-Next and Kimi Linear. By introducing a learnable gating mechanism into the original DeltaNet framework[3], GDN enables dynamic, input-dependent control over memory updates, effectively balancing retention and forgetting in long-context sequences. This enhancement preserves the core advantage of linear attention—computational and memory complexity that scales linearly with sequence length—while significantly improving numerical stability, retrieval accuracy, and adaptability to shifting contextual dependencies.
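To make the update rule concrete, here is a minimal NumPy sketch of the recurrent (non-chunked) gated delta rule described in [1]. The function name, argument shapes, and single-head layout are illustrative, not the actual kernel interface; `alpha` is the per-token decay gate and `beta` the per-token write strength.

```python
import numpy as np

def gated_delta_rule_ref(q, k, v, alpha, beta):
    """Naive O(T * d_k * d_v) recurrent reference (illustrative, single head).

    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) scalar gates in (0, 1).
    Recurrence:
        S_t = alpha_t * S_{t-1} @ (I - beta_t * k_t k_t^T) + beta_t * v_t k_t^T
        o_t = S_t @ q_t
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))      # fast-weight memory state
    out = np.empty((T, d_v))
    I = np.eye(d_k)
    for t in range(T):
        kt = k[t]
        # gated decay of old memory + delta-rule erase along k_t, then rank-1 write
        S = alpha[t] * S @ (I - beta[t] * np.outer(kt, kt)) \
            + beta[t] * np.outer(v[t], kt)
        out[t] = S @ q[t]
    return out
```

A practical kernel would use the chunked/parallel formulation rather than this token-by-token loop, but the loop is the ground truth the L1 kernel should match numerically.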

Implementation Plan

1. Kernel Implementation (L1)

  • Kernel: Implement TileLang kernel in top/kernels/LinearAttn/
  • Verification: Pass functional correctness tests

2. Op Definition (L2)

  • Interface: Define torch.ops interface in top/ops/gatedDeltaNet.py
  • Unit Tests: Implement tests/test_<op_name>.py (compare against a PyTorch reference)
    • FP16 (allclose tolerance: 1e-3)
    • BF16 (allclose tolerance: 1.6e-2)
  • Benchmarks: Implement benchmarks/benchmark_gatedDeltaNet.py
    • Latency
    • TFLOPS
    • DRAM Bandwidth
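For the benchmark metrics above, a small helper can convert a measured latency into TFLOPS and an effective DRAM bandwidth. The FLOP and byte counts below are rough analytical estimates for the recurrent form (function name and constants are assumptions, not the benchmark script's actual API):

```python
def gdn_benchmark_metrics(batch, seqlen, heads, d_k, d_v, latency_s, dtype_bytes=2):
    """Rough TFLOPS / GB/s estimates for one gated-delta-rule forward pass.

    FLOPs per token per head: ~4*d_k*d_v for the gated decay + rank-1 state
    update, plus ~2*d_k*d_v for the output matvec (constants are approximate).
    """
    flops = batch * seqlen * heads * 6 * d_k * d_v
    # minimum DRAM traffic: read q/k/v and per-token gates, write o
    elems = batch * seqlen * heads * (2 * d_k + 2 * d_v + 2)
    return {
        "tflops": flops / latency_s / 1e12,
        "gbps": elems * dtype_bytes / latency_s / 1e9,
    }
```

Comparing the achieved TFLOPS and GB/s against hardware peaks indicates whether the kernel is compute- or bandwidth-bound at a given sequence length.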

Reference

[1] S. Yang, J. Kautz, and A. Hatamizadeh, “Gated Delta Networks: Improving Mamba2 with Delta Rule,” arXiv preprint arXiv:2412.06464, 2024. DOI: 10.48550/arXiv.2412.06464.
[2] A. Katharopoulos, A. Vyas, N. Pappas, and F. Fleuret, “Transformers are RNNs: Fast autoregressive transformers with linear attention,” in Proc. 37th Int. Conf. Mach. Learn. (ICML), vol. 119, 2020, pp. 5156–5165.
[3] I. Schlag, K. Irie, and J. Schmidhuber, “Linear transformers are secretly fast weight programmers,” in Proc. 38th Int. Conf. Mach. Learn. (ICML), vol. 139, 2021, pp. 9355–9366.
