[New Op] Implement Tensor Core Accelerated Hadamard Transform #160

@Renkeyiiiii

Operator Description

State-of-the-art Fast Walsh-Hadamard Transform (FWHT) implementations use GPU Tensor Cores to achieve significant speedups, as described in this paper. This issue tracks implementing a Tensor Core-accelerated Hadamard transform in TileLang. The operator should mirror existing SOTA methods but be implemented within the TileLang framework and interface.

Implementation Plan

1. Kernel Implementation (L1)

  • Kernel: Implement the TileLang kernel for the FWHT operator in top/kernels/<Hadamard>/.
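
To make the kernel task concrete, here is a minimal pure-Python sketch of one way an FWHT can be phrased as small dense matrix products, which is the shape of work that Tensor Cores accelerate. It is not TileLang code and not necessarily the method of the referenced paper. The function names and the tile size of 16 are assumptions chosen for illustration. It relies on the identity H_{r·t} = H_r ⊗ H_t for Sylvester-ordered Hadamard matrices, so a length-n input viewed as an (n/t) × t matrix X is transformed as H_{n/t} · X · H_t.

```python
def _is_power_of_two(n):
    return n >= 1 and n & (n - 1) == 0


def hadamard_matrix(n):
    """Sylvester-ordered Hadamard matrix of size n (entries are +1/-1)."""
    if not _is_power_of_two(n):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_{2m} = [[H_m, H_m], [H_m, -H_m]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def fwht_butterfly(x):
    """Reference unnormalized FWHT using the classic O(n log n) butterflies."""
    y = list(x)
    n = len(y)
    if not _is_power_of_two(n):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = y[i], y[i + half]
                y[i], y[i + half] = a + b, a - b
        half *= 2
    return y


def fwht_matmul(x, tile=16):
    """Unnormalized FWHT built from dense products with small Hadamard tiles.

    `tile` is a hypothetical fragment size; 16 is an assumption, not a
    requirement stated in the issue.
    """
    n = len(x)
    if not _is_power_of_two(n):
        raise ValueError("length must be a power of two")
    if n <= tile:
        h = hadamard_matrix(n)
        return [sum(h[i][k] * x[k] for k in range(n)) for i in range(n)]
    rows = n // tile
    h_tile = hadamard_matrix(tile)
    # View x as a (rows x tile) row-major matrix and right-multiply by H_tile
    # (H_tile is symmetric, so no transpose is needed).
    mat = [x[r * tile:(r + 1) * tile] for r in range(rows)]
    mat = [[sum(row[k] * h_tile[k][j] for k in range(tile)) for j in range(tile)]
           for row in mat]
    # Apply H_rows down each column, recursing if rows is still larger than tile.
    cols = [fwht_matmul([mat[r][j] for r in range(rows)], tile)
            for j in range(tile)]
    return [cols[j][r] for r in range(rows) for j in range(tile)]
```

The dense products do more arithmetic than the butterfly version, which is the usual trade when targeting matrix-multiply hardware; a real kernel would also handle scaling, batching, and memory layout, none of which this sketch addresses.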

2. Op Definition (L2)

  • Interface: Define the torch.ops interface for the FWHT operator in top/ops/<Hadamard>.py.
    • Provide a clear and efficient API for users to call the operator within their TileLang-based code.
    • Support FP16 and BF16 precision as part of the interface for optimization on modern GPUs.
  • Unit Tests: Implement unit tests for correctness in tests/test_<Hadamard>.py.
    • FP16: Ensure the output matches the reference values within a tolerance of 1e-3.
    • BF16: Ensure the output matches the reference values within a tolerance of 1.6e-2.
    • Use a reference FWHT implementation written in PyTorch as the ground truth for verification.
  • Benchmarks: Implement benchmarking scripts for performance in benchmarks/benchmark_<op_name>.py.
    • Latency: Measure the time taken to compute the FWHT using the TileLang operator.
    • TFLOPS: Report throughput in tera-floating-point operations per second.
    • DRAM Bandwidth: Measure the achieved data transfer rate between GPU DRAM and the compute units to assess whether the kernel is memory-bound.
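
The per-dtype tolerances in the unit-test items above could be checked along the following lines. This is a dependency-free sketch, not the project's test harness. The issue does not say whether the margins are relative or absolute; the values coincide with the default relative tolerances that `torch.testing.assert_close` uses for FP16 and BF16, so this sketch treats them as relative tolerances and adds a small absolute tolerance of 1e-5. Both of those choices, and all names below, are assumptions.

```python
import struct

# (rtol, atol) per dtype. rtol comes from the issue text; atol is an assumption.
TOLERANCES = {
    "float16": (1e-3, 1e-5),
    "bfloat16": (1.6e-2, 1e-5),
}


def within_tolerance(actual, expected, dtype):
    """Return True if every element satisfies |a - e| <= atol + rtol * |e|."""
    rtol, atol = TOLERANCES[dtype]
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))


def round_to_fp16(value):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    expected = [0.5, -1.25, 3.1415926, 100.3]
    actual = [round_to_fp16(v) for v in expected]
    print(within_tolerance(actual, expected, "float16"))
```

In an actual test file the same comparison would more likely be a single call to `torch.testing.assert_close` on the operator output and the PyTorch reference output.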

3. Benchmark Results

  • Report the performance of the TileLang FWHT operator compared to existing SOTA implementations that use Tensor Cores.
  • Report the improvements in latency, throughput (TFLOPS), and memory bandwidth.
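
The reported metrics can be derived from a measured latency as sketched below. The issue does not define how FLOPs or bytes are counted, so the conventions here are assumptions: n·log2(n) additions/subtractions per transformed vector (the butterfly count, not the extra arithmetic a matmul-based kernel performs), and one read plus one write of every element for DRAM traffic. The function names are hypothetical, and the latency in the usage lines is an illustrative placeholder, not a measurement.

```python
def fwht_flops(batch, n):
    """Nominal operation count: n * log2(n) adds/subs per length-n vector."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    log2_n = n.bit_length() - 1
    return batch * n * log2_n


def achieved_tflops(batch, n, latency_s):
    """Throughput in TFLOPS for one call that took latency_s seconds."""
    return fwht_flops(batch, n) / latency_s / 1e12


def achieved_bandwidth_gbs(batch, n, bytes_per_element, latency_s):
    """Effective DRAM bandwidth in GB/s, assuming one read and one write."""
    bytes_moved = 2 * batch * n * bytes_per_element
    return bytes_moved / latency_s / 1e9


if __name__ == "__main__":
    # Placeholder shape and latency, for illustration only.
    batch, n, latency_s = 4096, 4096, 1e-4
    print(achieved_tflops(batch, n, latency_s))
    print(achieved_bandwidth_gbs(batch, n, 2, latency_s))  # 2 bytes: FP16/BF16
```

Because the transform does little arithmetic per byte moved, the bandwidth figure is usually the more informative of the two when comparing against other implementations.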
