Merge pull request #47 from flash-algo/add-dot-pytorch-kernel

LoserCheems · web-flow · commit 3ac79a2735e6 · 2025-12-01T11:31:44.000+08:00
[PERFORMANCE OPTIMIZATION] add dot pytorch kernel
diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@ The following common BLAS kernels have been implemented in multiple frameworks.
 | [swap](./docs/swap.md) | swap vectors | $x \leftrightarrow y$ | $0$ | $4n$ | [✅](./kernel_course/python_ops/swap.py) | [✅](./kernel_course/pytorch_ops/swap.py) | [✅](./kernel_course/triton_ops/swap.py) | ❌ | [✅](./tests/test_swap.py) |
 | [scal](./docs/scal.md) | scale vector | $y = \alpha y$ | $n$ | $2n$ | [✅](./kernel_course/python_ops/scal.py) | [✅](./kernel_course/pytorch_ops/scal.py) | [✅](./kernel_course/triton_ops/scal.py) | ❌ | [✅](./tests/test_scal.py) |
 | [axpby](./docs/axpby.md) | update vector| $y = \alpha x + \beta y$ | $3n$ | $3n$ | [✅](./kernel_course/python_ops/axpby.py) | [✅](./kernel_course/pytorch_ops/axpby.py) | [✅](./kernel_course/triton_ops/axpby.py) | ❌ | [✅](./tests/test_axpby.py) |
-| [dot](./docs/dot.md) | dot product | $z = x^\top y$ | $2n$ | $2n$ | [✅](./kernel_course/python_ops/dot.py) | ❌ | ❌ | ❌ | ❌ |
+| [dot](./docs/dot.md) | dot product | $z = x^\top y$ | $2n$ | $2n$ | [✅](./kernel_course/python_ops/dot.py) | [✅](./kernel_course/pytorch_ops/dot.py) | ❌ | ❌ | ❌ |
 | gemv | general matrix-vector multiply | $y = \alpha A x + \beta y$ | $2mn$ | $mn + n + 2m$ | ❌ | ❌ | ❌ | ❌ | ❌ |
 | geru | general rank-1 update | $A = A + \alpha x y^\top$ | $2mn$ | $2mn + m + n$ | ❌ | ❌ | ❌ | ❌ | ❌ |
 | gemm | general matrix-matrix multiply | $C = \alpha A B + \beta C$ | $2mnk$ | $mk + nk + 2mn$ | ❌ | ❌ | ❌ | ❌ | ❌ |
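The `dot` row lists $2n$ flops and $2n$ memory. The sketch below is a sanity check on those numbers. It walks a naive loop and tallies the work, assuming that the flop column counts one multiply and one add per element (with the accumulator starting at zero) and that the memory column counts element reads of `x` and `y`, leaving out the single scalar write of `z`. `dot_cost` is a hypothetical helper for illustration and is not part of the repository.

```python
def dot_cost(n: int) -> dict:
    """Tally flops and element traffic for a naive length-n dot product (illustrative only)."""
    x = [1.0] * n
    y = [2.0] * n
    flops = 0
    reads = 0
    z = 0.0  # accumulator starts at zero, so every element costs one add
    for i in range(n):
        prod = x[i] * y[i]  # one multiply
        reads += 2          # one read each from x[i] and y[i]
        z = z + prod        # one add into the accumulator
        flops += 2
    writes = 1  # the scalar result z, assumed to be ignored by the table
    return {"flops": flops, "reads": reads, "writes": writes, "z": z}


cost = dot_cost(1000)
```

Under these assumptions, the kernel performs about one flop per element it moves. That ratio is why level-1 operations such as dot are generally memory-bound on modern hardware.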
diff --git a/kernel_course/pytorch_ops/dot.py b/kernel_course/pytorch_ops/dot.py
@@ -0,0 +1,21 @@
+import torch
+
+
+def dot(
+    x: torch.Tensor,
+    y: torch.Tensor,
+) -> torch.Tensor:
+    """
+    Computes the dot product z = x^T y as the sum of the elementwise product of `x` and `y`.
+
+    Args:
+        x (torch.Tensor): First input vector.
+        y (torch.Tensor): Second input vector, with the same shape as `x`.
+
+    Returns:
+        torch.Tensor: A 0-dim tensor holding the dot product of `x` and `y`.
+    """
+
+    z = torch.sum(torch.mul(x, y))
+
+    return z
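The README row still marks the tests column for `dot` as ❌. A minimal check for the new PyTorch kernel is to compare it against `torch.dot`, PyTorch's built-in dot product for 1-D tensors. The sketch below redeclares `dot` inline with the same body as the diff, so it runs without the `kernel_course` package installed. It also assumes float64 inputs, which keep the default `torch.allclose` tolerances comfortably satisfied.

```python
import torch


def dot(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Same body as kernel_course/pytorch_ops/dot.py in this PR:
    # elementwise product followed by a reduction over all elements.
    return torch.sum(torch.mul(x, y))


torch.manual_seed(0)
n = 1024
x = torch.randn(n, dtype=torch.float64)
y = torch.randn(n, dtype=torch.float64)

z = dot(x, y)
# The result is a 0-dim tensor, matching what torch.dot returns.
assert z.dim() == 0
assert torch.allclose(z, torch.dot(x, y))

# Small hand-checkable case: 1*4 + 2*5 + 3*6 = 32.
small = dot(torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0, 5.0, 6.0]))
assert small.item() == 32.0
```

The two approaches also differ in input validation. `torch.dot` accepts only two 1-D tensors with the same number of elements. `torch.mul`, by contrast, broadcasts its inputs, so this formulation will silently reduce over a broadcast product when `x` and `y` have mismatched but broadcastable shapes.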