# matmul

Here are 34 public repositories matching this topic...

A guide to optimizing GPU kernels for performance, focused on NVIDIA GPUs. It covers CUDA, PyTorch, and Triton, with the aim of improving computational efficiency in deep learning and scientific computing workloads.

  • Updated Nov 13, 2024
  • Cuda

Bit-exact, cross-hardware deterministic matrix multiplication using Q16.16 fixed-point arithmetic and SHA-256 verification. Produces identical AI inference results across NVIDIA GPUs, CPUs, and operating systems. Aimed at ZK-ML, fintech, and AI safety use cases. Includes a PyTorch drop-in replacement, an API, and a GPT-2 demo.

  • Updated Mar 31, 2026
  • Python
