Efficient Triton Kernels for LLM Training
FlagGems is an operator library for large language models implemented in the Triton Language.
LLM notes covering model inference, transformer model structure, and LLM framework code analysis.
A lightweight LLaMA-like LLM inference framework built on Triton kernels.
Tiled Flash Linear Attention library for fast and efficient mLSTM kernels.
A "standard library" of Triton kernels.
Production inference for encoder models (ColBERT, GLiNER, ColPali, embeddings, etc.) as vLLM plugins for online and in-process deployment.
Manifold-Constrained Hyper-Connections with fused Triton kernels for efficient training
Educational resource demonstrating common GPU programming pitfalls and solutions using Triton kernels.
Universal AI Runtime — Execute any model on any hardware
Official code for the paper "ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces" (ICML 2025).
KernelHeim – a development ground for custom Triton and CUDA kernels designed to optimize and accelerate machine learning workloads on NVIDIA GPUs. Inspired by the mythical stronghold of the gods, KernelHeim is a forge where high-performance kernels are crafted to unlock the full potential of the hardware.
A collection of PyTorch neural network modules written in Triton.
High-performance Triton kernel library for LLM training with 12 fused operators (AttnRes, RMSNorm, RoPE, CrossEntropy, GRPO, JSD, FusedLinear, etc.) — up to 24x faster than PyTorch with 78% memory savings, outperforming Liger-Kernel on an RTX 5090.
Collection of Triton operators for transformer models.
Repository for learning Triton GPU programming
FlashAttention2 Analysis in Triton
💥 Optimize linear attention models with efficient Triton-based implementations in PyTorch, compatible across NVIDIA, AMD, and Intel platforms.
Yandex LLM Scaling Week 2025
A memory-efficient and CUDA-independent Triton implementation of Sparse Convolution, optimized for high-performance 3D Perception.