Efficient Triton Kernels for LLM Training
FlagGems is an operator library for large language models implemented in the Triton Language.
LLM notes covering model inference, transformer model structure, and LLM framework code analysis.
A lightweight LLaMA-like LLM inference framework built on Triton kernels.
Tiled Flash Linear Attention library for fast and efficient mLSTM kernels.
A "standard library" of Triton kernels.
Production inference for encoder models (ColBERT, GLiNER, ColPali, embeddings, etc.) as vLLM plugins for online and in-process deployment.
Manifold-Constrained Hyper-Connections with fused Triton kernels for efficient training
Educational resource demonstrating common GPU programming pitfalls and solutions using Triton kernels.
Universal AI Runtime — Execute any model on any hardware
Official code for the paper "ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces" (ICML 2025).
KernelHeim – a development ground for custom Triton and CUDA kernels designed to optimize and accelerate machine learning workloads on NVIDIA GPUs. Inspired by the mythical stronghold of the gods, KernelHeim is a forge where high-performance kernels are crafted to unlock the full potential of the hardware.
A collection of PyTorch neural network modules written in Triton.
High-performance Triton kernel library for LLM training with 12 fused operators (AttnRes, RMSNorm, RoPE, CrossEntropy, GRPO, JSD, FusedLinear, etc.) — up to 24x faster than PyTorch with 78% memory savings, outperforming Liger-Kernel on an RTX 5090.
Collection of Triton operators for transformer models.
Repository for learning Triton GPU programming
FlashAttention2 Analysis in Triton
💥 Optimize linear attention models with efficient Triton-based implementations in PyTorch, compatible across NVIDIA, AMD, and Intel platforms.
Yandex LLM Scaling Week 2025
A memory-efficient and CUDA-independent Triton implementation of Sparse Convolution, optimized for high-performance 3D Perception.