xlite-dev
Repositories
- Awesome-Diffusion-Inference Public
📖A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉
- SageAttention Public Forked from thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
- ffpa-attn-mma Public
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.
- CUDA-Learn-Notes Public
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
- lite.ai.toolkit Public
🛠 A lite C++ toolkit: contains 100+ Awesome AI models and supports MNN, NCNN, TNN, ONNXRuntime and TensorRT. 🎉🎉
- Awesome-LLM-Inference Public
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉
- statistic-learning-R-note Public
📒200-page PDF Notes for "Statistical Learning Methods-Li Hang", detailed explanations of various math formulas, implemented in R.🎉