
xlite-dev

Develops ML/AI toolkits and ML/AI/CUDA learning resources.

Pinned Loading

  1. lite.ai.toolkit Public

    🛠 A lightweight C++ toolkit containing 100+ awesome AI models, with support for MNN, NCNN, TNN, ONNXRuntime, and TensorRT. 🎉🎉

    C++ · 4k stars · 737 forks

  2. Awesome-LLM-Inference Public

    📖A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

    3.7k stars · 261 forks

  3. CUDA-Learn-Notes Public

    📚200+ Tensor/CUDA Core kernels: ⚡️flash-attn-mma and ⚡️hgemm with WMMA, MMA, and CuTe (98%–100% of cuBLAS/FA2 TFLOPS 🎉🎉).

    Cuda · 3k stars · 309 forks

  4. statistic-learning-R-note Public

    📒A 200-page PDF of notes on "Statistical Learning Methods" by Li Hang, with detailed explanations of the math formulas and implementations in R.🎉

    441 stars · 55 forks

  5. torchlm Public

    💎A high-level Python library for facial landmark detection: training, evaluation, export, inference (Python/C++), and 100+ data augmentations.

    Python · 255 stars · 25 forks

  6. ffpa-attn-mma Public

    📚FFPA (Split-D): Yet another Faster Flash Prefill Attention, with O(1) GPU SRAM complexity for headdim > 256 and a ~2x speedup🎉 vs. SDPA EA.

    Cuda · 154 stars · 6 forks

Repositories

Showing 10 of 19 repositories
  • Awesome-Diffusion-Inference Public

    📖A curated list of Awesome Diffusion Inference Papers with code: sampling, caching, multi-GPU, etc. 🎉🎉

    198 stars · GPL-3.0 · 13 forks · 0 issues · 0 PRs · Updated Mar 23, 2025
  • SageAttention Public (forked from thu-ml/SageAttention)

    Quantized attention that achieves 2.1–3.1x and 2.7–5.1x speedups over FlashAttention2 and xformers, respectively, without losing end-to-end accuracy across various models.

    Cuda · 0 stars · Apache-2.0 · 77 forks · 0 issues · 0 PRs · Updated Mar 23, 2025
  • ffpa-attn-mma Public

    📚FFPA (Split-D): Yet another Faster Flash Prefill Attention, with O(1) GPU SRAM complexity for headdim > 256 and a ~2x speedup🎉 vs. SDPA EA.

    Cuda · 154 stars · GPL-3.0 · 6 forks · 4 issues · 0 PRs · Updated Mar 23, 2025
  • flashinfer Public (forked from flashinfer-ai/flashinfer)

    FlashInfer: Kernel Library for LLM Serving

    Cuda · 0 stars · Apache-2.0 · 260 forks · 0 issues · 0 PRs · Updated Mar 23, 2025
  • CUDA-Learn-Notes Public

    📚200+ Tensor/CUDA Core kernels: ⚡️flash-attn-mma and ⚡️hgemm with WMMA, MMA, and CuTe (98%–100% of cuBLAS/FA2 TFLOPS 🎉🎉).

    Cuda · 2,976 stars · GPL-3.0 · 309 forks · 6 issues · 0 PRs · Updated Mar 22, 2025
  • lite.ai.toolkit Public

    🛠 A lightweight C++ toolkit containing 100+ awesome AI models, with support for MNN, NCNN, TNN, ONNXRuntime, and TensorRT. 🎉🎉

    C++ · 3,991 stars · GPL-3.0 · 737 forks · 0 issues · 0 PRs · Updated Mar 5, 2025
  • Awesome-LLM-Inference Public

    📖A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

    3,708 stars · GPL-3.0 · 261 forks · 0 issues · 0 PRs · Updated Mar 4, 2025
  • hgemm-mma Public

    ⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak⚡️ performance.

    Cuda · 62 stars · GPL-3.0 · 2 forks · 0 issues · 0 PRs · Updated Mar 4, 2025
  • statistic-learning-R-note Public

    📒A 200-page PDF of notes on "Statistical Learning Methods" by Li Hang, with detailed explanations of the math formulas and implementations in R.🎉

    441 stars · GPL-3.0 · 55 forks · 2 issues · 0 PRs · Updated Feb 7, 2025
  • torchlm Public

    💎A high-level Python library for facial landmark detection: training, evaluation, export, inference (Python/C++), and 100+ data augmentations.

    Python · 255 stars · MIT · 25 forks · 14 issues · 0 PRs · Updated Feb 7, 2025
