Releases · inclusionAI/dInfer
v0.2.0
This release delivers multiple major features:
- support block diffusion LLMs (LLaDA2-preview and LLaDA2);
- add optimized batch inference for block diffusion LLMs;
- support long-sequence generation;
- support native CUDA graph capture and management (see the sketch after this list);
- add SGLang as a backend for block diffusion LLM inference;
- support FP8 quantization for LLaDA2-preview and LLaDA2;
- add experimental support for integrating dInfer with SGLang;
- lm_eval support for two benchmarks (gsm8k and mbpp) on LLaDA2-preview and LLaDA2 (see the usage sketch below).
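
To illustrate the kind of mechanism the CUDA graph item refers to, here is a minimal PyTorch sketch of capturing and replaying a single decode-like step. The stand-in linear layer is only a placeholder for a real decode step; dInfer's own capture and management logic is not reproduced here.

```python
import torch

# Stand-in for a decode step; the real kernels live inside dInfer.
model = torch.nn.Linear(4096, 4096).cuda().half()
static_input = torch.randn(8, 4096, device="cuda", dtype=torch.half)

# Warm up on a side stream so lazily initialized kernels are not captured.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        static_output = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture the step once, then replay it with fresh data copied into the
# static input buffer, avoiding per-step kernel launch overhead.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = model(static_input)

static_input.copy_(torch.randn_like(static_input))
graph.replay()  # reruns the captured kernels on the updated buffer
print(static_output.shape)
```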
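
For the lm_eval item, the sketch below shows the general shape of an evaluation run through the harness' `simple_evaluate` entry point. It assumes dInfer registers a model adapter with lm_eval; the adapter name `dinfer` and the `pretrained=...` checkpoint id are placeholders, not the actual dInfer API — consult the repository docs for the real names and arguments.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="dinfer",                                      # hypothetical adapter name
    model_args="pretrained=inclusionAI/LLaDA2-preview",  # placeholder checkpoint id
    tasks=["gsm8k", "mbpp"],                             # the two benchmarks in this release
    batch_size=8,
)
print(results["results"])
```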