Releases · inclusionAI/dInfer
v0.2.0
This release delivers multiple major features:
- support block diffusion LLMs (LLaDA2-preview and LLaDA2);
- add optimized batch inference for block diffusion LLMs;
- support long-sequence generation;
- support native CUDA graph capture and management (see the sketch after this list);
- add SGLang as a backend for block diffusion LLM inference;
- support FP8 quantization for LLaDA2-preview and LLaDA2;
- add experimental support for integrating dInfer with SGLang;
- lm_eval support for two benchmarks (gsm8k and mbpp) on LLaDA2-preview and LLaDA2 (see the usage sketch below).
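
To illustrate the kind of mechanism the CUDA graph item refers to, here is a minimal PyTorch sketch of capturing and replaying a single decode-like step. The stand-in linear layer is only a placeholder for a real decode step; dInfer's own capture and management logic is not reproduced here.

```python
import torch

# Stand-in for a decode step; the real kernels live inside dInfer.
model = torch.nn.Linear(4096, 4096).cuda().half()
static_input = torch.randn(8, 4096, device="cuda", dtype=torch.half)

# Warm up on a side stream so lazily initialized kernels are not captured.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        static_output = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture the step once, then replay it with fresh data copied into the
# static input buffer, avoiding per-step kernel launch overhead.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = model(static_input)

static_input.copy_(torch.randn_like(static_input))
graph.replay()  # reruns the captured kernels on the updated buffer
print(static_output.shape)
```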
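
For the lm_eval item, the sketch below shows the general shape of an evaluation run through the harness' `simple_evaluate` entry point. It assumes dInfer registers a model adapter with lm_eval; the adapter name `dinfer` and the `pretrained=...` checkpoint id are placeholders, not the actual dInfer API — consult the repository docs for the real names and arguments.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="dinfer",                                      # hypothetical adapter name
    model_args="pretrained=inclusionAI/LLaDA2-preview",  # placeholder checkpoint id
    tasks=["gsm8k", "mbpp"],                             # the two benchmarks in this release
    batch_size=8,
)
print(results["results"])
```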