Releases: inclusionAI/dInfer

v0.2.0

21 Dec 07:11

This release delivers multiple major features:

  • support block diffusion LLMs (LLaDA2-preview and LLaDA2);
  • add optimized batch inference for block diffusion LLMs;
  • support long-sequence generation;
  • support native CUDA graph capture and management;
  • add SGLang as a backend for block diffusion LLM inference;
  • support FP8 quantization for LLaDA2-preview and LLaDA2;
  • add an experimental feature supporting integration between dInfer and SGLang;
  • add lm_eval support for two benchmarks (gsm8k and mbpp) on LLaDA2-preview and LLaDA2 (see the sketch after this list).
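
As a rough illustration of the last item, the sketch below runs gsm8k and mbpp through lm_eval's Python API (`simple_evaluate`). The model name `"hf"` and the `model_args` checkpoint are placeholder assumptions for illustration only; they are not the actual dInfer/LLaDA2 adapter, which this release does not document here.

```python
# Minimal sketch: evaluating gsm8k and mbpp via lm_eval's Python API.
# "hf" and the pretrained path are illustrative assumptions, not dInfer's adapter.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                   # placeholder backend name (assumption)
    model_args="pretrained=inclusionAI/LLaDA2",   # placeholder checkpoint (assumption)
    tasks=["gsm8k", "mbpp"],
    batch_size=8,
)

# Print per-task metric dictionaries.
for task, metrics in results["results"].items():
    print(task, metrics)
```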