Releases: flagos-ai/FlagScale

v1.0.0-alpha.0

05 Jan 04:07

  • Major updates: Refactored the codebase by moving hardware-specific (multi-chip) support into plugin repositories such as TransformerEngine-FL and vllm-plugin-FL. These plugins build on top of FlagOS, a unified open-source AI system software stack. We also re-initialized the Git commit history to reduce repository size.
  • Compatibility (legacy users): If you are using or upgrading from a version earlier than v1.0.0-alpha.0, please use the main-legacy branch. It will continue to receive critical bug fixes and minor updates for a period of time.

v0.9.0

30 Sep 09:36
396cfce

  • Training & Finetuning: Added LoRA for efficient finetuning, improved the autotuner for cross-chip heterogeneous training, and enabled distributed RWKV training.
  • Inference & Serving: Introduced DiffusionEngine for FLUX.1-dev, Qwen-Image, and Wan2.1-T2V, with support for multi-model automatic orchestration and dynamic scaling.
  • Embodied AI: Full lifecycle support for Robobrain, Robotics, and PI0, plus semantic retrieval of MCP-based skills for RoboOS.
  • Elastic & Fault Tolerance: Added automatic detection of task status (errors, hangs, etc.) with periodic recording.
  • Hardware & System: Broader chip support, an upgraded patch mechanism with file-level diffs, and enhanced CI/CD for different chips.
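
The LoRA finetuning mentioned above follows the standard low-rank adapter formulation, W + (alpha/r)·BA, where only the small A and B matrices are trained while the base weight stays frozen. A minimal NumPy sketch of that idea (illustrative only; not FlagScale's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, d_in, d_out, r=8, alpha=16):
        self.W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        # Zero-initialized B means the adapter contributes nothing at the
        # start of finetuning, so training begins from the base model.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)
```

Because B starts at zero, the adapted layer is exactly the base layer until training updates B, which is what makes LoRA safe to bolt onto a pretrained model.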

v0.8.0

30 Apr 11:24
779a1e3

  • Introduced a new flexible and robust multi-backend mechanism and updated vendor adaptation methods.
  • Enabled heterogeneous prefill-decoding disaggregation across vendor chips within a single instance via FlagCX (beta).
  • Upgraded DeepSeek-V3 pre-training with the new Megatron-LM and added heterogeneous pre-training across different chips for MoE models like DeepSeek-V3.

v0.6.5

24 Feb 11:32
b8c4137

  • Added support for DeepSeek-V3 distributed pre-training (beta) and DeepSeek-V3/R1 serving across multiple chips.
  • Introduced an auto-tuning feature for serving and a new CLI feature for one-click deployment.
  • Enhanced the CI/CD system to support more chips and integrated the workflow of FlagRelease.

v0.6.0

06 Nov 09:49
3ae142f

  • Introduced general multi-dimensional heterogeneous parallelism and CPU-based communication between different chips.
  • Added comprehensive support for data processing and faster distributed training of LLaVA-OneVision, achieving SOTA results on the Infinity-MM dataset.
  • Open-sourced the optimized CFG implementation and accelerated the generation and understanding tasks for Emu3.
  • Implemented the auto-tuning feature to simplify large-scale distributed training, making it more accessible for users with less expertise.
  • Enhanced the CI/CD system to enable more efficient unit testing across different backends and to perform loss checks for the various parallel strategies.
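
The CFG (classifier-free guidance) implementation mentioned above is, at its core, a simple extrapolation from an unconditional prediction toward a conditional one. A minimal sketch of the standard formulation (the Emu3-specific optimizations are not shown):

```python
import numpy as np

def cfg_combine(cond_logits, uncond_logits, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by guidance_scale."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```

A scale of 1.0 recovers the purely conditional prediction, 0.0 the unconditional one, and values above 1.0 sharpen the conditioning signal, which is the usual setting for generation.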

v0.3

11 Apr 02:34

  • Accomplished heterogeneous hybrid training of the Aquila2-70B-Expr model on a cluster combining NVIDIA and Iluvatar chips.
  • Provided training of the Aquila2 series on AI chips from six distinct manufacturers.

v0.2

30 Nov 09:27

  • Provided the actual training scheme used for Aquila2-70B-Expr, including the parallel strategies, optimizations, and hyper-parameter settings.
  • Supported heterogeneous training on chips of different generations with the same or compatible architectures, including NVIDIA GPUs and Iluvatar CoreX chips.
  • Supported training on Chinese domestic hardware, including Iluvatar CoreX and Baidu KUNLUN chips.