Releases: flagos-ai/FlagScale
v1.0.0-alpha.0
- Major updates: Refactored the codebase by moving hardware-specific (multi-chip) support into plugin repositories such as TransformerEngine-FL and vllm-plugin-FL. These plugins build on top of FlagOS, a unified open-source AI system software stack. We also re-initialized the Git commit history to reduce repository size.
- Compatibility (legacy users): If you are using or upgrading from a version earlier than v1.0.0-alpha.0, please use the main-legacy branch. It will continue to receive critical bug fixes and minor updates for a period of time.
v0.9.0
- Training & Finetuning: Added LoRA for efficient finetuning, improved the autotuner for cross-chip heterogeneous training, and enabled distributed RWKV training.
- Inference & Serving: Introduced DiffusionEngine for FLUX.1-dev, Qwen-Image, and Wan2.1-T2V, supporting multi-model automatic orchestration and dynamic scaling.
- Embodied AI: Full lifecycle support for RoboBrain, Robotics, and PI0, plus semantic retrieval of MCP-based skills for RoboOS.
- Elasticity & Fault Tolerance: Automatically detects task status (errors, hangs, etc.) and records it periodically.
- Hardware & System: Broader chip support, an upgraded patch mechanism with file-level diffs, and enhanced CI/CD across different chips.
v0.8.0
- Introduced a new flexible and robust multi-backend mechanism and updated vendor adaptation methods.
- Enabled heterogeneous prefill-decoding disaggregation across vendor chips within a single instance via FlagCX (beta).
- Upgraded DeepSeek-v3 pre-training with the new Megatron-LM and added heterogeneous pre-training across different chips for MoE models like DeepSeek-v3.
v0.6.5
- Added support for DeepSeek-V3 distributed pre-training (beta) and DeepSeek-V3/R1 serving across multiple chips.
- Introduced an auto-tuning feature for serving and a new CLI feature for one-click deployment.
- Enhanced the CI/CD system to support more chips and integrated the workflow of FlagRelease.
v0.6.0
- Introduced general multi-dimensional heterogeneous parallelism and CPU-based communication between different chips.
- Added comprehensive support for data processing and faster distributed training of LLaVA-OneVision, achieving SOTA results on the Infinity-MM dataset.
- Open-sourced the optimized CFG implementation and accelerated the generation and understanding tasks for Emu3.
- Implemented the auto-tuning feature to simplify large-scale distributed training, making it more accessible for users with less expertise.
- Enhanced the CI/CD system to enable more efficient unit testing across different backends and to perform loss checks for the various parallel strategies.
v0.3
v0.2
- Provided the training scheme actually used for Aquila2-70B-Expr, including the parallel strategies, optimizations, and hyperparameter settings.
- Supported heterogeneous training on chips of different generations with the same or compatible architectures, including NVIDIA GPUs and Iluvatar CoreX chips.
- Supported training on Chinese domestic hardware, including Iluvatar CoreX and Baidu KUNLUN chips.