Support the packing algorithm to speed up training #1

Open
JasonCZH4 wants to merge 6 commits into wxhcore:main from JasonCZH4:main
Conversation

@JasonCZH4

No description provided.

JasonCZH4 and others added 2 commits February 7, 2026 14:47
Removed commented-out SFTDataset class and its methods.
@wxhcore
Owner

wxhcore commented Feb 8, 2026

@JasonCZH4 Thank you very much for submitting a PR to this project! I think adding the sequence packing feature is a great optimization that can significantly improve GPU utilization and training efficiency. After reviewing the newly added code, I have the following suggestions:

  1. Several utility functions have been added in dataset.py; could we move these utility functions into a new file for unified management?

  2. I notice that only SFTDataset has been modified. Considering potential future support for other dataset types, I suggest using inheritance or the strategy pattern for extensibility. Alternatively, if you have a better approach, you could also refer to the implementation in supervised.py from LLaMA-Factory.

  3. Actually, your current code doesn’t work correctly: in base_trainer.py, the loss computation explicitly passes input_ids and attention_mask, but your code doesn’t pass position_ids to the model, so it may need to be changed to pass the batch using **batch.

  4. I see you’ve added two new parameters, so the corresponding YAML config and argparse setup likely need to be updated to ensure the parameters are passed correctly.

  5. Since I haven’t integrated CI on GitHub yet, I’ll need you to run the existing tests and add unit or functional tests for the new feature to ensure everything works as expected.
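
To make point 3 concrete, here is a minimal sketch of the difference. The function names, batch keys, and dummy model are illustrative assumptions, not the actual base_trainer.py code:

```python
class DummyModel:
    """Stand-in model that records which keyword arguments it receives."""
    def __call__(self, **kwargs):
        self.received = set(kwargs)
        return type("Out", (), {"loss": 0.0})()

batch = {"input_ids": [1, 2], "attention_mask": [1, 1],
         "labels": [1, 2], "position_ids": [0, 1]}
model = DummyModel()

# Before: position_ids is silently dropped, so packed samples
# are not isolated from each other during attention.
model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
assert "position_ids" not in model.received

# After: forwarding the whole batch passes every collated field,
# including position_ids, through to the model.
model(**batch)
assert "position_ids" in model.received
```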

Finally, thank you again for contributing to this project!

@JasonCZH4
Author

Thanks for reviewing! This is my first time submitting a PR. There are still many issues with the current code, and I am working hard to fix them. In fact, I am currently running a number of tests; once everything is OK, I will let you know. Thanks again!

@wxhcore
Owner

wxhcore commented Feb 8, 2026

Thanks for reviewing! This is my first time submitting a PR. There are still many issues with the current code, and I am working hard to fix them. In fact, I am currently running a number of tests; once everything is OK, I will let you know. Thanks again!

Thank you very much for your efforts! This is also BumbleCore's first PR, so I attach great importance to it. Wish you all the best!

Owner

@wxhcore wxhcore left a comment


Thank you for your contribution! We still have a few issues to address. Additionally, have you compared the training performance of the same model and training configuration with and without sequence packing, to ensure that samples are properly isolated?

action="store_true",
default=cfg.get("enable_gradient_checkpointing", False),
)
parser.add_argument("--packing", type=bool, default=cfg.get("packing", True))
Owner


When type=bool is used, it doesn't work as expected: argparse calls bool() on the raw string, and any non-empty string (including "False") evaluates to True.
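
A quick illustration of the pitfall and one common fix. The `str2bool` helper below is just a sketch; `argparse.BooleanOptionalAction` (Python 3.9+) is another option:

```python
import argparse

# Pitfall: argparse applies bool() to the raw string, and
# bool("False") is True because the string is non-empty.
broken = argparse.ArgumentParser()
broken.add_argument("--packing", type=bool, default=True)
print(broken.parse_args(["--packing", "False"]).packing)  # True, not False

# One common fix: parse the string explicitly.
def str2bool(v: str) -> bool:
    if v.lower() in ("yes", "true", "t", "1"):
        return True
    if v.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError(f"invalid boolean value: {v!r}")

fixed = argparse.ArgumentParser()
fixed.add_argument("--packing", type=str2bool, default=True)
print(fixed.parse_args(["--packing", "False"]).packing)  # False
```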

def _is_dist():
return dist.is_available() and dist.is_initialized() and dist.get_world_size() > 1

class SFTDataset(Dataset):
Owner


Here are three suggestions for the sequence packing feature:

  1. Encapsulate the logic into a PackingMixin that can be optionally inherited by SFTDataset, PretrainDataset, etc.

  2. If it’s only needed for SFT, create a new class that inherits from SFTDataset and overrides the relevant methods.

  3. Alternatively, you could introduce a new PackingSFTDataset that inherits from both PackingMixin and SFTDataset—this would be a clean and elegant way to compose the functionality without modifying existing code.
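
A minimal sketch of option 3. The class names come from the suggestion above, but the greedy packing logic, the sample fields, and the plain-class base (used instead of torch.utils.data.Dataset to keep the sketch self-contained) are all illustrative assumptions:

```python
class SFTDataset:
    """Simplified stand-in for the existing dataset: one tokenized sample per item."""
    def __init__(self, samples):
        self.samples = samples  # each: {"input_ids": [...], "labels": [...]}

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


class PackingMixin:
    """Greedily packs consecutive samples into sequences of at most cutoff_len tokens."""
    @staticmethod
    def pack(samples, cutoff_len):
        packed, ids, labels, pos = [], [], [], []
        for s in samples:
            if ids and len(ids) + len(s["input_ids"]) > cutoff_len:
                packed.append({"input_ids": ids, "labels": labels, "position_ids": pos})
                ids, labels, pos = [], [], []
            ids = ids + s["input_ids"]
            labels = labels + s["labels"]
            # position_ids restart at 0 for each original sample, which is
            # what lets the model keep packed samples isolated.
            pos = pos + list(range(len(s["input_ids"])))
        if ids:
            packed.append({"input_ids": ids, "labels": labels, "position_ids": pos})
        return packed


class PackingSFTDataset(PackingMixin, SFTDataset):
    """Composes packing with the SFT dataset without modifying either class."""
    def __init__(self, samples, cutoff_len):
        super().__init__(samples)
        self.samples = self.pack(self.samples, cutoff_len)
```

With this layout, PretrainDataset could later gain packing the same way, by mixing in PackingMixin rather than duplicating the logic.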

@wxhcore wxhcore self-assigned this Feb 13, 2026
@JasonCZH4
Author

Additionally, have you compared the training performance of the same model and training configuration with and without sequence packing, to ensure that samples are properly isolated?

Thanks for the review! I have tested the performance on Qwen2.5-0.5B-Instruct using the following configuration:

dataset_path: Magpie-Align/Magpie-Pro-300K-Filtered (randomly select 10K to save time)        
cutoff_len: 8192

# Training
num_epochs: 3.0
learning_rate: 2e-5                             
weight_decay: 0.01
lr_scheduler_type: "cosine"                            
enable_gradient_checkpointing: true                   
warmup_ratio: 0.1
packing: true
packing_num_proc: 4

# Batch Size
train_micro_batch_size_per_gpu: 1          
gradient_accumulation_steps: 8             

# Precision
train_model_precision: "bf16"                   

# DeepSpeed
deepspeed_config_path: "./configs/deepspeed/ds_z3_config.json"  

average_tokens_across_devices: true
  

I validated the models using lm-eval and got the following results.
[image: lm-eval evaluation results]

Compared to training without packing, packing gives a slight improvement in model performance and a significant reduction in training time.

Next I will move on to implementing the packing feature via inheritance. Thanks for reviewing!
