Wrong loss_mask processing in the _process_chat function of pretrain/qwen3_data.py? #38

@cookieminions


Hi, thanks for the great work and contribution to the open-source community!

While reviewing the data processing logic in qwen3_dataset.py, I noticed a potential issue in the _process_chat function used for SFT data.

  • Specifically, when _get_assistant_mask is called, start_pattern and end_pattern are explicitly passed as [151644, 872, 198] and [151645, 198, 151645], overriding the defaults defined in _get_assistant_mask (start_pattern=[151644, 77091, 198] and end_pattern=[151645, 198]). The relevant loss_mask code:

    inputs["loss_mask"] = self._get_assistant_mask(
        input_ids,
        start_pattern=[self.im_start_token_id, 872, 198],  # <|im_start|>assistant
        end_pattern=[self.im_end_token_id, 198, self.im_end_token_id]  # <|im_end|>
    )
  • Although the comment indicates that the intended start_pattern is "<|im_start|>assistant\n", the qwen3-1.7b/8b tokenizer decodes token 872 as "user", not "assistant". As a result, the effective start_pattern is "<|im_start|>user\n" instead of "<|im_start|>assistant\n".

  • This leads to the loss_mask being set to 1 for user-related content. Since loss_mask is later used in train_qwen3.py to construct the training labels, user tokens are also included in the loss computation.
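
To make the effect concrete, here is a minimal sketch on a toy token sequence. It is not the repository's implementation: `span_mask` and `find` are hypothetical stand-ins for `_get_assistant_mask`, the content ids 1001, 1002, 2001, 2002 are made-up placeholders, and the choices to exclude the end token from the mask and to run an unmatched span to the end of the sequence are assumptions. Only the special-token ids (151644, 151645, 872, 77091, 198) come from the discussion above.

```python
# Token ids quoted in this issue.
IM_START, IM_END = 151644, 151645      # <|im_start|>, <|im_end|>
USER, ASSISTANT, NEWLINE = 872, 77091, 198


def find(ids, pattern, start):
    """Return the first index >= start where pattern occurs in ids, or -1."""
    n = len(pattern)
    for i in range(start, len(ids) - n + 1):
        if ids[i:i + n] == pattern:
            return i
    return -1


def span_mask(ids, start_pattern, end_pattern):
    """Hypothetical stand-in for _get_assistant_mask.

    Marks with 1 every token after a start_pattern, up to (not including)
    the next end_pattern.
    """
    mask = [0] * len(ids)
    pos = 0
    while True:
        s = find(ids, start_pattern, pos)
        if s == -1:
            break
        begin = s + len(start_pattern)
        e = find(ids, end_pattern, begin)
        # Assumption: if no end_pattern follows, the span runs to the end.
        stop = len(ids) if e == -1 else e
        for i in range(begin, stop):
            mask[i] = 1
        if e == -1:
            break
        pos = e + len(end_pattern)
    return mask


# One user turn followed by one assistant turn (content ids are placeholders).
ids = [
    IM_START, USER, NEWLINE, 1001, 1002, IM_END, NEWLINE,
    IM_START, ASSISTANT, NEWLINE, 2001, 2002, IM_END, NEWLINE,
]

# Default patterns: only the assistant content (2001, 2002) is selected.
default_mask = span_mask(ids, [IM_START, ASSISTANT, NEWLINE], [IM_END, NEWLINE])

# Start pattern with token 872: the user content (1001, 1002) is selected instead.
user_mask = span_mask(ids, [IM_START, USER, NEWLINE], [IM_END, NEWLINE])

# The end pattern passed in _process_chat does not occur in this toy sequence.
end_hit = find(ids, [IM_END, NEWLINE, IM_END], 0)
```

In this sketch, the start pattern containing 872 moves the mask from the assistant turn to the user turn. The end pattern [151645, 198, 151645] never matches the toy sequence, so the extent of the masked span would depend on how the real function handles a missing end pattern.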

Could you please clarify whether this behavior is intended? What is the rationale for setting start_pattern to [151644, 872, 198] and end_pattern to [151645, 198, 151645]? Additionally, were the experimental results reported in the paper obtained using this same configuration?

I'd really appreciate your clarifications! Thanks in advance!
