
Question: when fine-tuning ChatGLM2 with LoRA, is it okay not to add an attention mask? #21

annw0922 opened this issue Jun 30, 2023 · 2 comments

annw0922 commented Jun 30, 2023

```python
import torch

# `tokenizer` is assumed to be the ChatGLM2 tokenizer, loaded elsewhere in the
# fine-tuning script (e.g. AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", ...)).

def data_collator(features: list) -> dict:
    len_ids = [len(feature["input_ids"]) for feature in features]
    longest = max(len_ids)
    input_ids = []
    labels_list = []
    # Iterate over the examples longest-first (order within the batch does not affect the loss).
    for ids_l, feature in sorted(zip(len_ids, features), key=lambda x: -x[0]):
        ids = feature["input_ids"]
        seq_len = feature["seq_len"]  # length of the prompt part
        # Supervise only the response part: prompt positions (shifted by one for
        # the causal-LM loss) and padding positions are set to -100 so they are ignored.
        labels = (
            [-100] * (seq_len - 1) + ids[(seq_len - 1):] + [-100] * (longest - ids_l)
        )
        # Right-pad the input ids up to the longest sequence in the batch.
        ids = ids + [tokenizer.pad_token_id] * (longest - ids_l)
        input_ids.append(torch.LongTensor(ids))
        labels_list.append(torch.LongTensor(labels))
    input_ids = torch.stack(input_ids)
    labels = torch.stack(labels_list)
    return {
        "input_ids": input_ids,
        "labels": labels,
    }
```

The dict returned by the code above contains neither an attention mask nor position ids. I have looked at a lot of fine-tuning code on GitHub, and some people add an attention mask while others don't. I also ran the code without an attention mask; it seemed to work fine and the results were decent. I'm quite confused by this: without an attention mask, can't the model just see the labels directly during fine-tuning?
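For concreteness, here is a toy sketch of calling this collator on two made-up examples, using a stand-in tokenizer that only provides `pad_token_id` (the token ids are arbitrary, not real ChatGLM2 ids):

```python
import torch
from types import SimpleNamespace

# Stand-in for the real ChatGLM2 tokenizer; the collator only needs pad_token_id.
tokenizer = SimpleNamespace(pad_token_id=0)

# Each feature holds prompt tokens followed by response tokens;
# seq_len is the length of the prompt part.
features = [
    {"input_ids": [11, 12, 13, 21, 22, 23], "seq_len": 3},
    {"input_ids": [11, 12, 31, 32], "seq_len": 2},
]

batch = data_collator(features)
print(batch["input_ids"].shape)  # torch.Size([2, 6])
print(batch["labels"][1])        # tensor([-100,   12,   31,   32, -100, -100])
# The batch is then fed to the model as-is (e.g. model(**batch) or via
# transformers.Trainer(data_collator=data_collator)), so forward() never
# receives an explicit attention_mask or position_ids.
```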

beyondguo (Owner) commented

The model source code for ChatGLM, Baichuan, and similar models (files like modeling_chatglm.py) actually constructs the causal-LM attention mask itself, so you don't need to build it manually.

For example: https://huggingface.co/THUDM/chatglm2-6b/blob/main/modeling_chatglm.py#L674

Other GPT-style models also don't require you to explicitly pass an attention mask; internally they all generate the mask themselves in one way or another.
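A rough sketch of the kind of mask being described, i.e. the causal mask a decoder-only model builds internally when no `attention_mask` is passed (simplified, not the actual modeling_chatglm.py code):

```python
import torch

def build_causal_mask(seq_len: int) -> torch.Tensor:
    # Lower-triangular boolean matrix: entry (i, j) is True when position i
    # is allowed to attend to position j, i.e. only when j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(build_causal_mask(4).int())
# tensor([[1, 0, 0, 0],
#         [1, 1, 0, 0],
#         [1, 1, 1, 0],
#         [1, 1, 1, 1]], dtype=torch.int32)
```

Because of this mask, a token at position i never attends to the label tokens to its right, even though prompt and response sit in the same `input_ids` sequence.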

nostalgiaer commented

Taking Llama as a reference: the attention mask passed to forward usually just marks which positions are padding and which are not, while the model internally builds the causal attention mask (the so-called triangular matrix), and it is that step which masks out the later tokens.
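A simplified sketch of that combination (not the actual Llama code): the padding mask passed to `forward()` is intersected with the internally built triangular causal mask:

```python
import torch

def combine_masks(attention_mask: torch.Tensor) -> torch.Tensor:
    """attention_mask: (batch, seq_len), 1 = real token, 0 = padding."""
    seq_len = attention_mask.shape[1]
    # Causal part, built inside the model: position i may attend to j <= i.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Padding part, from the user-supplied mask, broadcast over query positions.
    padding = attention_mask.bool()[:, None, :]        # (batch, 1, seq_len)
    # A key position is visible only if it is both non-future and non-padding.
    return causal[None, :, :] & padding                # (batch, seq_len, seq_len)

attn = torch.tensor([[1, 1, 1, 0, 0]])  # last two positions are padding
print(combine_masks(attn).int()[0])
```

If no attention_mask is passed, the padding part is effectively all ones, so the only difference is that padded positions can be attended to; the causal part, which is what keeps the label tokens hidden, is still applied, and the padded label positions are already set to -100 in the collator, so they contribute nothing to the loss.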
