adalomo has no loss_scaler, only loss_scale #139

Open
HappyLynn opened this issue Nov 29, 2023 · 1 comment

Comments

@HappyLynn

[screenshot of the AdaLomo code path that handles clip_grad_norm]
Here, if clip_grad_norm is not None there will be a problem. Does that mean clip_grad_norm is simply not needed for adalomo?

@KaiLv69
Collaborator

KaiLv69 commented Nov 30, 2023

Thanks to the grouped update norm, in our experiments we observed that using grad norm or not has little effect on AdaLomo's results. Using grad norm also reduces training throughput, so unless training is particularly unstable, we do not recommend using clip_grad_norm with AdaLomo.
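For intuition only, here is a toy sketch of the grouped-update-norm idea, not the actual LOMO/AdaLomo source: each parameter tensor's update is rescaled by its own RMS, so no single tensor takes an outsized step even without a global clip_grad_norm pass. The function name `normalize_update` and the `max_rms` threshold are hypothetical.

```python
import torch

def normalize_update(update: torch.Tensor, max_rms: float = 1.0) -> torch.Tensor:
    """Toy per-tensor (grouped) update normalization: shrink the update
    only when its RMS exceeds max_rms; smaller updates pass through."""
    rms = update.pow(2).mean().sqrt().item()
    return update / max(1.0, rms / max_rms)

# Example: a large random update has its RMS capped at max_rms = 1.0.
u = normalize_update(torch.randn(4096) * 5.0)
```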

That said, this is indeed a bug: AdaLomo has no loss_scaler attribute. We will fix this later.
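Until that fix lands, here is a minimal sketch of one possible guard, assuming only the attribute names mentioned in this issue (`get_current_loss_scale` is a hypothetical helper, not the actual fix): fall back to AdaLomo's plain loss_scale float when no loss_scaler object exists.

```python
def get_current_loss_scale(optimizer) -> float:
    """Hypothetical helper: return the current loss scale whether the
    optimizer holds a dynamic loss_scaler object or only a plain
    loss_scale float (as AdaLomo does)."""
    scaler = getattr(optimizer, "loss_scaler", None)
    if scaler is not None:
        return scaler.loss_scale  # assumed attribute on the scaler object
    return getattr(optimizer, "loss_scale", 1.0)  # default: no scaling
```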
