This will be a problem if clip_grad_norm is not None, so does that mean clip_grad_norm is unnecessary for adalomo?
Thanks to the grouped update norm, in our experiments we observed that using grad norm or not makes little difference to AdaLomo's results. Using grad norm also reduces training throughput, so unless training is particularly unstable, we do not recommend using clip_grad_norm with AdaLomo.
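To make the recommendation concrete, here is a minimal sketch of running AdaLomo without gradient clipping. It assumes the `lomo_optim` package layout and its fused `fused_backward(loss, lr)` API; the toy model and batch are purely illustrative.

```python
import torch
from lomo_optim import AdaLomo

# Toy model and batch, purely illustrative.
model = torch.nn.Linear(16, 4)
inputs, targets = torch.randn(8, 16), torch.randint(0, 4, (8,))

# Per the maintainers: the grouped update norm makes gradient clipping
# largely unnecessary for AdaLomo, and clipping costs throughput.
# Leave clip_grad_norm at its default (None) unless training is
# particularly unstable.
optimizer = AdaLomo(model, lr=1e-3)

loss = torch.nn.functional.cross_entropy(model(inputs), targets)
# LOMO-style optimizers fuse the backward pass with the parameter
# update, replacing the usual loss.backward() + optimizer.step().
optimizer.fused_backward(loss, lr=1e-3)
```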
That said, this does look like a bug: AdaLomo has no loss_scaler attribute. We will fix this in a later update.
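Until the fix lands, the simplest workaround is to avoid the broken code path by not setting clip_grad_norm at all. The guard below is a hypothetical alternative, not the project's actual fix: the attribute name comes from the comment above, and treating None as a safe "loss scaling disabled" default mirrors Lomo's behavior, which is an assumption here.

```python
# Hypothetical guard: ensure the attribute exists so code paths that
# read optimizer.loss_scaler do not raise AttributeError. None is
# assumed to mean "no loss scaling", as in Lomo.
if not hasattr(optimizer, "loss_scaler"):
    optimizer.loss_scaler = None
```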