PSGD-Kron-Pro(crustes) optimizer implementation #60
…contraction Signed-off-by: mikail <mkhona@nvidia.com>
Signed-off-by: mikail <mkhona@nvidia.com>
Signed-off-by: mikail <mkhona@nvidia.com>
Signed-off-by: mikail <mkhona@nvidia.com>
/ok to test 780e3d7
Looks good overall. The comments are mostly about style, but they must be fixed.
Signed-off-by: mikail <mkhona@nvidia.com>
Signed-off-by: mikail <mkhona@nvidia.com>
Signed-off-by: mikail <mkhona@nvidia.com>
Signed-off-by: mikail <mkhona@nvidia.com>
@lixilinx (https://github.com/lixilinx) does this look good to you?
Thanks, @mkhona-nvidia, for the PSGD code. It looks good and well organized to me! I previously verified the correctness of psgd_kron_contractions by comparing it with einsum. In norm_lower_bound_spd, we will set the default subspace dimension to 32 for float32 (based on my tests).
Signed-off-by: mikail <mkhona@nvidia.com>
… fp32 Signed-off-by: mikail <mkhona@nvidia.com>
Signed-off-by: mikail <mkhona@nvidia.com>
The momentum dampening has also been changed. Dampened momentum calculation:
dampened_momentum = exp_avg + (
    damping_noise_scale + torch.finfo(exp_avg.dtype).eps * exp_avg.abs()
) * torch.randn_like(exp_avg)
with a …
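For readers who want to try the dampened-momentum expression in isolation, here is a minimal sketch. It wraps the expression quoted in the comment in a standalone function; the function name `dampen_momentum`, the default value of `damping_noise_scale`, and the sample tensor are assumptions made for illustration, not taken from the PR.

```python
import torch


def dampen_momentum(exp_avg: torch.Tensor, damping_noise_scale: float = 1e-9) -> torch.Tensor:
    """Add small Gaussian noise to the momentum buffer.

    The noise standard deviation has an absolute part (damping_noise_scale)
    and a relative part (machine epsilon of the dtype times |exp_avg|).
    The default of 1e-9 is a hypothetical value chosen for this sketch.
    """
    eps = torch.finfo(exp_avg.dtype).eps
    noise_std = damping_noise_scale + eps * exp_avg.abs()
    return exp_avg + noise_std * torch.randn_like(exp_avg)


torch.manual_seed(0)  # fixed seed so the sketch is reproducible
exp_avg = torch.tensor([0.0, 1.0, -100.0])
dampened = dampen_momentum(exp_avg)

# Largest absolute change introduced by the noise.
max_perturbation = (dampened - exp_avg).abs().max().item()
```

The relative term keeps the noise proportional to each entry's magnitude, so large momentum values are perturbed at roughly the level of floating-point rounding, while the absolute term still injects a small amount of noise where the momentum is exactly zero.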
/ok to test f9f12bd
Signed-off-by: mikail <mkhona@nvidia.com>
/ok to test 54220e2
This builds on previous PRs that added PSGD's helper functions, and uses them to implement the PSGD-Kron-Pro optimizer.