
AdaLomo optimizer step method #132

winglian opened this issue Oct 30, 2023 · 3 comments

winglian commented Oct 30, 2023

Hi @KaiLv69

thanks for the write-up and implementation of AdaLomo! It looks like the optimizer is missing a step method, which other PyTorch-based frameworks expect in order to use it. Could you help with this, please?
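
For context, here is a rough sketch of the kind of loop most PyTorch-based trainers run; `model`, `dataloader`, and `compute_loss` are placeholders, not anything from collie/AdaLomo:

```python
# Rough sketch of the loop most PyTorch-based trainers assume; model,
# dataloader, and compute_loss are placeholders, not part of collie/AdaLomo.
def train(model, dataloader, optimizer, compute_loss):
    for batch in dataloader:
        optimizer.zero_grad()              # clear gradients from the previous step
        loss = compute_loss(model, batch)
        loss.backward()                    # compute and accumulate gradients
        optimizer.step()                   # this is the call AdaLomo does not provide
```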


KaiLv69 commented Oct 31, 2023

Hi, thanks for your interest :)

AdaLomo (and LOMO) fuse the backward pass and the optimizer step into a single process, called fused backward, to save the memory that would otherwise be used for gradients. So there is no step method in AdaLomo. We handle this with special-case logic in the trainer: https://github.com/OpenLMLab/collie/blob/db76a99758ddecb3be48ab87e04643f7d7932ac4/collie/controller/trainer.py#L456
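
To illustrate the idea, here is a generic sketch of the fused-backward pattern with plain SGD. It is not the collie/AdaLomo implementation, and it relies on PyTorch's `register_post_accumulate_grad_hook`, which requires PyTorch >= 2.1:

```python
# Generic sketch of the fused-backward idea, NOT the collie/AdaLomo implementation.
# Requires PyTorch >= 2.1 for register_post_accumulate_grad_hook.
# Each parameter is updated as soon as its gradient is ready, and the gradient is
# freed immediately, so full-model gradients are never stored all at once.
import torch

def attach_fused_sgd(model: torch.nn.Module, lr: float = 1e-3) -> None:
    def sgd_in_backward(param: torch.Tensor) -> None:
        with torch.no_grad():
            param.add_(param.grad, alpha=-lr)  # apply the update right away
        param.grad = None                      # release the gradient memory

    for p in model.parameters():
        if p.requires_grad:
            # fires during backward(), once p.grad has been fully accumulated
            p.register_post_accumulate_grad_hook(sgd_in_backward)
```

With hooks like these, a single `loss.backward()` both computes the gradients and updates the parameters, which is why the trainer linked above calls a fused-backward routine instead of `optimizer.step()`.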


winglian commented Nov 2, 2023

Thanks @KaiLv69. Would you be able to share the learning rates you used for the AdaLomo and AdamW experiments in the paper? I think I might have the LR off by a large factor, as AdaLomo converges much more slowly for me than AdamW when I use the same LR for both.


KaiLv69 commented Nov 3, 2023

1e-3 or 5e-4 might be appropriate for AdaLomo.
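
Something like the following; note this is only a sketch, the import path and constructor arguments here are assumptions, so please check collie's source for the exact signature:

```python
# Only a sketch: the import path and constructor arguments are assumptions;
# check collie's optim module for the real signature.
from collie.optim import AdaLomo

optimizer = AdaLomo(model, lr=1e-3)  # try 1e-3 (or 5e-4) instead of an AdamW-tuned LR
```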
