
AdaLomo optimizer step method #132

winglian opened this issue Oct 30, 2023 · 3 comments

winglian commented Oct 30, 2023

Hi @KaiLv69

thanks for the write-up and implementation of AdaLomo! It looks like the optimizer is missing a step method, which other PyTorch-based frameworks expect in order to use it. Could you help with this, please?
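
For context, here is a rough sketch of the kind of loop most PyTorch-based trainers run; `model`, `dataloader`, and `compute_loss` are placeholders, not anything from collie/AdaLomo:

```python
# Rough sketch of the loop most PyTorch-based trainers assume; model,
# dataloader, and compute_loss are placeholders, not part of collie/AdaLomo.
def train(model, dataloader, optimizer, compute_loss):
    for batch in dataloader:
        optimizer.zero_grad()              # clear gradients from the previous step
        loss = compute_loss(model, batch)
        loss.backward()                    # compute and accumulate gradients
        optimizer.step()                   # this is the call AdaLomo does not provide
```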


KaiLv69 commented Oct 31, 2023

Hi, thanks for your interest :)

AdaLomo (and LOMO) fuse the backward pass and the optimizer step into a single process, called fused backward, to save the memory that would otherwise be used for gradients. So there is no step method in AdaLomo. We handle this with special-case logic in the trainer: https://github.com/OpenLMLab/collie/blob/db76a99758ddecb3be48ab87e04643f7d7932ac4/collie/controller/trainer.py#L456
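
To illustrate the idea, here is a generic sketch of the fused-backward pattern with plain SGD. It is not the collie/AdaLomo implementation, and it relies on PyTorch's `register_post_accumulate_grad_hook`, which requires PyTorch >= 2.1:

```python
# Generic sketch of the fused-backward idea, NOT the collie/AdaLomo implementation.
# Requires PyTorch >= 2.1 for register_post_accumulate_grad_hook.
# Each parameter is updated as soon as its gradient is ready, and the gradient is
# freed immediately, so full-model gradients are never stored all at once.
import torch

def attach_fused_sgd(model: torch.nn.Module, lr: float = 1e-3) -> None:
    def sgd_in_backward(param: torch.Tensor) -> None:
        with torch.no_grad():
            param.add_(param.grad, alpha=-lr)  # apply the update right away
        param.grad = None                      # release the gradient memory

    for p in model.parameters():
        if p.requires_grad:
            # fires during backward(), once p.grad has been fully accumulated
            p.register_post_accumulate_grad_hook(sgd_in_backward)
```

With hooks like these, a single `loss.backward()` both computes the gradients and updates the parameters, which is why the trainer linked above calls a fused-backward routine instead of `optimizer.step()`.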


winglian commented Nov 2, 2023

Thanks @KaiLv69. Would you be able to share the learning rates you used for the AdaLomo and AdamW experiments in the paper? I think I might have the LR off by a large factor, as AdaLomo converges much more slowly for me than AdamW when I use the same LR for both.


KaiLv69 commented Nov 3, 2023

1e-3 or 5e-4 might be appropriate for AdaLomo.
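
Something like the following; note this is only a sketch, the import path and constructor arguments here are assumptions, so please check collie's source for the exact signature:

```python
# Only a sketch: the import path and constructor arguments are assumptions;
# check collie's optim module for the real signature.
from collie.optim import AdaLomo

optimizer = AdaLomo(model, lr=1e-3)  # try 1e-3 (or 5e-4) instead of an AdamW-tuned LR
```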
