Releases · kozistr/pytorch_optimizer
pytorch-optimizer v3.6.0
Change Log
Feature
- Implement `Fira` optimizer. (#376)
- Implement `RACS` and `Alice` optimizers. (#376)
- Implement `VSGD` optimizer. (#377, #378)
- Enable training with complex parameters. (#370, #380)
  - Unsupported optimizers will raise `NoComplexParameterError`, either by design or because support has not been implemented yet.
- Support the `maximize` parameter. (#370, #380)
  - `maximize`: maximize the objective with respect to the params, instead of minimizing. See the sketch after this list.
- Implement the `copy_stochastic()` method. (#381)
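A minimal sketch of the new `maximize` flag; the choice of `AdamP` as the concrete optimizer and whether it accepts the keyword are assumptions, so check the signature of the optimizer you use:

```python
import torch
from pytorch_optimizer import AdamP  # the concrete optimizer here is only an example

model = torch.nn.Linear(4, 1)

# maximize=True performs gradient ascent on the objective instead of descent
# (keyword introduced in #370/#380; support per optimizer is an assumption).
optimizer = AdamP(model.parameters(), lr=1e-3, maximize=True)

reward = model(torch.randn(8, 4)).mean()  # a quantity we want to maximize
reward.backward()
optimizer.step()
optimizer.zero_grad()
```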
Update
- Support `2D<` tensors (tensors with more than two dimensions) for the `RACS` and `Alice` optimizers. (#380)
- Remove the auxiliary variants from the default parameters of the optimizers and rename the corresponding state and parameter names. (#380)
  - `use_gc`, `adanorm`, `cautious`, `stable_adamw`, and `adam_debias` are affected.
  - You can still use these variants by passing the parameters via `**kwargs`.
  - Notably, for the `adanorm` variant, you need to pass the `adanorm` parameter (and `adanorm_r` for the `r` option), and the state name changes from `exp_avg_norm` to `exp_avg_adanorm`. See the sketch after this list.
- Refactor the `reset()` method into `init_group()` in the `BaseOptimizer` class. (#380)
- Refactor the `SAM` optimizer family. (#380)
- Gather the `AdamP` and `SGDP` components into `pytorch_optimizer.optimizer.adamp.*`. (#381)
  - `pytorch_optimizer.optimizer.sgdp.SGDP` to `pytorch_optimizer.optimizer.adamp.SGDP`
  - `pytorch_optimizer.optimizer.util.projection` to `pytorch_optimizer.optimizer.adamp.projection`
  - `pytorch_optimizer.optimizer.util.cosine_similarity_by_view` to `pytorch_optimizer.optimizer.adamp.cosine_similarity_by_view`
- Remove `channel_view()` and `layer_view()` from `pytorch_optimizer.optimizer.util`. (#381)
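A minimal sketch of passing one of the removed-by-default variants explicitly; the choice of `AdamP` and its acceptance of these keywords are assumptions, while the `adanorm` / `adanorm_r` names come from the note above:

```python
import torch
from pytorch_optimizer import AdamP

model = torch.nn.Linear(4, 1)

# The adanorm variant is no longer enabled via the default parameters; pass it
# explicitly (adanorm_r corresponds to the former `r` option). Its state is now
# stored under `exp_avg_adanorm` instead of `exp_avg_norm`.
optimizer = AdamP(model.parameters(), lr=1e-3, adanorm=True, adanorm_r=0.95)
```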
Fix
- Fix shape mismatch issues in the GaLore projection for the `reverse_std`, `right`, and `full` projection types. (#376)
pytorch-optimizer v3.5.1
pytorch-optimizer v3.5.0
Change Log
Feature
- Support `StableSPAM` optimizer. (#358, #359)
- Support `ScheduleFreeWrapper`. (#334, #360)
- Implement `AdaGC` optimizer. (#364, #366)
- Implement `Simplified-AdEMAMix` optimizer. (#364, #366)
- Support the `Ackley` function for testing optimization algorithms.
Update
- Update the `Muon` optimizer. (#355, #356)
  - Support decoupled weight decay.
  - Adjust the default hyperparameters to match the original implementation.
  - Support the adjusted learning rate from Moonlight; enable it by setting `use_adjusted_lr=True`. See the sketch after this list.
- Tune the coupled Newton iteration method for roughly a 5% performance gain. (#360)
- Update the `SCION` optimizer. (#361)
  - Add the `scale` parameter.
  - Update `get_lmo_direction`.
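A minimal sketch of enabling the Moonlight-style adjusted learning rate in `Muon`; how the parameters are passed and the remaining defaults are assumptions, only `use_adjusted_lr` comes from the note above:

```python
import torch
from pytorch_optimizer import Muon

model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 1))

# use_adjusted_lr=True turns on the learning-rate adjustment from Moonlight
# (#355, #356); this update also adds decoupled weight decay support.
optimizer = Muon(model.parameters(), lr=1e-3, use_adjusted_lr=True)
```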
Fix
pytorch-optimizer v3.4.2
Change Log
Feature
Update
- Update the `ScheduleFreeSGD`, `AdamW`, and `RAdam` optimizers with the latest changes. (#351, #353)
- Remove the `use_palm` variant from the ScheduleFree optimizers due to instability. (#353)
- Update the `Ranger25` optimizer. (#353)
Fix
Docs
- Fix the `AliG` optimizer visualization. (#350)
Contributions
thanks to @AidinHamedi, @hatonosuke
pytorch-optimizer v3.4.1
Change Log
Feature
- Support `GCSAM` optimizer. (#343, #344)
  - Gradient Centralized Sharpness-Aware Minimization.
  - You can use it from the `SAM` optimizer by setting `use_gc=True`. See the sketch after this list.
- Support `LookSAM` optimizer. (#343, #344)
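A minimal sketch of enabling gradient centralization in `SAM`, assuming the common two-argument `SAM(params, base_optimizer, ...)` constructor and the usual two-step SAM update via `first_step` / `second_step`; only `use_gc=True` comes from the note above, the rest is an assumption:

```python
import torch
from pytorch_optimizer import SAM

model = torch.nn.Linear(4, 1)
criterion = torch.nn.MSELoss()

# use_gc=True applies gradient centralization on top of SAM (GCSAM, #343/#344).
optimizer = SAM(model.parameters(), base_optimizer=torch.optim.AdamW, rho=0.05, use_gc=True)

x, y = torch.randn(8, 4), torch.randn(8, 1)

# SAM needs two forward/backward passes per update.
criterion(model(x), y).backward()
optimizer.first_step(zero_grad=True)   # climb to the perturbed weights

criterion(model(x), y).backward()
optimizer.second_step(zero_grad=True)  # descend with the base optimizer
```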
Update
- Support alternative-precision training for the `Shampoo` optimizer. (#339)
- Add more features to and tune the `Ranger25` optimizer. (#340)
  - `AGC` + `Lookahead` variants.
  - Change the default `beta1` and `beta2` to 0.95 and 0.98, respectively.
- Skip adding the `Lookahead` wrapper for `Ranger*` optimizers, which already include it, in `create_optimizer()`. (#340)
- Improve the optimizer visualization. (#345)
- Rename `pytorch_optimizer.optimizer.gc` to `pytorch_optimizer.optimizer.gradient_centralization` to avoid a possible conflict with the Python built-in `gc` module. (#349)
Bug
Docs
- Update the visualizations. (#340)
Contributions
thanks to @AidinHamedi
pytorch-optimizer v3.4.0
Change Log
Feature
- Implement `FOCUS` optimizer. (#330, #331)
- Implement `PSGD Kron` optimizer. (#336, #337)
- Implement `EXAdam` optimizer. (#338, #339)
Update
- Support the `OrthoGrad` variant for `Ranger25`. (#332)
  - `Ranger25` is my experimental, hand-crafted optimizer, which mixes many optimizer variants: `ADOPT` + `AdEMAMix` + `Cautious` + `StableAdamW` + `Adam-Atan2` + `OrthoGrad`. See the sketch after this list.
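A minimal sketch of constructing the experimental `Ranger25` optimizer; the top-level import and constructor arguments are assumptions, and the flag toggling the `OrthoGrad` variant is omitted because its name is not stated here:

```python
import torch
from pytorch_optimizer import Ranger25

model = torch.nn.Linear(4, 1)

# Experimental optimizer mixing ADOPT, AdEMAMix, Cautious, StableAdamW,
# Adam-Atan2, and OrthoGrad behaviors (see #332).
optimizer = Ranger25(model.parameters(), lr=1e-3)
```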
Fix
- Add the missing `state` property to the `OrthoGrad` optimizer. (#326, #327)
- Add the missing `state_dict` and `load_state_dict` methods to the `TRAC` and `OrthoGrad` optimizers. (#332)
- Skip sparse gradients in the `OrthoGrad` optimizer. (#332)
- Support alternative-precision training in the `SOAP` optimizer. (#333)
- Store the SOAP condition matrices in the dtype of their parameters. (#335)
Contributions
thanks to @Vectorrent, @kylevedder
pytorch-optimizer v3.3.4
Change Log
Feature
- Support the `OrthoGrad` feature in `create_optimizer()`. (#324)
- Enhanced flexibility for the `optimizer` parameter in the `Lookahead`, `TRAC`, and `OrthoGrad` optimizers. (#324)
  - Now supports both `torch.optim.Optimizer` instances and classes.
  - You can now use the `Lookahead` optimizer in two ways:
    - `Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)`
    - `Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())`
- Implement `SPAM` optimizer. (#324)
- Implement `TAM` and `AdaTAM` optimizers. (#325)
pytorch-optimizer v3.3.3
Change Log
Feature
- Implement `Grams` optimizer. (#317, #318)
- Support the `stable_adamw` variant for the `ADOPT` and `AdEMAMix` optimizers. (#321)
  - `optimizer = ADOPT(model.parameters(), ..., stable_adamw=True)`
- Implement an experimental optimizer, `Ranger25` (not tested). (#321)
  - Mixes the `ADOPT + AdEMAMix + StableAdamW + Cautious + RAdam` optimizers.
- Implement `OrthoGrad` optimizer. (#321)
- Support the `Adam-Atan2` feature for the `Prodigy` optimizer when `eps` is `None`. (#321) See the sketch after this list.
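A minimal sketch of enabling the `Adam-Atan2` path in `Prodigy` by passing `eps=None`; the top-level import and the learning-rate value are assumptions:

```python
import torch
from pytorch_optimizer import Prodigy

model = torch.nn.Linear(4, 1)

# With eps=None, Prodigy uses the Adam-Atan2 update instead of the usual
# epsilon-in-the-denominator form (see #321).
optimizer = Prodigy(model.parameters(), lr=1.0, eps=None)
```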
pytorch-optimizer v3.3.2
pytorch-optimizer v3.3.1
Change Log
Feature
- Support the `Cautious` variant for the `AdaShift` optimizer. (#310)
- Save the state of the `Lookahead` optimizer too. (#310)
- Implement `APOLLO` optimizer. (#311, #312)
- Rename the `Apollo` optimizer ("An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization") to `ApolloDQN` so it does not overlap with the new `APOLLO` optimizer. (#312)
- Implement `MARS` optimizer. (#313, #314)
- Support the `Cautious` variant for the `MARS` optimizer. (#314)
Bug
- Fix `bias_correction` in the `AdamG` optimizer. (#305, #308)
- Fix a potential bug when loading the state of the `Lookahead` optimizer. (#306, #310)
Docs
Contributions
thanks to @Vectorrent