Releases · kozistr/pytorch_optimizer
pytorch-optimizer v3.6.0
Change Log
Feature
- Implement `Fira` optimizer. (#376)
- Implement `RACS` and `Alice` optimizers. (#376)
- Implement `VSGD` optimizer. (#377, #378)
- Enable training with complex parameters. (#370, #380)
  - Unsupported optimizers will raise `NoComplexParameterError`, either by design or because support has not been implemented yet.
- Support the `maximize` parameter. (#370, #380)
  - `maximize`: maximize the objective with respect to the params, instead of minimizing. See the sketch after this list.
- Implement the `copy_stochastic()` method. (#381)
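A minimal sketch of the new `maximize` flag; the choice of `AdamP` as the concrete optimizer and whether it accepts the keyword are assumptions, so check the signature of the optimizer you use:

```python
import torch
from pytorch_optimizer import AdamP  # the concrete optimizer here is only an example

model = torch.nn.Linear(4, 1)

# maximize=True performs gradient ascent on the objective instead of descent
# (keyword introduced in #370/#380; support per optimizer is an assumption).
optimizer = AdamP(model.parameters(), lr=1e-3, maximize=True)

reward = model(torch.randn(8, 4)).mean()  # a quantity we want to maximize
reward.backward()
optimizer.step()
optimizer.zero_grad()
```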
Update
- Support `2D<` tensors (tensors with more than two dimensions) for the `RACS` and `Alice` optimizers. (#380)
- Remove the auxiliary variants from the default parameters of the optimizers and rename the corresponding state and parameter names. (#380)
  - `use_gc`, `adanorm`, `cautious`, `stable_adamw`, and `adam_debias` are affected.
  - You can still use these variants by passing the parameters via `**kwargs`.
  - Notably, for the `adanorm` variant, you need to pass the `adanorm` parameter (and `adanorm_r` for the `r` option), and the state name changes from `exp_avg_norm` to `exp_avg_adanorm`. See the sketch after this list.
- Refactor the `reset()` method into `init_group()` in the `BaseOptimizer` class. (#380)
- Refactor the `SAM` optimizer family. (#380)
- Gather the `AdamP` and `SGDP` components into `pytorch_optimizer.optimizer.adamp.*`. (#381)
  - `pytorch_optimizer.optimizer.sgdp.SGDP` to `pytorch_optimizer.optimizer.adamp.SGDP`
  - `pytorch_optimizer.optimizer.util.projection` to `pytorch_optimizer.optimizer.adamp.projection`
  - `pytorch_optimizer.optimizer.util.cosine_similarity_by_view` to `pytorch_optimizer.optimizer.adamp.cosine_similarity_by_view`
- Remove `channel_view()` and `layer_view()` from `pytorch_optimizer.optimizer.util`. (#381)
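A minimal sketch of passing one of the removed-by-default variants explicitly; the choice of `AdamP` and its acceptance of these keywords are assumptions, while the `adanorm` / `adanorm_r` names come from the note above:

```python
import torch
from pytorch_optimizer import AdamP

model = torch.nn.Linear(4, 1)

# The adanorm variant is no longer enabled via the default parameters; pass it
# explicitly (adanorm_r corresponds to the former `r` option). Its state is now
# stored under `exp_avg_adanorm` instead of `exp_avg_norm`.
optimizer = AdamP(model.parameters(), lr=1e-3, adanorm=True, adanorm_r=0.95)
```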
Fix
- Fix shape mismatch issues in the GaLore projection for the `reverse_std`, `right`, and `full` projection types. (#376)
pytorch-optimizer v3.5.1
pytorch-optimizer v3.5.0
Change Log
Feature
- Support `StableSPAM` optimizer. (#358, #359)
- Support `ScheduleFreeWrapper`. (#334, #360)
- Implement `AdaGC` optimizer. (#364, #366)
- Implement `Simplified-AdEMAMix` optimizer. (#364, #366)
- Support the `Ackley` function for testing optimization algorithms.
Update
- Update the `Muon` optimizer. (#355, #356)
  - Support decoupled weight decay.
  - Adjust the default hyperparameters to match the original implementation.
  - Support the adjusted learning rate from Moonlight; enable it by setting `use_adjusted_lr=True`. See the sketch after this list.
- Tune the coupled Newton iteration method for roughly a 5% performance gain. (#360)
- Update the `SCION` optimizer. (#361)
  - Add the `scale` parameter.
  - Update `get_lmo_direction`.
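A minimal sketch of enabling the Moonlight-style adjusted learning rate in `Muon`; how the parameters are passed and the remaining defaults are assumptions, only `use_adjusted_lr` comes from the note above:

```python
import torch
from pytorch_optimizer import Muon

model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 1))

# use_adjusted_lr=True turns on the learning-rate adjustment from Moonlight
# (#355, #356); this update also adds decoupled weight decay support.
optimizer = Muon(model.parameters(), lr=1e-3, use_adjusted_lr=True)
```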
Fix
pytorch-optimizer v3.4.2
Change Log
Feature
Update
- Update the `ScheduleFreeSGD`, `AdamW`, and `RAdam` optimizers with the latest changes. (#351, #353)
- Remove the `use_palm` variant from the ScheduleFree optimizers due to instability. (#353)
- Update the `Ranger25` optimizer. (#353)
Fix
Docs
- Fix the `AliG` optimizer visualization. (#350)
Contributions
thanks to @AidinHamedi, @hatonosuke
pytorch-optimizer v3.4.1
Change Log
Feature
- Support `GCSAM` optimizer. (#343, #344)
  - Gradient Centralized Sharpness-Aware Minimization.
  - You can use it from the `SAM` optimizer by setting `use_gc=True`. See the sketch after this list.
- Support `LookSAM` optimizer. (#343, #344)
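A minimal sketch of enabling gradient centralization in `SAM`, assuming the common two-argument `SAM(params, base_optimizer, ...)` constructor and the usual two-step SAM update via `first_step` / `second_step`; only `use_gc=True` comes from the note above, the rest is an assumption:

```python
import torch
from pytorch_optimizer import SAM

model = torch.nn.Linear(4, 1)
criterion = torch.nn.MSELoss()

# use_gc=True applies gradient centralization on top of SAM (GCSAM, #343/#344).
optimizer = SAM(model.parameters(), base_optimizer=torch.optim.AdamW, rho=0.05, use_gc=True)

x, y = torch.randn(8, 4), torch.randn(8, 1)

# SAM needs two forward/backward passes per update.
criterion(model(x), y).backward()
optimizer.first_step(zero_grad=True)   # climb to the perturbed weights

criterion(model(x), y).backward()
optimizer.second_step(zero_grad=True)  # descend with the base optimizer
```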
Update
- Support alternative-precision training for the `Shampoo` optimizer. (#339)
- Add more features to and tune the `Ranger25` optimizer. (#340)
  - `AGC` + `Lookahead` variants.
  - Change the default `beta1` and `beta2` to 0.95 and 0.98, respectively.
- Skip adding the `Lookahead` wrapper for `Ranger*` optimizers, which already include it, in `create_optimizer()`. (#340)
- Improve the optimizer visualization. (#345)
- Rename `pytorch_optimizer.optimizer.gc` to `pytorch_optimizer.optimizer.gradient_centralization` to avoid a possible conflict with the Python built-in `gc` module. (#349)
Bug
Docs
- Update the visualizations. (#340)
Contributions
thanks to @AidinHamedi
pytorch-optimizer v3.4.0
Change Log
Feature
- Implement `FOCUS` optimizer. (#330, #331)
- Implement `PSGD Kron` optimizer. (#336, #337)
- Implement `EXAdam` optimizer. (#338, #339)
Update
- Support the `OrthoGrad` variant for `Ranger25`. (#332)
  - `Ranger25` is my experimental, hand-crafted optimizer, which mixes many optimizer variants: `ADOPT` + `AdEMAMix` + `Cautious` + `StableAdamW` + `Adam-Atan2` + `OrthoGrad`. See the sketch after this list.
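A minimal sketch of constructing the experimental `Ranger25` optimizer; the top-level import and constructor arguments are assumptions, and the flag toggling the `OrthoGrad` variant is omitted because its name is not stated here:

```python
import torch
from pytorch_optimizer import Ranger25

model = torch.nn.Linear(4, 1)

# Experimental optimizer mixing ADOPT, AdEMAMix, Cautious, StableAdamW,
# Adam-Atan2, and OrthoGrad behaviors (see #332).
optimizer = Ranger25(model.parameters(), lr=1e-3)
```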
Fix
- Add the missing `state` property to the `OrthoGrad` optimizer. (#326, #327)
- Add the missing `state_dict` and `load_state_dict` methods to the `TRAC` and `OrthoGrad` optimizers. (#332)
- Skip sparse gradients in the `OrthoGrad` optimizer. (#332)
- Support alternative-precision training in the `SOAP` optimizer. (#333)
- Store the SOAP condition matrices in the dtype of their parameters. (#335)
Contributions
thanks to @Vectorrent, @kylevedder
pytorch-optimizer v3.3.4
Change Log
Feature
- Support the `OrthoGrad` feature in `create_optimizer()`. (#324)
- Enhanced flexibility for the `optimizer` parameter in the `Lookahead`, `TRAC`, and `OrthoGrad` optimizers. (#324)
  - Now supports both `torch.optim.Optimizer` instances and classes.
  - You can now use the `Lookahead` optimizer in two ways:
    - `Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)`
    - `Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())`
- Implement `SPAM` optimizer. (#324)
- Implement `TAM` and `AdaTAM` optimizers. (#325)
pytorch-optimizer v3.3.3
Change Log
Feature
- Implement `Grams` optimizer. (#317, #318)
- Support the `stable_adamw` variant for the `ADOPT` and `AdEMAMix` optimizers. (#321)
  - `optimizer = ADOPT(model.parameters(), ..., stable_adamw=True)`
- Implement an experimental optimizer, `Ranger25` (not tested). (#321)
  - Mixes the `ADOPT + AdEMAMix + StableAdamW + Cautious + RAdam` optimizers.
- Implement `OrthoGrad` optimizer. (#321)
- Support the `Adam-Atan2` feature for the `Prodigy` optimizer when `eps` is `None`. (#321) See the sketch after this list.
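A minimal sketch of enabling the `Adam-Atan2` path in `Prodigy` by passing `eps=None`; the top-level import and the learning-rate value are assumptions:

```python
import torch
from pytorch_optimizer import Prodigy

model = torch.nn.Linear(4, 1)

# With eps=None, Prodigy uses the Adam-Atan2 update instead of the usual
# epsilon-in-the-denominator form (see #321).
optimizer = Prodigy(model.parameters(), lr=1.0, eps=None)
```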
pytorch-optimizer v3.3.2
pytorch-optimizer v3.3.1
Change Log
Feature
- Support the `Cautious` variant for the `AdaShift` optimizer. (#310)
- Save the state of the `Lookahead` optimizer too. (#310)
- Implement `APOLLO` optimizer. (#311, #312)
- Rename the `Apollo` optimizer ("An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization") to `ApolloDQN` so it does not overlap with the new `APOLLO` optimizer. (#312)
- Implement `MARS` optimizer. (#313, #314)
- Support the `Cautious` variant for the `MARS` optimizer. (#314)
Bug
- Fix `bias_correction` in the `AdamG` optimizer. (#305, #308)
- Fix a potential bug when loading the state of the `Lookahead` optimizer. (#306, #310)
Docs
Contributions
thanks to @Vectorrent