0.44.0: New AdEMAMix optimizer, Embeddings quantization, and more!
New optimizer: AdEMAMix
The AdEMAMix optimizer is a modification of AdamW that tracks two EMAs of the gradient to better leverage past gradients. This allows for faster convergence with less training data and improved resistance to forgetting.
We've implemented 8-bit and paged variants: `AdEMAMix`, `AdEMAMix8bit`, `PagedAdEMAMix`, and `PagedAdEMAMix8bit`. These can be used with a similar API to the existing optimizers.
```python
import bitsandbytes as bnb

optimizer = bnb.optim.PagedAdEMAMix8bit(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.999, 0.9999),
    alpha=5.0,
    eps=1e-8,
    weight_decay=1e-2,
)
```
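To make the "two EMAs" idea concrete, here is a minimal scalar sketch of the AdEMAMix update rule, based on the description in the paper. This is illustrative plain Python, not the bitsandbytes kernel; the function and state names are our own:

```python
import math

def ademamix_step(theta, grad, state, lr=1e-4, betas=(0.9, 0.999, 0.9999),
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix update on a scalar parameter (toy sketch).

    state holds m1 (fast EMA of grads), m2 (slow EMA of grads, weighted
    by alpha in the update), nu (EMA of squared grads), and step count t.
    """
    b1, b2, b3 = betas
    state["t"] += 1
    t = state["t"]
    state["m1"] = b1 * state["m1"] + (1 - b1) * grad        # fast EMA
    state["m2"] = b3 * state["m2"] + (1 - b3) * grad        # slow EMA
    state["nu"] = b2 * state["nu"] + (1 - b2) * grad ** 2
    m1_hat = state["m1"] / (1 - b1 ** t)                    # bias correction
    nu_hat = state["nu"] / (1 - b2 ** t)
    update = (m1_hat + alpha * state["m2"]) / (math.sqrt(nu_hat) + eps)
    return theta - lr * (update + weight_decay * theta)

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
state = {"m1": 0.0, "m2": 0.0, "nu": 0.0, "t": 0}
x = 1.0
for _ in range(2000):
    x = ademamix_step(x, 2 * x, state, lr=0.01)
```

The slow EMA (`m2`, decayed with the third beta) changes direction much more slowly than the fast one, which is what lets the optimizer keep leveraging older gradients.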
8-bit Optimizers Update
The block size for all 8-bit optimizers has been reduced from 2048 to 256 in this release. This is a departure from the block size proposed in the original paper's implementation, and it improves accuracy.
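A smaller block size helps because each block gets its own quantization scale, so an outlier value only degrades precision within its own block. The following toy sketch illustrates the effect with plain symmetric int8 quantization (the real bitsandbytes kernels use a dynamic quantization map, not this linear scheme):

```python
import random

def quantize_block(xs):
    """Symmetric int8 quantization of one block (toy illustration)."""
    absmax = max(abs(v) for v in xs)
    return [round(v / absmax * 127) for v in xs], absmax

def roundtrip_error(xs, blocksize):
    """Mean absolute quantize->dequantize error at a given block size."""
    err = 0.0
    for i in range(0, len(xs), blocksize):
        block = xs[i:i + blocksize]
        q, absmax = quantize_block(block)
        err += sum(abs(qv / 127 * absmax - v) for qv, v in zip(q, block))
    return err / len(xs)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(2048)]
x[0] = 50.0  # a single outlier inflates the scale of its whole block

err_2048 = roundtrip_error(x, 2048)  # outlier hurts all 2048 values
err_256 = roundtrip_error(x, 256)    # outlier only hurts its 256-value block
```

With block size 2048, the single outlier stretches the quantization scale for every value; with block size 256, only one block pays that cost, so the overall error is lower.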
CUDA Graphs support
A fix to enable CUDA Graphs capture of kernel functions was made in #1330. This allows for performance improvements with inference frameworks like vLLM. Thanks @jeejeelee!
Quantization for Embeddings
The trend of LLMs toward larger vocabularies continues, and the embeddings can take up a significant portion of a quantized model's footprint. We now have implementations of `Embedding4bit` and `Embedding8bit`, thanks to @galqiwi!
Example usage:
```python
import torch
import torch.nn as nn

from bitsandbytes.nn import Embedding4bit

fp16_module = nn.Embedding(128, 64)
quantized_module = Embedding4bit(128, 64)

quantized_module.load_state_dict(fp16_module.state_dict())

quantized_module = quantized_module.to(0)
```
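As a rough illustration of why this matters, here is back-of-envelope arithmetic for a large embedding table. The vocabulary and dimension numbers are hypothetical, and the small per-block quantization-statistics overhead is ignored:

```python
# Back-of-envelope embedding footprint (illustrative numbers only).
vocab, dim = 128_256, 4096       # hypothetical large-vocabulary model
fp16_bytes = vocab * dim * 2     # 2 bytes per weight in fp16
int4_bytes = vocab * dim // 2    # 4 bits per weight, ignoring absmax overhead
savings = fp16_bytes / int4_bytes
```

At these sizes the fp16 embedding table alone is about 1 GB, so a 4x reduction is a meaningful share of the quantized model's total footprint.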
Continuous Builds
We are now building binary wheels for each change on `main`. These builds can be used to preview upcoming changes.
What's Changed
- Embedding4bit and Embedding8bit implementation by @galqiwi in #1292
- Bugfix: Load correct nocublaslt library variant when BNB_CUDA_VERSION override is set by @matthewdouglas in #1318
- Enable certain CUDA kernels to accept specified cuda stream by @jeejeelee in #1330
- Initial support for ppc64le by @mgiessing in #1316
- Cuda source cleanup, refactor and fixes by @abhilash1910 in #1328
- Update for VS2022 17.11 compatibility with CUDA < 12.4 by @matthewdouglas in #1341
- Bump the minor-patch group with 3 updates by @dependabot in #1362
- Update matplotlib requirement from ~=3.9.1 to ~=3.9.2 in the major group by @dependabot in #1361
- docs: add internal reference to multi-backend guide by @Titus-von-Koeller in #1352
- Add `move_to_device` kwarg to the optimizer's `load_state_dict` by @koute in #1344
- Add AdEMAMix optimizer by @matthewdouglas in #1360
- Change 8bit optimizer blocksize 2048->256; additional bf16 support by @matthewdouglas in #1365
New Contributors
- @jeejeelee made their first contribution in #1330
- @mgiessing made their first contribution in #1316
- @abhilash1910 made their first contribution in #1328
- @koute made their first contribution in #1344
Full Changelog: 0.43.3...v0.44.0