HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs

[NeurIPS 2025 Spotlight]

Abstract

We introduce HBLLM, a wavelet-enhanced high-fidelity $1$-bit post-training quantization method for Large Language Models (LLMs). By leveraging Haar wavelet transforms to enhance expressive capacity through frequency decomposition, HBLLM significantly improves quantization fidelity while maintaining minimal overhead. This approach features two innovative structure-aware grouping strategies: (1) frequency-aware multi-parameter intra-row grouping and (2) $\ell_2$-norm-based saliency-driven column selection. For non-salient weights, a shared mean is employed across quantization groups within each frequency band to optimize storage efficiency. Experiments conducted on the OPT and LLaMA model families demonstrate that HBLLM achieves state-of-the-art performance in $1$-bit quantization, attaining a perplexity of $6.71$ on LLaMA2-13B with an average weight storage of only $1.08$ bits.
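As a rough illustration of the two ingredients named above, the following sketch (PyTorch; the function names are illustrative assumptions, not the repository's API) shows a single-level row-wise Haar decomposition and $\ell_2$-norm-based column saliency selection on a toy weight matrix.

import torch

def haar_rowwise(W: torch.Tensor):
    # Single-level Haar transform along each row: pair adjacent columns
    # and split them into low- and high-frequency bands.
    even, odd = W[:, 0::2], W[:, 1::2]
    low = (even + odd) / 2 ** 0.5    # approximation (low-frequency) coefficients
    high = (even - odd) / 2 ** 0.5   # detail (high-frequency) coefficients
    return low, high

def salient_columns(W: torch.Tensor, k: int):
    # Rank columns by their l2 norm and keep the k largest as "salient".
    scores = W.norm(p=2, dim=0)
    return torch.topk(scores, k).indices

W = torch.randn(8, 16)            # toy weight matrix
low, high = haar_rowwise(W)       # two 8x8 frequency bands
cols = salient_columns(W, k=2)    # indices of the 2 most salient columns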

Dependencies

  • torch: tested on v2.2.2+cu118
  • transformers: tested on v4.35.0 (the LLaMA integration currently requires installing from source (main branch) together with sentencepiece)
  • datasets: tested on v2.14.6

All binarization processes and experiments were run on a single 80GB NVIDIA A100. However, the entire pipeline can also run on a single 24GB NVIDIA 3090 Ti for models with fewer than 70B parameters.

LLMs Binarization

Binarization for OPT families

Row-wise Haar transform (row-hbraq)
python3 run.py opt-1.3b /home/models/opt-1.3b c4 row-hbraq --blocksize 128 --salient_metric l2 --group_partition row 

or

Column-wise Haar transform (col-hbraq)
python3 run.py opt-1.3b /home/models/opt-1.3b c4 col-hbraq --blocksize 128 --salient_metric l2 --group_partition row 
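The col-hbraq variant applies the Haar transform along columns instead of rows; in the spirit of the sketch above (again illustrative, not the repository's code), the only difference is pairing adjacent rows rather than adjacent columns:

import torch

def haar_colwise(W: torch.Tensor):
    # Single-level Haar transform along each column: pair adjacent rows.
    even, odd = W[0::2, :], W[1::2, :]
    low = (even + odd) / 2 ** 0.5
    high = (even - odd) / 2 ** 0.5
    return low, high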

Binarization for LLaMA families

python3 run.py llama2-7b /home/models/llama2-7b c4 row-hbraq --blocksize 128 --salient_metric l2 --group_partition row

or, to use the shared-mean strategy:

python3 run.py llama2-7b /home/models/llama2-7b c4 row-hbraq --blocksize 128 --salient_metric l2 --group_partition row --share_mean
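The --share_mean flag selects the shared-mean strategy from the abstract: non-salient weights within a frequency band share a single mean across quantization groups instead of storing one mean per group. A minimal sketch of what 1-bit quantization with a shared mean could look like (illustrative only; binarize_shared_mean and its blocksize handling are assumptions, not the repository's kernel):

import torch

def binarize_shared_mean(X: torch.Tensor, blocksize: int = 128):
    # 1-bit quantization of a band of non-salient weights: one mean is shared
    # by every block, and each row of a block keeps its own scaling factor.
    mu = X.mean()                                      # shared mean for the whole band
    out = torch.empty_like(X)
    for start in range(0, X.shape[1], blocksize):
        block = X[:, start:start + blocksize] - mu
        alpha = block.abs().mean(dim=1, keepdim=True)  # per-row scale
        out[:, start:start + blocksize] = alpha * torch.sign(block) + mu
    return out

X = torch.randn(4, 256)
X_hat = binarize_shared_mean(X)
error = (X - X_hat).pow(2).mean()   # reconstruction error of the 1-bit approximation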

Results

Related Projects

GPTQ: Accurate Post-training Compression for Generative Pretrained Transformers

PB-LLM: Partially Binarized Large Language Models

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Citation

If you find HBLLM helpful in your work, please cite this paper:

@article{chen2025hbllm,
  title={HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs},
  author={Chen, Ningning and Ye, Weicai and Jiang, Ying},
  journal={arXiv preprint arXiv:2512.00862},
  year={2025}
}
