HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs
We introduce HBLLM, a wavelet-enhanced high-fidelity 1-bit quantization method for large language models (LLMs).
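The exact HBLLM pipeline is defined by run.py in this repository and in the paper; the sketch below only illustrates the general idea of applying a wavelet transform to weights before 1-bit (sign) quantization. The single-level Haar transform, the per-row scaling by the mean absolute value, and all tensor shapes are assumptions made for illustration, not the paper's exact method.

```python
# Illustrative sketch only: a single-level Haar wavelet transform followed by
# 1-bit (sign) quantization of the coefficients. The transform choice, the
# per-row scale, and the shapes are assumptions, not the exact HBLLM algorithm.
import torch

def haar_dwt_rows(w):
    """Single-level Haar DWT applied independently to each row (even width assumed)."""
    s = 2 ** 0.5
    approx = (w[:, 0::2] + w[:, 1::2]) / s
    detail = (w[:, 0::2] - w[:, 1::2]) / s
    return approx, detail

def haar_idwt_rows(approx, detail):
    """Inverse of haar_dwt_rows."""
    s = 2 ** 0.5
    w = torch.empty(approx.shape[0], approx.shape[1] * 2)
    w[:, 0::2] = (approx + detail) / s
    w[:, 1::2] = (approx - detail) / s
    return w

def binarize(x):
    """XNOR-style 1-bit quantization: sign bits with a per-row scale."""
    scale = x.abs().mean(dim=1, keepdim=True)
    return torch.sign(x) * scale

def wavelet_binarize(w):
    """Binarize the wavelet coefficients, then transform back to the weight domain."""
    approx, detail = haar_dwt_rows(w)
    return haar_idwt_rows(binarize(approx), binarize(detail))

if __name__ == "__main__":
    w = torch.randn(8, 16)
    w_q = wavelet_binarize(w)
    print("reconstruction MSE:", (w - w_q).pow(2).mean().item())
```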
Dependencies:
- torch: tested on v2.2.2+cu118
- transformers: tested on v4.35.0 (the LLaMA integration currently requires a main install from source and sentencepiece)
- datasets: tested on v2.14.6
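A quick way to confirm the tested versions are available in the current environment (the printed versions only need to roughly match those listed above):

```python
# Environment check for the versions listed above (package names assumed as published on PyPI).
import torch
import transformers
import datasets
import sentencepiece  # required for the LLaMA tokenizer

print("torch        :", torch.__version__)        # tested on 2.2.2+cu118
print("transformers :", transformers.__version__)  # tested on 4.35.0
print("datasets     :", datasets.__version__)      # tested on 2.14.6
print("CUDA available:", torch.cuda.is_available())
```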
All binarization processes and experiments were run on a single 80 GB NVIDIA A100. However, the entire process can also be run on a single 24 GB NVIDIA RTX 3090 Ti when the model has fewer than 70B parameters.
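As a rough back-of-the-envelope check of why binarized weights are small, the sketch below estimates packed 1-bit weight storage plus one FP16 scale per group of 128 weights (an assumed group size). Actual peak memory during binarization also depends on calibration activations and per-layer full-precision buffers, so this is only a lower bound, not the tool's real footprint.

```python
# Rough storage estimate for 1-bit weights: 8 packed sign bits per byte,
# plus one FP16 scale per group of `group_size` weights (assumed layout).
def binary_weight_gb(num_params, group_size=128):
    sign_bytes = num_params / 8                      # 1 bit per weight, packed
    scale_bytes = (num_params / group_size) * 2      # FP16 scale per group
    return (sign_bytes + scale_bytes) / 1024 ** 3

for name, n in [("OPT-1.3B", 1.3e9), ("LLaMA-2-7B", 6.7e9), ("70B model", 7.0e10)]:
    print(f"{name}: ~{binary_weight_gb(n):.2f} GB of binarized weights")
```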
Binarize OPT-1.3B with C4 calibration data (row-wise variant):
python3 run.py opt-1.3b /home/models/opt-1.3b c4 row-hbraq --blocksize 128 --salient_metric l2 --group_partition row
or the column-wise variant:
python3 run.py opt-1.3b /home/models/opt-1.3b c4 col-hbraq --blocksize 128 --salient_metric l2 --group_partition row
Binarize LLaMA-2-7B with C4 calibration data:
python3 run.py llama2-7b /home/models/llama2-7b c4 row-hbraq --blocksize 128 --salient_metric l2 --group_partition row
or, with a shared mean (--share_mean):
python3 run.py llama2-7b /home/models/llama2-7b c4 row-hbraq --blocksize 128 --salient_metric l2 --group_partition row --share_mean
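The --share_mean flag presumably shares a single scaling mean across weight groups instead of keeping one mean per group; the sketch below contrasts the two options under that assumption and is not a description of the repository's actual implementation.

```python
# Hedged sketch of the assumed effect of --share_mean: one shared scale for the
# whole matrix versus one scale per block of `blocksize` columns.
import torch

def group_binarize(w, blocksize=128, share_mean=False):
    w_q = torch.empty_like(w)
    if share_mean:
        shared = w.abs().mean()                   # single scale reused by every group
    for start in range(0, w.shape[1], blocksize):
        block = w[:, start:start + blocksize]
        scale = shared if share_mean else block.abs().mean(dim=1, keepdim=True)
        w_q[:, start:start + blocksize] = torch.sign(block) * scale
    return w_q

if __name__ == "__main__":
    w = torch.randn(64, 512)
    for flag in (False, True):
        err = (w - group_binarize(w, share_mean=flag)).pow(2).mean().item()
        print(f"share_mean={flag}: MSE {err:.4f}")
```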
Related work:
GPTQ: Accurate Post-training Compression for Generative Pretrained Transformers
PB-LLM: Partially Binarized Large Language Models
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
If you find HBLLM helpful to your work, please cite this paper:
@article{chen2025hbllm,
title={HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs},
author={Ningning Chen and Weicai Ye and Ying Jiang},
journal={arXiv preprint arXiv:2512.00862},
year={2025}
}