EoRA is now seamlessly integrated into [GPTQModel](https://github.com/ModelCloud/GPTQModel); the steps below walk through running EoRA with GPTQModel. First, install GPTQModel from source:
```bash
# clone repo
git clone https://github.com/ModelCloud/GPTQModel.git && cd GPTQModel

# pip: compile and install
# You can install optional modules like auto_round, ipex, vllm, sglang, and bitblas.
# Example: pip install -v --no-build-isolation .[vllm,sglang,bitblas,ipex,auto_round]
pip install -v . --no-build-isolation
```
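To confirm the install succeeded, a quick sanity check from Python (assuming the package exposes the usual `__version__` attribute):

```python
# minimal install check; __version__ is assumed to follow the standard packaging convention
import gptqmodel
print(gptqmodel.__version__)
```

Next, quantize the base model with GPTQ using a small calibration set: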
```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-3B"
quant_path = "Llama-3.2-3B-gptqmodel-4bit"

calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train"
).select(range(1024))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)

# increase `batch_size` to match gpu/vram specs to speed up quantization
model.quantize(calibration_dataset, batch_size=1)

model.save(quant_path)
```
Then generate an EoRA adapter for the quantized model:

```python
from gptqmodel.adapter.adapter import Lora
from gptqmodel import GPTQModel, QuantizeConfig

eora = Lora(
    # for eora generation, path is the adapter save path; for load, it is the loading path
    path=f"{quant_path}/eora_rank16",
    rank=16,
)

# provide a previously gptq quantized model path
GPTQModel.adapter.generate(
    adapter=eora,
    model_id_or_path=model_id,
    quantized_model_id_or_path=quant_path,
    calibration_dataset=calibration_dataset,
    calibration_dataset_concat_size=0,
    auto_gc=False,
)
```
```python
# post-eora inference
model = GPTQModel.load(
    model_id_or_path=quant_path,
    adapter=eora,
)
```
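With the adapter attached, the model can be used like any other GPTQModel model. A minimal generation example, assuming GPTQModel's usual `generate`/`tokenizer` convenience API (prompt text is arbitrary):

```python
# quick smoke test of the EoRA-compensated model
tokens = model.generate("Capital of France is")[0]
print(model.tokenizer.decode(tokens))
```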
To evaluate the quantized model on its own:

```bash
python GPTQModel/examples/eora/evaluation.py --quantized_model {quant_path}
```

To evaluate the quantized model together with the EoRA adapter generated above:

```bash
python GPTQModel/examples/eora/evaluation.py --quantized_model {quant_path} \
    --eora_save_path {quant_path}/eora_rank16 \
    --eora_rank 16
```
You can find full reproduction instructions in the EoRA directory.
Shih-Yang Liu*, Maksim Khadkevich, Nai Chit Fung, Charbel Sakr, Chao-Han Huck Yang, Chien-Yi Wang, Saurav Muralidharan, Hongxu Yin, Kwang-Ting Cheng, Jan Kautz, Yu-Chiang Frank Wang, Pavlo Molchanov, Min-Hung Chen
(*Work done during an internship at NVIDIA Research)
EoRA projects the compression error into the eigenspace of the input activations and performs a low-rank approximation of the projected error to compensate the compressed model.
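As a rough illustration of the idea (not the GPTQModel implementation; matrix shapes, variable names, and the exact eigenvalue scaling are assumptions based on the description above), the error between the original and compressed weights can be projected into the activation eigenspace and then truncated by SVD:

```python
import numpy as np

def eora_sketch(W, W_q, X, rank=16):
    """Approximate the compression error W - W_q with a rank-`rank` product B @ A.

    W, W_q: (out_features, in_features) original and compressed weights
    X:      (in_features, n_samples) calibration activations
    """
    # eigen-decomposition of the activation auto-correlation matrix
    eigvals, Q = np.linalg.eigh(X @ X.T)
    eigvals = np.clip(eigvals, 1e-8, None)   # guard against tiny/negative eigenvalues
    S = Q * np.sqrt(eigvals)                 # Q @ diag(sqrt(eigvals))
    S_inv = (Q / np.sqrt(eigvals)).T         # diag(1/sqrt(eigvals)) @ Q.T

    # project the compression error into the scaled eigenspace, then truncate by SVD
    dW_proj = (W - W_q) @ S
    U, sigma, Vt = np.linalg.svd(dW_proj, full_matrices=False)

    B = U[:, :rank] * sigma[:rank]           # (out_features, rank)
    A = Vt[:rank] @ S_inv                    # (rank, in_features), mapped back
    return B, A                              # W_q @ x + B @ (A @ x) ≈ W @ x
```

In practice, GPTQModel performs this kind of per-layer compensation internally when `GPTQModel.adapter.generate` is called; the sketch is only meant to make the one-sentence description above concrete.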
For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.
- [24.02.2025] 🔥🔥 EoRA has been integrated into GPTQModel!!
- [13.06.2025] 🔥🔥 Released the code for reproducing the paper's results!!
Shih-Yang Liu: shihyangl@nvidia.com or sliuau@connect.ust.hk
If you find EoRA useful, please consider giving the repo a star and citing:
```bibtex
@article{liu2024eora,
  title={EoRA: Training-free compensation for compressed LLM with eigenspace low-rank approximation},
  author={Liu, Shih-Yang and Khadkevich, Maksim and Fung, Nai Chit and Sakr, Charbel and Yang, Chao-Han Huck and Wang, Chien-Yi and Muralidharan, Saurav and Yin, Hongxu and Cheng, Kwang-Ting and Kautz, Jan and others},
  journal={arXiv preprint arXiv:2410.21271},
  year={2024}
}
```
Copyright © 2025, NVIDIA Corporation. All rights reserved.
This work is made available under the NVIDIA Source Code License-NC.