[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Topics: benchmark, deployment, tool, evaluation, pruning, quantization, post-training-quantization, awq, large-language-models, llm, vllm, smoothquant, mixtral, internlm2, lvlm, llama3, omniquant, quarot, lightllm, spinquant
Updated Dec 23, 2024 - Python