SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Updated Jan 13, 2025 - Python
A script to convert floating-point CNN models into generalized low-precision ShiftCNN representation
Low-precision (quantized) YOLOv5
Code for DNN feature map compression paper
CUDA/HIP header-only library for using vector and low-precision floating-point types (16-bit, 8-bit) in GPU code
LinearCosine: Adding beats multiplying for lower-precision efficient cosine similarity