low-precision

Here are 9 public repositories matching this topic...

intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

sparsity pruning quantization knowledge-distillation auto-tuning int8 low-precision quantization-aware-training post-training-quantization awq int4 large-language-models gptq smoothquant sparsegpt fp4 mxformat

Updated Jan 6, 2026
Python

Tiiiger / QPyTorch

Low Precision Arithmetic Simulation in PyTorch

learning low-precision

Updated May 20, 2024
Python

gudovskiy / ShiftCNN

A script to convert floating-point CNN models into generalized low-precision ShiftCNN representation

cnn dnn low-precision

Updated Jul 14, 2017
Python

sefaburakokcu / quantized-yolov5

Low-precision (quantized) YOLOv5

fpga yolov1 finn low-precision quantized-neural-networks pynq-z2 brevitas yolov5

Updated Mar 24, 2025
Python

KernelTuner / kernel_float

CUDA/HIP header-only library for low-precision (16 bit, 8 bit) and vectorized GPU kernel development

performance cpp gpu cuda kernel-tuner hip vectorization floating-point half-precision mixed-precision low-precision bfloat16 header-only-library reduced-precision

Updated Dec 15, 2025
C++

graphcore-research / jax-scalify

JAX Scalify: end-to-end scaled arithmetic

jax low-precision llm fp8

Updated Oct 30, 2024
Python

gudovskiy / fmap_compression

Code for DNN feature map compression paper

compression caffe cnn dnn feature-map low-precision

Updated Nov 21, 2018
C++

AmanPriyanshu / LinearCosine

LinearCosine: Adding beats multiplying for lower-precision efficient cosine similarity

nlp benchmarking machine-learning computer-vision deep-learning algorithms cpp optimization linear-algebra artificial-intelligence computation matrix-multiplication neural-networks cosine-similarity floating-point quantization energy-efficiency performance-optimization low-precision

Updated Oct 21, 2024
C++

abdulvahapmutlu / quantlab-8bit

QuantLab-8bit is a reproducible benchmark of 8-bit quantization on compact vision backbones. It includes FP32 baselines, PTQ (dynamic & static), QAT, ONNX exports, parity checks, ORT CPU latency, and visual diagnostics.

benchmarking computer-vision deep-learning pytorch reproducibility quantization model-compression onnx gradcam low-precision edge-ai onnxruntime streamlit model-optimization quantization-aware-training post-training-quantization efficient-ai

Updated Sep 25, 2025
Python
