PyTorch native quantization and sparsity for training and inference
training, sparsity, cuda, inference, optimizer, pytorch, transformer, offloading, llama, quantization, mx, brrr, dtypes, float8
Updated Dec 25, 2024 - Python