This is an unofficial implementation of the ExactOBS algorithm introduced in the paper *Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning*, packaged as a module. The official implementation can be found here.
- Install using `ssh`:

  ```shell
  pip install git+ssh://git@github.com/tinyvolt/optimal-brain-compression.git
  ```

- Install using `https`:

  ```shell
  pip install git+https://github.com/tinyvolt/optimal-brain-compression.git
  ```
- The official implementation focuses on running various experiments, completeness, and reproducibility of results. Its purpose is not reusability, readability, or good software design.
- This implementation focuses on reusability, readability, and good programming practices. It implements only one algorithm, `ExactOBS` for quantization, and leaves out the other variants such as unstructured pruning and N:M pruning.
- This implementation focuses on modularity, not completeness. To be more precise, I did not implement the logic to calculate the Hessian by adding hooks and updating the (unnormalized) covariance matrix for each batch of data. As long as you have a matrix and a Hessian, you can use this module to quantize the matrix based on the Hessian. The onus of calculating the Hessian and storing the quantized matrix is on the user as of now.
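Since computing the Hessian is left to the user, here is a minimal sketch of how one might accumulate the (unnormalized) input covariance for a single `torch.nn.Linear` layer with a forward pre-hook. All names and shapes here are illustrative assumptions, not part of this package's API:

```python
import torch

# Sketch (not part of this package): accumulate the covariance of a
# linear layer's inputs over calibration batches via a forward pre-hook.
layer = torch.nn.Linear(5, 3)
hessian = torch.zeros(5, 5)
n_samples = 0

def accumulate(module, inputs):
    # inputs is a tuple of the layer's positional inputs
    global hessian, n_samples
    x = inputs[0].reshape(-1, hessian.shape[0])  # (batch, in_features)
    hessian += x.t() @ x
    n_samples += x.shape[0]

handle = layer.register_forward_pre_hook(accumulate)
for _ in range(10):              # 10 calibration batches of size 8
    layer(torch.randn(8, 5))
handle.remove()
hessian /= n_samples             # normalize, if desired
```

The resulting `hessian` can then be passed to `exact_obc` along with `layer.weight`.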
Its usage is super simple:

```python
from optimal_brain_compression import exact_obc

quantized_matrix = exact_obc(matrix, hessian, n_bits=4)
```
A complete working example is shown below:

```python
import torch
from optimal_brain_compression import exact_obc

torch.manual_seed(0)
n_rows = 24
n_cols = 5
matrix = torch.randn(n_rows, n_cols)
xs = torch.randn(100, n_cols)
hessian = (xs.t() @ xs).div(xs.shape[0] - 1)

quantized_matrix = exact_obc(matrix, hessian, n_bits=4)
# you can use a smaller batch size if needed
quantized_matrix = exact_obc(matrix, hessian, n_bits=4, batch_size=8)
```
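For intuition about what `n_bits=4` means, here is a sketch of a min-max uniform round-to-nearest quantizer with $2^{n\_bits}$ levels. This is an illustrative assumption, not necessarily the grid `exact_obc` uses internally:

```python
import torch

def uniform_quant(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Round each entry to the nearest point on a uniform grid
    spanning [w.min(), w.max()] with 2**n_bits levels.

    Illustrative only; not necessarily the quantizer used by exact_obc.
    """
    levels = 2 ** n_bits - 1
    scale = (w.max() - w.min()) / levels
    zero_point = w.min()
    q = torch.round((w - zero_point) / scale)
    return q.clamp(0, levels) * scale + zero_point

torch.manual_seed(0)
w = torch.randn(24, 5)
w_q = uniform_quant(w, n_bits=4)
```

ExactOBS improves on this kind of round-to-nearest baseline by updating the remaining unquantized weights after each rounding step to compensate for the error.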
```
.
├── LICENSE
├── README.md
├── optimal_brain_compression
│   ├── __init__.py
│   ├── _checks.py
│   ├── _types.py
│   ├── _utils.py
│   └── exact_obc.py
└── setup.py
```
I think the paper has a bunch of interesting ideas.
- I got interested in the problem described in Lemma 1 (equation 4) in the paper. I wrote an article with a generalized form of this problem, with proofs and code.
- Proof for equations 3 and 7 in the paper:
  If the loss from perturbing the weights by $\delta w$ is

  $$E = \frac{1}{2}\,\delta w^\top H\,\delta w$$

  where $H$ is the Hessian of the layerwise error. Let's say you want to set the element at index $p$ to a fixed value $c$, i.e. impose the constraint $e_p^\top \delta w + w_p = c$. We want to minimize $E$ subject to this constraint, so form the Lagrangian

  $$\mathcal{L} = \frac{1}{2}\,\delta w^\top H\,\delta w + \lambda\left(e_p^\top \delta w + w_p - c\right)$$

  Setting $\nabla_{\delta w}\mathcal{L} = H\,\delta w + \lambda e_p = 0$ gives $\delta w = -\lambda H^{-1} e_p$. Using this value in the constraint:

  $$-\lambda\,[H^{-1}]_{pp} + w_p = c \quad\Longrightarrow\quad \lambda = \frac{w_p - c}{[H^{-1}]_{pp}}$$

  where $[H^{-1}]_{pp}$ is the $p$-th diagonal entry of $H^{-1}$. This gives us:

  $$\delta w = -\,\frac{w_p - c}{[H^{-1}]_{pp}}\,H^{-1} e_p$$

  - Setting $c = 0$ gives equation 3.
  - Setting $c = \text{quant}(w_p)$ gives equation 7.

  Finally, using this value of $\delta w$ in $E$ gives the increase in loss:

  $$E = \frac{(w_p - c)^2}{2\,[H^{-1}]_{pp}}$$
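The closed-form update above is easy to sanity-check numerically. A small pure-Python check with a hand-picked positive-definite $2\times 2$ Hessian, verifying that the optimal $\delta w$ lands the constrained weight exactly on $c$ and attains the predicted loss:

```python
# Verify the closed-form OBS update on a tiny 2x2 example.
H = [[2.0, 0.4], [0.4, 1.0]]       # positive definite Hessian
w = [0.7, -1.3]
p, c = 0, 0.0                       # prune w[0], i.e. c = 0 (equation 3)

# Invert H by hand (2x2 formula).
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
Hinv = [[H[1][1] / det, -H[0][1] / det],
        [-H[1][0] / det, H[0][0] / det]]

# Closed form: lambda = (w_p - c) / [H^-1]_pp,  dw = -lambda * H^-1 e_p
lam = (w[p] - c) / Hinv[p][p]
dw = [-lam * Hinv[i][p] for i in range(2)]

# Constraint is satisfied: the p-th weight lands on c.
assert abs(w[p] + dw[p] - c) < 1e-9

# Loss 0.5 * dw^T H dw matches the predicted (w_p - c)^2 / (2 [H^-1]_pp).
Hdw = [sum(H[i][j] * dw[j] for j in range(2)) for i in range(2)]
E = 0.5 * sum(dw[i] * Hdw[i] for i in range(2))
assert abs(E - (w[p] - c) ** 2 / (2 * Hinv[p][p])) < 1e-9
```

The same check with `c = quant(w[p])` for any quantizer verifies the equation-7 case, since the derivation never uses the value of $c$.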