Experiments with Binarized Neural Networks in PyTorch.
The code provides a clean implementation of Binarized Neural Networks with a custom CUDA kernel for the forward pass. It incorporates the main ideas introduced in the Binarized Neural Networks paper.
The only layer available at the moment is `BinaryLinear`, which is a binarized version of `torch.nn.Linear`. The optimized forward pass kernel is enabled via the `use_xnor_kernel` argument.
The code requires CUDA 10.2+.
- Install the Python dependencies: `pip install -r requirements.txt`
- Install the optimized forward pass CUDA kernel for the `BinaryLinear` layer: `cd cuda && pip install .`
If this fails, you can try specifying the compiler explicitly via the `CXX` environment variable: `CXX=g++ pip install .`
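After installation, a quick sanity check of the PyTorch/CUDA setup can look like this (it only verifies the toolkit version and GPU visibility, not the compiled extension itself):

```python
import torch

# The CUDA toolkit PyTorch was built against should be 10.2 or newer,
# and a GPU should be visible for the XNOR kernel to be usable.
print(torch.version.cuda)         # e.g. "10.2"
print(torch.cuda.is_available())  # expected: True
```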
`experiments/mnist_mlp.py` contains an example experiment with an MLP network on the MNIST dataset.
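As a rough illustration of that setup (not the script itself), a binarized MLP for MNIST could be assembled along these lines; the import path, constructor signature, and activation choice are assumptions:

```python
import torch
import torch.nn as nn
from binary_layers import BinaryLinear  # import path is an assumption

# Three hidden layers of 4096 units; the input layer and the softmax
# projection are kept as regular nn.Linear, as in the benchmark below.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 4096),
    nn.ReLU(),
    BinaryLinear(4096, 4096, use_xnor_kernel=True),
    nn.ReLU(),
    BinaryLinear(4096, 4096, use_xnor_kernel=True),
    nn.ReLU(),
    nn.Linear(4096, 10),
).cuda()

x = torch.randn(100, 1, 28, 28, device="cuda")  # a dummy MNIST-sized batch
logits = model(x)
print(logits.shape)  # torch.Size([100, 10])
```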
Benchmarks were run on an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz and a GeForce GTX 1650 Mobile GPU.
The custom CUDA XNOR kernel was compared to the cuBLAS kernel on the following problems:
- (8196, 8916) x (8196, 8916) matrix multiplication
- MLP inference ((4096, 4096, 4096) hidden units) on the MNIST test set (batch size = 100); the first layer and the softmax projection layer were not binarized
Each experiment was repeated 100 times with `torch.utils.benchmark`.
| Problem | cuBLAS | XNOR |
|---|---|---|
| Matrix Multiplication | 425.21 ms | 155.33 ms |
| MLP on MNIST test set | 772.96 ms | 690.84 ms |
The full report is available in the `experiments` folder.
Benchmarks were created using `experiments/benchmark.py`.
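For reference, a timing run with `torch.utils.benchmark` can be set up roughly as follows; the matrix sizes here are small placeholders, not the configuration used in the table above:

```python
import torch
import torch.utils.benchmark as benchmark

# Placeholder operands; the actual benchmark uses much larger matrices.
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Timer synchronizes CUDA around the timed statement; torch is available
# inside stmt by default.
timer = benchmark.Timer(
    stmt="torch.mm(a, b)",
    globals={"a": a, "b": b},
)
print(timer.timeit(100))  # repeat the measurement 100 times
```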