COMET: A New Memory-Efficient Deep Learning Training Framework Using Error-Bounded Lossy Compression
COMET is a modified version of the Caffe framework [1] that enables memory-efficient deep learning training through SZ [2], an error-bounded lossy compressor. Our changes primarily modify Caffe's layer functions to compress activation data with SZ.
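COMET itself invokes SZ's C API inside Caffe's layer code; the snippet below is only a minimal Python sketch of the core idea behind error-bounded lossy compression, namely linear-scaling quantization with a user-set absolute error bound. SZ's prediction and entropy-coding stages are omitted, and all names here are illustrative, not COMET's actual functions.

```python
import numpy as np

def compress(activations, abs_err_bound):
    # Map each value to an integer bin of width 2*abs_err_bound.
    # Rounding to the nearest bin center guarantees the reconstruction
    # error never exceeds abs_err_bound -- the "error-bounded" property.
    return np.round(activations / (2 * abs_err_bound)).astype(np.int32)

def decompress(codes, abs_err_bound):
    # Recover the bin-center approximation of the original values.
    return codes * (2 * abs_err_bound)

# Toy activation tensor standing in for a layer's output.
rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 8)).astype(np.float32)
eb = 1e-2  # absolute error bound

recon = decompress(compress(acts, eb), eb)
assert np.max(np.abs(recon - acts)) <= eb  # bound holds for every element
```

The integer codes are highly compressible (in SZ they feed a Huffman coder), which is what lets activation data be stored in far less memory during training.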
Note: this repository is now maintained under a new account: qubilyan.
There are two ways to set up and use COMET: pulling our prebuilt Docker image, or building from source.
To simplify using COMET, we provide a Docker image with the essential environment. Assuming Docker is installed, pull our image from Docker Hub:
docker pull jinsian/caffe:COMET
Next, launch a container from the image:
docker run -ti jinsian/caffe:COMET /bin/bash
Then, inside the container, run the following commands to start training AlexNet on the Stanford Dogs dataset with COMET:
cd /opt/caffe
./build/tools/caffe train -solver ./models/bvlc_reference_caffenet/solver.prototxt
Alternatively, you can build COMET from source. Clone the repository:
git clone https://github.com/qubilyan/Efficient-DL-training-COMET-VLDB22
Before building COMET, install SZ by following the instructions at https://github.com/szcompressor/SZ.
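The SZ repository documents the authoritative build steps; as a rough sketch only, a typical out-of-source CMake build looks like the following (the install prefix and job count are illustrative, check SZ's README for current options):

```shell
# Sketch only: fetch and build SZ with CMake, then install to a local prefix.
git clone https://github.com/szcompressor/SZ
cd SZ
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/sz-install ..
make -j4
make install
# When building COMET, point its build configuration at the SZ headers
# and libraries under $HOME/sz-install.
```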
[1] Jia, Yangqing, et al. "Caffe: Convolutional Architecture for Fast Feature Embedding." In Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675-678. 2014.
[2] Tian, Jiannan, et al. "cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data." In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 3-15. 2020.