This is the repository for an undergraduate thesis by Jacob Yang at the University of British Columbia, supervised by Professor Prashant Nair and PhD student Muhammad Abdullah Adnan. The thesis PDF is in the root folder: ./CPEN499 Jacob's thesis.pdf.
I encountered some configuration problems when setting up the SmoothQuant environment. Here is how I solved them.
Use CUDA Toolkit 11.6. I tried 12.3, 11.3, and 11.8; the CUDA toolkit cannot be easily downgraded, so if you are on the wrong version it is simplest to reinstall the system.
Clone Smoothquant
Install SmoothQuant following its README.
Use PyTorch 1.12.1 with CUDA 11.6, then install torch-int following its README.
Install CUTLASS and check out the feature/2.10 branch.
Some issue fixes
Convert the demo Jupyter notebook smoothquant_opt_real_int8_demo.ipynb into a Python script, then run it in the virtual environment where smoothquant and torch-int are installed.
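One way to do the notebook-to-script conversion is with nbconvert (assuming jupyter is installed in the same environment; the output filename shown is what nbconvert produces by default):

```shell
# Convert the demo notebook to a plain Python script.
jupyter nbconvert --to script smoothquant_opt_real_int8_demo.ipynb
# This writes smoothquant_opt_real_int8_demo.py, which you can then run:
python smoothquant_opt_real_int8_demo.py
```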
According to Issue 60 in the smoothquant repo, generate activation scales with the LAMBADA dataset, i.e. use dataset = load_dataset('lambada', split='validation[:1000]') in get_act_scales in smoothquant/calibration.py.
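Conceptually, this calibration pass records, for every layer, the maximum absolute activation seen per channel over the calibration samples; those maxima become the activation scales. A minimal sketch of that bookkeeping (function and variable names here are illustrative, not SmoothQuant's actual API):

```python
def update_act_scales(act_scales, layer_name, batch):
    """Track the running per-channel max |activation| for one layer.

    act_scales: dict mapping layer name -> list of per-channel maxima.
    batch: list of activation rows (each row is one token's hidden vector).
    """
    for row in batch:
        if layer_name not in act_scales:
            act_scales[layer_name] = [0.0] * len(row)
        current = act_scales[layer_name]
        for i, v in enumerate(row):
            # Keep the largest magnitude seen so far in each channel.
            current[i] = max(current[i], abs(v))
    return act_scales

# Example: feed two batches through one hypothetical layer "fc1".
scales = {}
update_act_scales(scales, "fc1", [[0.5, -2.0, 1.0], [1.5, 0.25, -0.5]])
update_act_scales(scales, "fc1", [[-3.0, 0.1, 0.2]])
# scales["fc1"] is now [3.0, 2.0, 1.0]
```

The real get_act_scales does this with forward hooks on torch modules, but the per-channel absmax logic is the same.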
Disable the dataset_path argument for generate_act_scales.py and run it with the default settings.
Change the path in test_quant.py to point to the newly generated activation scales.
To run the test, run
python ./examples/test_quant.py
There is an argument to modify in examples/test_quant.py: for example, how many less-significant heads to quantize.
Also, you need to add the head matrices to one of the outputs of the attention layers for the Python code to run. Add the line

attn_weights_reshaped = attn_output

between the line

attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)

and

attn_output = attn_output.transpose(1, 2)
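Placed in context, the patched region of the OPT attention forward (a fragment of transformers' modeling_opt.py, not runnable on its own; the exact surrounding code depends on your transformers version) would read:

```python
attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)
# Keep a per-head copy before the transpose so the head matrices are
# returned from the attention layer instead of being discarded.
attn_weights_reshaped = attn_output
attn_output = attn_output.transpose(1, 2)
```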