This is the repository for an undergraduate thesis by Jacob Yang at the University of British Columbia, supervised by Professor Prashant Nair and PhD student Muhammad Abdullah Adnan. The thesis PDF is in the root folder: ./CPEN499 Jacob's thesis.pdf.
I encountered some configuration problems when setting up the SmoothQuant environment. Here is how I solved them.
Use CUDA Toolkit 11.6. I tried 12.3, 11.3, and 11.8; the CUDA toolkit cannot be easily downgraded, so if you are on the wrong version it is simplest to reinstall the system.
Clone Smoothquant
Install SmoothQuant following its README.
Use PyTorch 1.12.1 with CUDA 11.6, then install torch-int following its README.
Install CUTLASS and check out the feature/2.10 branch.
Some issue fixes
Convert the demo Jupyter notebook smoothquant_opt_real_int8_demo.ipynb into a Python script, then run it in the virtual environment where smoothquant and torch-int are installed.
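One way to do the notebook-to-script conversion is with nbconvert (assuming jupyter is installed in the same environment; the output filename shown is what nbconvert produces by default):

```shell
# Convert the demo notebook to a plain Python script.
jupyter nbconvert --to script smoothquant_opt_real_int8_demo.ipynb
# This writes smoothquant_opt_real_int8_demo.py, which you can then run:
python smoothquant_opt_real_int8_demo.py
```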
According to Issue 60 in the smoothquant repo, generate activation scales with the LAMBADA dataset, i.e. use dataset = load_dataset('lambada', split='validation[:1000]') in get_act_scales in smoothquant/calibration.py.
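Conceptually, this calibration pass records, for every layer, the maximum absolute activation seen per channel over the calibration samples; those maxima become the activation scales. A minimal sketch of that bookkeeping (function and variable names here are illustrative, not SmoothQuant's actual API):

```python
def update_act_scales(act_scales, layer_name, batch):
    """Track the running per-channel max |activation| for one layer.

    act_scales: dict mapping layer name -> list of per-channel maxima.
    batch: list of activation rows (each row is one token's hidden vector).
    """
    for row in batch:
        if layer_name not in act_scales:
            act_scales[layer_name] = [0.0] * len(row)
        current = act_scales[layer_name]
        for i, v in enumerate(row):
            # Keep the largest magnitude seen so far in each channel.
            current[i] = max(current[i], abs(v))
    return act_scales

# Example: feed two batches through one hypothetical layer "fc1".
scales = {}
update_act_scales(scales, "fc1", [[0.5, -2.0, 1.0], [1.5, 0.25, -0.5]])
update_act_scales(scales, "fc1", [[-3.0, 0.1, 0.2]])
# scales["fc1"] is now [3.0, 2.0, 1.0]
```

The real get_act_scales does this with forward hooks on torch modules, but the per-channel absmax logic is the same.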
Disable the dataset_path argument for generate_act_scales.py and run it with the default settings.
Change the path in test_quant.py to point to the newly generated activation scales.
To run the test, run
python ./examples/test_quant.py
There is an argument to modify in examples/test_quant.py: for example, how many less-significant heads to quantize.
Also, you need to add the head matrices to one of the outputs of the attention layers for the Python code to run. Add the line

attn_weights_reshaped = attn_output

between the line

attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)

and

attn_output = attn_output.transpose(1, 2)
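Placed in context, the patched region of the OPT attention forward (a fragment of transformers' modeling_opt.py, not runnable on its own; the exact surrounding code depends on your transformers version) would read:

```python
attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)
# Keep a per-head copy before the transpose so the head matrices are
# returned from the attention layer instead of being discarded.
attn_weights_reshaped = attn_output
attn_output = attn_output.transpose(1, 2)
```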