Update doc (#499)
imba-tjd authored Jun 11, 2024
1 parent 1b71878 commit f0321ee
Showing 2 changed files with 16 additions and 11 deletions.
4 changes: 2 additions & 2 deletions docs/examples.md
@@ -198,7 +198,7 @@ model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False, use_qbi

You can also load an AWQ model by using AutoModelForCausalLM, just make sure you have AutoAWQ installed.
Note that not all models will have fused modules when loading from transformers.
-See more [documentation here](https://huggingface.co/docs/transformers/main/en/quantization#awq).
+See more [documentation here](https://huggingface.co/docs/transformers/main/en/quantization/awq).
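
For reference, here is a minimal sketch of the transformers-side loading path described above; it assumes `autoawq` and `transformers` are installed, and the checkpoint id is only an example:

```python
# Minimal sketch: load an AWQ checkpoint through plain transformers (requires autoawq).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"  # example AWQ checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ kernels run in fp16
    device_map="auto",
)

inputs = tokenizer("What does AWQ quantization do?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```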

```python
import torch
@@ -327,4 +327,4 @@ generation_output = model.generate(
)

print(processor.decode(generation_output[0], skip_special_tokens=True))
```
23 changes: 14 additions & 9 deletions docs/index.md
@@ -15,15 +15,21 @@ Example inference speed (RTX 4090, Ryzen 9 7950X, 64 tokens):
- Install: `pip install autoawq`.
- Your torch version must match the build version, i.e. you cannot use torch 2.0.1 with a wheel that was built with 2.2.0.
- For AMD GPUs, inference will run through ExLlamaV2 kernels without fused layers. You need to pass the following arguments to run with AMD GPUs:
```python
model = AutoAWQForCausalLM.from_quantized(
    ...,
    fuse_layers=False,
    use_exllama_v2=True
)
```
- For CPU inference, install intel-extension-for-transformers with `pip install intel-extension-for-transformers`. A recent torch release is required, since intel-extension-for-transformers (ITREX) is built against the latest torch (ITREX 1.4 was built with torch 2.2); if you build ITREX from source, make sure your torch version matches the one it was built with. Set `use_qbits=True` for CPU; fused layers (`fuse_layers`) are not yet supported on CPU. Pass the following arguments to run on CPU (a combined per-device sketch also follows these notes):

```python
model = AutoAWQForCausalLM.from_quantized(
    ...,
    fuse_layers=False,
    use_qbits=True
)
```
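
Putting the notes above together, here is a minimal sketch of how one might choose the `from_quantized` flags per device. The `load_quantized` helper and the device detection via `torch.cuda.is_available()` / `torch.version.hip` are illustrative assumptions, not part of AutoAWQ:

```python
# Sketch: pick from_quantized flags based on the available device.
import torch
from awq import AutoAWQForCausalLM

def load_quantized(quant_path: str):
    # Hypothetical helper; flag combinations are taken from the notes above.
    if not torch.cuda.is_available():
        # CPU path: qbits kernels from intel-extension-for-transformers, no fused layers.
        return AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False, use_qbits=True)
    if torch.version.hip is not None:
        # AMD/ROCm path: ExLlamaV2 kernels, no fused layers.
        return AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False, use_exllama_v2=True)
    # NVIDIA path: library defaults, fused layers enabled.
    return AutoAWQForCausalLM.from_quantized(quant_path)
```
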
## Supported models
@@ -50,4 +56,3 @@ The detailed support list:
| LLaVa | 7B/13B |
| Mixtral | 8x7B |
| Baichuan | 7B/13B |
| QWen | 1.8B/7B/14B/72B |
