Update doc (#499)
imba-tjd authored Jun 11, 2024
1 parent 1b71878 commit f0321ee
Showing 2 changed files with 16 additions and 11 deletions.
4 changes: 2 additions & 2 deletions docs/examples.md
@@ -198,7 +198,7 @@ model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False, use_qbi

You can also load an AWQ model by using AutoModelForCausalLM, just make sure you have AutoAWQ installed.
Note that not all models will have fused modules when loading from transformers.
-See more [documentation here](https://huggingface.co/docs/transformers/main/en/quantization#awq).
+See more [documentation here](https://huggingface.co/docs/transformers/main/en/quantization/awq).
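
For reference, here is a minimal sketch of the transformers-side loading path described above; it assumes `autoawq` and `transformers` are installed, and the checkpoint id is only an example:

```python
# Minimal sketch: load an AWQ checkpoint through plain transformers (requires autoawq).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"  # example AWQ checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ kernels run in fp16
    device_map="auto",
)

inputs = tokenizer("What does AWQ quantization do?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```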

```python
import torch
@@ -327,4 +327,4 @@ generation_output = model.generate(
)

print(processor.decode(generation_output[0], skip_special_tokens=True))
```
23 changes: 14 additions & 9 deletions docs/index.md
@@ -15,15 +15,21 @@ Example inference speed (RTX 4090, Ryzen 9 7950X, 64 tokens):
- Install: `pip install autoawq`.
- Your torch version must match the build version, i.e. you cannot use torch 2.0.1 with a wheel that was built with 2.2.0.
- For AMD GPUs, inference will run through ExLlamaV2 kernels without fused layers. You need to pass the following arguments to run with AMD GPUs:
```python
model = AutoAWQForCausalLM.from_quantized(
    ...,
    fuse_layers=False,
    use_exllama_v2=True
)
```
- For CPU inference, install intel-extension-for-transformers with `pip install intel-extension-for-transformers`. A recent torch release is required, since intel-extension-for-transformers (ITREX) is built against the latest torch (ITREX 1.4 was built with torch 2.2); if you build ITREX from source, make sure your torch version matches the one it was built with. Set `use_qbits=True` for CPU; fused layers (`fuse_layers`) are not yet supported on CPU. Pass the following arguments to run on CPU (a combined per-device sketch also follows these notes):

```python
model = AutoAWQForCausalLM.from_quantized(
    ...,
    fuse_layers=False,
    use_qbits=True
)
```
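
Putting the notes above together, here is a minimal sketch of how one might choose the `from_quantized` flags per device. The `load_quantized` helper and the device detection via `torch.cuda.is_available()` / `torch.version.hip` are illustrative assumptions, not part of AutoAWQ:

```python
# Sketch: pick from_quantized flags based on the available device.
import torch
from awq import AutoAWQForCausalLM

def load_quantized(quant_path: str):
    # Hypothetical helper; flag combinations are taken from the notes above.
    if not torch.cuda.is_available():
        # CPU path: qbits kernels from intel-extension-for-transformers, no fused layers.
        return AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False, use_qbits=True)
    if torch.version.hip is not None:
        # AMD/ROCm path: ExLlamaV2 kernels, no fused layers.
        return AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False, use_exllama_v2=True)
    # NVIDIA path: library defaults, fused layers enabled.
    return AutoAWQForCausalLM.from_quantized(quant_path)
```
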
## Supported models
@@ -50,4 +56,3 @@ The detailed support list:
| LLaVa | 7B/13B |
| Mixtral | 8x7B |
| Baichuan | 7B/13B |
| QWen | 1.8B/7B/14B/72B |
