From f0321eedca887c12680553fc561d176b03b1b9a5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=B0=AD=E4=B9=9D=E9=BC=8E?= <109224573@qq.com>
Date: Tue, 11 Jun 2024 18:54:50 +0800
Subject: [PATCH] Update doc (#499)

---
 docs/examples.md |  4 ++--
 docs/index.md    | 23 ++++++++++++++---------
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/docs/examples.md b/docs/examples.md
index f84f385b..5e3cd580 100644
--- a/docs/examples.md
+++ b/docs/examples.md
@@ -198,7 +198,7 @@ model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False, use_qbi
 
 You can also load an AWQ model by using AutoModelForCausalLM; just make sure you have AutoAWQ installed.
 Note that not all models will have fused modules when loading from transformers.
-See more [documentation here](https://huggingface.co/docs/transformers/main/en/quantization#awq).
+See more [documentation here](https://huggingface.co/docs/transformers/main/en/quantization/awq).
 
 ```python
 import torch
@@ -327,4 +327,4 @@ generation_output = model.generate(
 )
 
 print(processor.decode(generation_output[0], skip_special_tokens=True))
-```
\ No newline at end of file
+```
diff --git a/docs/index.md b/docs/index.md
index 0fe4c754..fa675a31 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -15,15 +15,21 @@ Example inference speed (RTX 4090, Ryzen 9 7950X, 64 tokens):
 
 - Install: `pip install autoawq`.
 - Your torch version must match the build version, i.e. you cannot use torch 2.0.1 with a wheel that was built with 2.2.0.
 - For AMD GPUs, inference will run through ExLlamaV2 kernels without fused layers. You need to pass the following arguments to run with AMD GPUs:
+  ```python
+  model = AutoAWQForCausalLM.from_quantized(
+      ...,
+      fuse_layers=False,
+      use_exllama_v2=True
+  )
+  ```
 - For CPU devices, install intel-extension-for-transformers with `pip install intel-extension-for-transformers`. The latest version of torch is required, since intel-extension-for-transformers (ITREX) is built against the latest torch release (ITREX 1.4 was built with torch 2.2). If you build ITREX from source, make sure your torch version matches. Pass `use_qbits=True` for CPU devices. Fused layers (`fuse_layers`) are not yet supported on CPU.
-
-```python
-model = AutoAWQForCausalLM.from_quantized(
-    ...,
-    fuse_layers=False,
-    use_exllama_v2=True
-)
-```
+  ```python
+  model = AutoAWQForCausalLM.from_quantized(
+      ...,
+      fuse_layers=False,
+      use_qbits=True
+  )
+  ```
 
 ## Supported models
 
@@ -50,4 +56,3 @@ The detailed support list:
 | LLaVa     | 7B/13B         |
 | Mixtral   | 8x7B           |
 | Baichuan  | 7B/13B         |
-| QWen      | 1.8B/7B/14/72B |
\ No newline at end of file
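
For reference, below is a minimal end-to-end sketch of the CPU path that the index.md hunk above documents. It is not part of the patch: the checkpoint id is purely illustrative, and it assumes `autoawq` and `intel-extension-for-transformers` are already installed.

```python
# Minimal sketch of CPU inference via ITREX qbits kernels, following the
# index.md bullet above. The checkpoint id below is illustrative only.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"  # any AWQ checkpoint

# fuse_layers must stay False on CPU (fused modules are not yet supported there);
# use_qbits routes inference through intel-extension-for-transformers (ITREX).
model = AutoAWQForCausalLM.from_quantized(
    quant_path,
    fuse_layers=False,
    use_qbits=True,
)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

tokens = tokenizer("What is AWQ quantization?", return_tensors="pt").input_ids
output = model.generate(tokens, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The AMD path shown in the same hunk is identical in shape, with `use_exllama_v2=True` in place of `use_qbits=True`.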