- The project automatically performs git clones and installs the required dependencies, including **pytorch** and **tensorflow**, when the server is started. It does this by checking the `pyproject.toml` or `requirements.txt` file in the root directory of this project or of other repositories. `pyproject.toml` is parsed into `requirements.txt` with `poetry`. If you want to add more dependencies, simply add them to the file.


## How can I get the models?

#### 1. **Automatic download** (_Recommended_)
> ![image](contents/auto-download-model.png)
- Just set the **model_path** of your own model definition in `model_definitions.py` to an actual **huggingface repository** and run the server. The server will automatically download the model from HuggingFace.co the first time that model is requested. A sketch of such a definition follows.
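
Below is a minimal, hypothetical sketch of what such a definition might look like. The class name `LlamaCppModel`, the import path, and the field names are assumptions for illustration only; refer to the actual example in `model_definitions.py`.

```python
# model_definitions.py -- hypothetical sketch; the import path and field
# names below are assumptions, not the project's confirmed API.
from llama_api.schemas.models import LlamaCppModel  # assumed import path

# Setting model_path to a HuggingFace repository (rather than a local
# bin file) lets the server download the model on the first request.
my_auto_model = LlamaCppModel(
    model_path="TheBloke/robin-7B-v2-GGML",  # HF repo id, not a local path
    max_total_tokens=4096,
)
```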

#### 2. **Manual download**
> ![image](contents/example-models.png)
- You can download the models manually if you want. I prefer to use the [following link](https://huggingface.co/TheBloke) to download the models. Note that the models are not included in this repository; you have to download them from HuggingFace.



1. For **LLama.cpp** models: Download the **bin** file from the GGML model page. Choose the quantization method you prefer. The bin file name will be the **model_path**. A scripted download example for both model types is shown after this list.

The LLama.cpp GGML model must be placed as a **bin** file in `models/ggml/`.

For example, if you downloaded a q4_0 quantized model from [this link](https://huggingface.co/TheBloke/robin-7B-v2-GGML),
the path of the model has to be **robin-7b.ggmlv3.q4_0.bin**.

*Available quantizations: q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K*

2. For **Exllama** models: Download three files from the GPTQ model page: **config.json / tokenizer.model / \*.safetensors** and put them in a folder. The folder name will be the **model_path**.

The Exllama GPTQ model must be placed as a **folder** in `models/gptq/`.

For example, if you downloaded 3 files from [this link](https://huggingface.co/TheBloke/orca_mini_7B-GPTQ/tree/main),

- orca-mini-7b-GPTQ-4bit-128g.no-act.order.safetensors
- tokenizer.model
- config.json

then you need to put them in a folder.
The path of the model has to be the folder name. Let's say **orca_mini_7b**, which contains the 3 files.
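
If you prefer to script these manual downloads, here is a minimal sketch using the `huggingface_hub` package. This package is an assumption on my part: it is not part of this project's documented setup, so you may need to `pip install huggingface_hub` first.

```python
# Hypothetical helper script for the manual downloads described above.
# Assumption: huggingface_hub is installed (pip install huggingface_hub).
from huggingface_hub import hf_hub_download, snapshot_download

# 1. Llama.cpp: the single quantized bin file goes into models/ggml/.
hf_hub_download(
    repo_id="TheBloke/robin-7B-v2-GGML",
    filename="robin-7b.ggmlv3.q4_0.bin",
    local_dir="models/ggml",
)

# 2. Exllama: the three GPTQ files go into one folder under models/gptq/.
snapshot_download(
    repo_id="TheBloke/orca_mini_7B-GPTQ",
    local_dir="models/gptq/orca_mini_7b",
    allow_patterns=["*.safetensors", "tokenizer.model", "config.json"],
)
```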


## Where to define the models
Define llama.cpp & exllama models in `model_definitions.py`. You can define all necessary parameters to load the models there. Refer to the example in the file.
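
For illustration, here is a hypothetical sketch matching the example downloads above. The class names `LlamaCppModel` and `ExllamaModel`, the import path, and the field names are assumptions; the real example in `model_definitions.py` is authoritative.

```python
# model_definitions.py -- hypothetical sketch; class and field names
# are assumptions, not the project's confirmed API.
from llama_api.schemas.models import ExllamaModel, LlamaCppModel

# A Llama.cpp model: model_path is the bin file name under models/ggml/.
robin_7b = LlamaCppModel(
    model_path="robin-7b.ggmlv3.q4_0.bin",
    max_total_tokens=4096,
)

# An Exllama model: model_path is the folder name under models/gptq/.
orca_mini_7b = ExllamaModel(
    model_path="orca_mini_7b",
    max_total_tokens=4096,
)
```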