
Commit

[docs] update for MoE-PEFT
mikecovlee committed Aug 21, 2024
1 parent 89fa9d8 commit c10e3ae
Showing 2 changed files with 33 additions and 14 deletions.
41 changes: 30 additions & 11 deletions README.md
@@ -19,7 +19,7 @@ Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-train
| **MixLoRA** | 2.9% | 77.7 | 58.1 | 72.7 | 81.6 | 83.2 | 78.0 | 93.1 | 76.8 | **77.6** |
| **MixDoRA** | 2.9% | 77.5 | 58.2 | 72.6 | 80.9 | 82.2 | 80.4 | 90.6 | 83.4 | **78.2** |

-The table above presents the performance of MixLoRA and compares these results with those obtained by fine-tuning with LoRA and DoRA. The results demonstrate that the language model with MixLoRA achieves commendable performance across all evaluation methods. All methods are fine-tuned and evaluated with [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on m-LoRA, with all metrics reported as accuracy.
+The table above presents the performance of MixLoRA and compares these results with those obtained by fine-tuning with LoRA and DoRA. The results demonstrate that the language model with MixLoRA achieves commendable performance across all evaluation methods. All methods are fine-tuned and evaluated with [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on MoE-PEFT, with all metrics reported as accuracy.

<div align="left"><img src="https://raw.githubusercontent.com/TUDB-Labs/MixLoRA/main/assets/Optimization.png" width="60%"></div>

@@ -31,7 +31,7 @@ You can download the weights of MixLoRA fine-tuned with [meta-llama/Llama-2-7b-h

## Use MixLoRA

-MixLoRA is built upon the m-LoRA framework. It is recommended to use MixLoRA with [m-LoRA](https://github.com/mikecovlee/mLoRA).
+MixLoRA is built upon the MoE-PEFT framework. It is recommended to use MixLoRA with [MoE-PEFT](https://github.com/TUDB-Labs/MoE-PEFT).

We also provide an integration of MixLoRA with HuggingFace Transformers for inference. To use it, you can install `mixlora` with the following command:

@@ -51,7 +51,7 @@ tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
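The installation command and the full example are collapsed in this view; the sketch below shows roughly how inference through the Transformers integration can look. Only the tokenizer line appears in this diff, so the `MixLoraModelForCausalLM` loader name and its two-value return are assumptions rather than confirmed API:

```python
from transformers import AutoTokenizer

# Loader name and return signature are assumptions; only the tokenizer call
# below is shown in this diff.
from mixlora import MixLoraModelForCausalLM

model, config = MixLoraModelForCausalLM.from_pretrained(
    "TUDB-Labs/alpaca-mixlora-7b",  # MixLoRA weights referenced later in this README
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

prompt = "### Instruction:\nWhat is MixLoRA?\n\n### Output:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```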

## Reproduction Instructions

-You can reproduce our evaluation results with [m-LoRA v0.3.2](https://github.com/mikecovlee/mLoRA/tree/0.3.2) using the following scripts. You can also use the [latest release of m-LoRA](https://github.com/mikecovlee/mLoRA/releases/latest) for more features such as support for new pre-trained models and bug fixes.
+You can reproduce our evaluation results with [MoE-PEFT v1.0.1](https://github.com/TUDB-Labs/MoE-PEFT/tree/1.0.1) using the following scripts. You can also use the [latest release of MoE-PEFT](https://github.com/TUDB-Labs/MoE-PEFT/releases/latest) for more features such as support for new pre-trained models and bug fixes.

Please note that the *Single-Task* setup refers to training and evaluating PEFT modules for each task individually, while the *Multi-Task* setup refers to training on mixed tasks, followed by separate evaluation.

@@ -61,12 +61,12 @@ We conducted our experiments with the following environment:
+ Systems with x86-64 CPUs
+ NVIDIA GPUs: RTX 3090@24GB, RTX A5000@24GB, RTX 4090D@24GB, RTX 4090@24GB, RTX A6000@48GB (for 8B and 13B models)

-### Cloning and Checking Out m-LoRA
+### Cloning and Checking Out MoE-PEFT

```bash
-git clone https://github.com/mikecovlee/mLoRA
+git clone https://github.com/TUDB-Labs/MoE-PEFT
# Optional, just for consistency
-git checkout 0.3.2
+git checkout 1.0.1
```

### Single-Task
@@ -105,7 +105,7 @@ torch.cuda.synchronize()
print(start.elapsed_time(end))
```

-For m-LoRA, we injected this code into the `train` function in `mlora/trainer.py` to measure the elapsed time, and we computed the token computation latency by dividing this time by the number of tokens in one batch. The peak GPU memory usage was collected using the [`torch.cuda.max_memory_allocated` API](https://pytorch.org/docs/stable/generated/torch.cuda.max_memory_allocated.html). Every metric was collected by running the experiment 10 times separately and calculating the average value.
+For MoE-PEFT, we injected this code into the `train` function in `moe_peft/trainer.py` to measure the elapsed time, and we computed the token computation latency by dividing this time by the number of tokens in one batch. The peak GPU memory usage was collected using the [`torch.cuda.max_memory_allocated` API](https://pytorch.org/docs/stable/generated/torch.cuda.max_memory_allocated.html). Every metric was collected by running the experiment 10 times separately and calculating the average value.
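As a concrete illustration of the procedure above, here is a minimal sketch (not the code actually injected into the trainer; `model`, `batch`, and `num_tokens` are placeholders for whatever the trainer passes):

```python
import torch

def measure_step(model, batch, num_tokens):
    """Time one training step with CUDA events and report peak GPU memory."""
    torch.cuda.reset_peak_memory_stats()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()
    loss = model(**batch).loss   # placeholder forward pass
    loss.backward()
    end.record()
    torch.cuda.synchronize()

    elapsed_ms = start.elapsed_time(end)              # milliseconds for this step
    latency_per_token = elapsed_ms / num_tokens       # token computation latency
    peak_memory = torch.cuda.max_memory_allocated()   # peak GPU memory in bytes
    return latency_per_token, peak_memory
```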

## Configuration of MixLoRA

@@ -137,14 +137,15 @@ Compared with LoRA, MixLoRA has some additional configurations.
```
This is an example of a LoRA training configuration.

-MixLoRA has two routing strategies: top-k routing (like *Mixtral*) and top-1 switch routing (like *Switch Transformers*), which can be configured with `"routing_strategy": "mixlora"` or `"routing_strategy": "mixlora-switch"`.
+MixLoRA has three routing strategies: top-k routing (like *Mixtral*), top-p routing (like *Dynamic MoE*), and top-1 switch routing (like *Switch Transformers*), which can be configured with `"routing_strategy": "mixlora"`, `"routing_strategy": "mixlora-dynamic"`, or `"routing_strategy": "mixlora-switch"`.

**Top-k Routing**
```json
{
...
"routing_strategy": "mixlora",
"router_init_range": 0.02,
"jitter_noise": 0.0,
"num_experts": 8,
"top_k": 2,
"router_loss": true,
@@ -153,14 +153,32 @@ MixLoRA have two routing strategies: top-k routing (like *Mixtral*) and top-1 sw
}
```
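For intuition, the sketch below shows the Mixtral-style top-k gating that this configuration describes: a linear router scores each token, the `top_k` highest-probability experts are selected, and their weights are renormalized. This is illustrative only, not the MoE-PEFT implementation:

```python
import torch
import torch.nn.functional as F

def topk_route(hidden_states, gate_weight, top_k=2, jitter_noise=0.0):
    """Illustrative Mixtral-style top-k gating over MixLoRA experts.

    hidden_states: (num_tokens, hidden_dim); gate_weight: (num_experts, hidden_dim).
    Returns per-token routing weights and the ids of the selected experts.
    """
    if jitter_noise > 0:
        # Multiplicative jitter on the router input, controlled by `jitter_noise`.
        hidden_states = hidden_states * torch.empty_like(hidden_states).uniform_(
            1.0 - jitter_noise, 1.0 + jitter_noise
        )
    router_logits = hidden_states @ gate_weight.t()         # (num_tokens, num_experts)
    routing_probs = F.softmax(router_logits, dim=-1)
    weights, selected_experts = torch.topk(routing_probs, top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize over chosen experts
    return weights, selected_experts
```

Each token's output is then a weighted sum of the selected experts' LoRA-adapted FFN outputs; when `router_loss` is enabled, an auxiliary load-balancing loss scaled by `router_aux_loss_coef` is added during training.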

**Top-p Routing**
```json
{
...
"routing_strategy": "mixlora-dynamic",
"router_init_range": 0.02,
"jitter_noise": 0.0,
"num_experts": 8,
"top_p": 0.8,
"temperature": 0.0,
"router_loss": true,
"router_aux_loss_coef": 0.01,
...
}
```
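Likewise, a sketch of the top-p (dynamic) selection rule: experts are taken in descending router probability until their cumulative probability reaches `top_p`. How MoE-PEFT applies `temperature` and handles edge cases may differ; this is only an illustration:

```python
import torch
import torch.nn.functional as F

def topp_route(router_logits, top_p=0.8, temperature=0.0):
    """Illustrative top-p (dynamic) routing for a single token's router logits.

    Takes the smallest set of experts whose cumulative probability reaches
    `top_p`; a positive `temperature` rescales the logits first.
    """
    if temperature > 0:
        router_logits = router_logits / temperature
    probs = F.softmax(router_logits, dim=-1)
    sorted_probs, order = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Index of the first expert at which the cumulative probability reaches top_p.
    cutoff = int(torch.searchsorted(cumulative, torch.tensor(top_p)).item()) + 1
    selected_experts = order[:cutoff]
    weights = probs[selected_experts] / probs[selected_experts].sum()
    return weights, selected_experts
```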

**Top-1 Switch Routing**
```json
{
...
"routing_strategy": "mixlora-switch",
"router_init_range": 0.02,
"router_init_range": 1.0,
"jitter_noise": 0.01,
"num_experts": 8,
"expert_capacity": 32,
"ffn_dropout": 0.0,
"router_loss": true,
"router_aux_loss_coef": 0.01,
"router_z_loss_coef": 0.01,
@@ -208,7 +227,7 @@ python generate.py \
--base_model meta-llama/Llama-2-7b-hf \
--lora_weights TUDB-Labs/alpaca-mixlora-7b \
--template template/alpaca.json \
---instruction "What is m-LoRA?"
+--instruction "What is MoE-PEFT?"
```

## Citation
@@ -235,7 +254,7 @@ If MixLoRA has been useful for your work, please consider citing it using the ap
## Copyright
Copyright © 2023-2024 All Rights Reserved.

-MixLoRA, m-LoRA and the weights of alpaca-mixlora-7b are licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
+MixLoRA, MoE-PEFT and the weights of alpaca-mixlora-7b are licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

```
Licensed under the Apache License, Version 2.0 (the "License");
```
6 changes: 3 additions & 3 deletions mixlora/prompter.py
@@ -4,8 +4,8 @@
from typing import Dict, Optional, Union

prompt_templates = {
-    "mlora": {
-        "description": "Default Prompt Template Provided by m-LoRA",
+    "moe_peft": {
+        "description": "Default Prompt Template Provided by MoE-PEFT",
        "prompt_input": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Output:\n",
        "prompt_no_input": "### Instruction:\n{instruction}\n\n### Output:\n",
        "response_split": "### Output:",
@@ -28,7 +28,7 @@
class Prompter:
    def __init__(self, template: Optional[Union[Dict, str]] = None):
        if template is None:
-            self.template = prompt_templates["mlora"]
+            self.template = prompt_templates["moe_peft"]
        elif isinstance(template, str):
            if osp.exists(template):
                with open(template) as fp:
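To make the default template above concrete, the snippet below renders an instruction using only the template strings shown in this diff (the `Prompter` class's own generation method is collapsed here, so the strings are formatted directly):

```python
# Uses only the template strings shown above; the Prompter generation method
# itself is not visible in this diff.
template = {
    "prompt_input": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Output:\n",
    "prompt_no_input": "### Instruction:\n{instruction}\n\n### Output:\n",
    "response_split": "### Output:",
}

prompt = template["prompt_no_input"].format(instruction="What is MixLoRA?")
print(prompt)
# ### Instruction:
# What is MixLoRA?
#
# ### Output:
```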
