Large Language Models (LLMs), represented by ChatGPT and GPT-4, have sparked a new wave of research in natural language processing, demonstrating capabilities approaching Artificial General Intelligence (AGI) and attracting widespread attention from industry. However, the high cost of training and deploying LLMs remains an obstacle to transparent and open academic research.
To promote open research on large models in the Chinese NLP community, this project open-sources the Chinese LLaMA model and the instruction-fine-tuned Chinese Alpaca model. These models expand the Chinese vocabulary on top of the original LLaMA and are further pre-trained on Chinese data, enhancing basic Chinese semantic understanding. In addition, the Chinese Alpaca model is fine-tuned on Chinese instruction data, significantly improving the model's ability to understand and follow instructions. Please refer to our technical report for further details (Cui, Yang, and Yao, 2023).
Main contents of this project:
- 🚀 Extended the original LLaMA with an additional Chinese vocabulary, significantly improving Chinese encoding/decoding efficiency (see the tokenizer sketch after this list)
- 🚀 Open-sourced the Chinese LLaMA (general purpose) and Chinese Alpaca (instruction-tuned) models (7B, 13B)
- 🚀 Quickly deploy and experience the quantized models on a laptop (personal PC) CPU/GPU
- 🚀 Support 🤗transformers, llama.cpp, text-generation-webui, LlamaChat, etc.
- Released versions: 7B (basic, Plus), 13B (basic)
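As a rough illustration of the encoding/decoding efficiency gain, the sketch below compares how many tokens the original LLaMA tokenizer and the expanded Chinese tokenizer need for the same Chinese sentence. It is a minimal sketch: the path to the original LLaMA tokenizer is a placeholder (those weights are not distributed here), and the Chinese tokenizer is taken from one of the LoRA packages listed later in this README.

```python
# Minimal sketch: compare tokenization efficiency of the original LLaMA tokenizer
# and the expanded Chinese tokenizer. "path/to/original-llama-hf" is a placeholder
# for a local copy of the original LLaMA converted to Hugging Face format.
from transformers import LlamaTokenizer

text = "人工智能是计算机科学的一个分支。"  # "Artificial intelligence is a branch of computer science."

original_tokenizer = LlamaTokenizer.from_pretrained("path/to/original-llama-hf")
chinese_tokenizer = LlamaTokenizer.from_pretrained("ziqingyang/chinese-llama-lora-7b")

print("Original LLaMA tokens:", len(original_tokenizer.tokenize(text)))
print("Chinese LLaMA tokens :", len(chinese_tokenizer.tokenize(text)))
# For typical Chinese text, the expanded vocabulary yields noticeably fewer tokens.
```

Fewer tokens per sentence translate directly into faster encoding/decoding and a longer effective context for Chinese text.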
💡 The following image shows the actual output of the 7B model after local deployment (animation not sped up; tested on Apple M1 Max).
Multi-modal VLE | Chinese MiniRBT | Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge distillation tool TextBrewer | Model pruning tool TextPruner
**[2023/04/28] Release v3.0: LLaMA/Alpaca Plus versions are available, trained on more data than the basic versions.**
[2023/04/18] Release v2.2: Add LlamaChat support (macOS UI) and tokenizer merging scripts; documentation has been migrated to the GitHub Wiki. Refer to Release Note
[2023/04/13] Release v2.1: Add HuggingFace-transformers and text-generation-webui interfaces. Refer to Release Note
[2023/04/07] Release v2.0: Release 13B versions of the Chinese LLaMA and Alpaca models. Main upgrades: improved factuality, better performance on QA, translation, and more. Refer to Release Note
Previous News
2023/3/31 Release v1.1, major updates: simplification of model merging steps, addition of instruction data crawling script, and important notes about the new version of llama.cpp. See Release Note.
2023/3/28 Open-sourced Chinese LLaMA and Alpaca, currently offering the 7B version for download and trial
Chapter | Description |
---|---|
Download | Download links for Chinese LLaMA and Alpaca |
Model Reconstruction | (Important) Explains how to merge downloaded LoRA models with the original LLaMA |
Quick Deployment | Steps to quantize and deploy LLMs on personal computers |
Example Results | Examples of the system output |
Training Details | Introduces the training details of Chinese LLaMA and Alpaca |
FAQ | Answers to frequently asked questions |
Limitations | Limitations of the models involved in this project |
The official LLaMA models released by Facebook prohibit commercial use, and the official model weights have not been open-sourced (although many third-party download links are available online). To comply with the relevant licenses, it is currently not possible to release the complete model weights; we appreciate your understanding. Once Facebook fully opens up the model weights, this project will update its policy accordingly. What is released here are the LoRA weights, which can be seen as a "patch" on the original LLaMA model; the complete weights are obtained by merging the two.
The following table provides a basic comparison of the Chinese LLaMA and Alpaca models, as well as recommended usage scenarios (including, but not limited to).
💡 The Plus versions are trained on more data and are highly recommended.
Comparison Item | Chinese LLaMA | Chinese Alpaca |
---|---|---|
Training Method | Traditional CLM (trained on general corpus) | Instruction Fine-tuning (trained on instruction data) |
Input Template | Not required | Must meet template requirements[1] |
Suitable Scenarios ✔️ | Text continuation: Given a context, let the model continue writing | 1. Instruction understanding (Q&A, writing, advice, etc.) 2. Multi-turn context understanding (chat, etc.) |
Unsuitable Scenarios ❌ | Instruction understanding, multi-turn chat, etc. | Unrestricted free text generation |
llama.cpp | Use the -p parameter to specify the context | Use the -ins parameter to enable instruction understanding + chat mode |
text-generation-webui | Not suitable for chat mode | Use --cpu to run without a GPU; if the generated content is unsatisfactory, consider modifying the prompt |
LlamaChat | Choose "LLaMA" when loading the model | Choose "Alpaca" when loading the model |
inference_hf.py | No additional startup parameters required | Add --with_prompt parameter when launching |
Known Issues | Without an explicit stopping control, it keeps generating until it reaches the output length limit.[2] | The current version of the model tends to generate relatively short, concise responses.[2] |
[1] Templates are built into llama.cpp/LlamaChat/inference_hf.py.
[2] If you encounter issues such as low-quality model responses, nonsensical answers, or failure to understand questions, please check whether you are using the correct model and startup parameters for the scenario.
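For reference, Alpaca-style instruction models are usually prompted with the Stanford Alpaca template. The templates built into llama.cpp/LlamaChat/inference_hf.py are authoritative for this project; the sketch below only illustrates the commonly used format:

```python
# Commonly used Stanford Alpaca-style prompt template (illustrative only; the
# templates built into llama.cpp/LlamaChat/inference_hf.py are authoritative).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a raw user instruction in the Alpaca-style template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)

print(build_prompt("请列举三个中国的传统节日。"))  # "List three traditional Chinese festivals."
```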
The Chinese LLaMA model expands the Chinese vocabulary on top of the original LLaMA and is further pre-trained on plain Chinese text. For details, see the Training Details section.
Model | Type | Required Original Model[1] | Size[2] | Download Links[3] |
---|---|---|---|---|
Chinese-LLaMA-7B | General (20 GB text) | LLaMA-7B | 770M | [BaiduDisk] [Google Drive] |
Chinese-LLaMA-Plus-7B ⭐️ | General (120 GB text) | LLaMA-7B | 790M | [BaiduDisk] [Google Drive] |
Chinese-LLaMA-13B | General (20 GB text) | LLaMA-13B | 1G | [BaiduDisk] [Google Drive] |
The Chinese Alpaca model is further fine-tuned with instruction data on top of the Chinese LLaMA model above. For details, see the Training Details section.
Model | Type | Required Original Model[1] | Size[2] | Download Links[3] |
---|---|---|---|---|
Chinese-Alpaca-7B | Instruction (2M samples) | LLaMA-7B | 790M | [BaiduDisk] [Google Drive] |
Chinese-Alpaca-Plus-7B ⭐️ | Instruction (4M samples) | LLaMA-7B & Chinese-LLaMA-Plus-7B | 1.1G | [BaiduDisk] [Google Drive] |
Chinese-Alpaca-13B | Instruction (3M samples) | LLaMA-13B | 1.1G | [BaiduDisk] [Google Drive] |
You can also download all of the above models from the 🤗Model Hub and use 🤗transformers and 🤗PEFT to call the Chinese LLaMA or Alpaca LoRA models.
Model | MODEL_NAME | Link |
---|---|---|
Chinese-LLaMA-7B | ziqingyang/chinese-llama-lora-7b | Model Hub Link |
Chinese-LLaMA-Plus-7B | ziqingyang/chinese-llama-plus-lora-7b | Model Hub Link |
Chinese-LLaMA-13B | ziqingyang/chinese-llama-lora-13b | Model Hub Link |
Chinese-Alpaca-7B | ziqingyang/chinese-alpaca-lora-7b | Model Hub Link |
Chinese-Alpaca-Plus-7B | ziqingyang/chinese-alpaca-plus-lora-7b | Model Hub Link |
Chinese-Alpaca-13B | ziqingyang/chinese-alpaca-lora-13b | Model Hub Link |
[1] Access to the original LLaMA model must be requested via Facebook-LLaMA, or refer to this PR. Due to copyright issues, this project cannot provide the download, and we ask for your understanding.
[2] The reconstructed model is slightly larger than the original LLaMA (due to the expanded vocabulary); the 7B model is about 13G+.
[3] After downloading, be sure to verify that the SHA256 of the ZIP file matches the value listed in SHA256.md.
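The following is a minimal sketch of loading one of the Alpaca LoRA models from the 🤗Model Hub with 🤗transformers and 🤗PEFT. The base-model path is a placeholder for a local copy of the original LLaMA converted to Hugging Face format; the project's inference_hf.py remains the authoritative reference.

```python
# Minimal loading sketch with 🤗transformers + 🤗PEFT; paths are placeholders and
# inference_hf.py in this repository is the authoritative reference.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base_model_path = "path/to/original-llama-7b-hf"       # placeholder local path
lora_model_name = "ziqingyang/chinese-alpaca-lora-7b"  # MODEL_NAME from the table above

# The tokenizer shipped with the LoRA model contains the expanded Chinese vocabulary.
tokenizer = LlamaTokenizer.from_pretrained(lora_model_name)

base_model = LlamaForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
# Resize the embeddings to the expanded vocabulary before attaching the LoRA weights.
base_model.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(base_model, lora_model_name)
model.eval()
```

With the model loaded, text can be generated via the usual model.generate() call; for Alpaca models, wrap the input in the instruction template shown earlier.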
The file directory inside the ZIP file is as follows (using Chinese-LLaMA as an example):
chinese_llama_lora_7b/
- adapter_config.json # LoRA weight configuration file
- adapter_model.bin # LoRA weight file
- special_tokens_map.json # special_tokens_map file
- tokenizer_config.json # tokenizer configuration file
- tokenizer.model # tokenizer file
The following table lists the memory and disk requirements of the original models and their quantized versions. When converting the corresponding model, make sure the machine has enough memory and disk space (minimum requirements):
Precision | 7B | 13B | 33B | 65B |
---|---|---|---|---|
Original (FP16) | 13 GB | 24 GB | 60 GB | 120 GB |
Quantized (8-bit) | 7.8 GB | 14.9 GB | - | - |
Quantized (4-bit) | 3.9 GB | 7.8 GB | 19.5 GB | 38.5 GB |
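As a rough rule of thumb, these figures follow from the parameter count multiplied by the bytes per weight: about 2 bytes per parameter at FP16, roughly 1 byte at 8-bit, and roughly 0.5 bytes at 4-bit (plus some quantization overhead), which is why the 4-bit 7B model fits in roughly 4 GB.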
In order to merge the LoRA model with the original LLaMA for further tuning or inference, two methods are currently provided:
Method | Usage | Tutorial |
---|---|---|
Online conversion | Suitable for Google Colab users; use the notebook for online conversion and model quantization. | link |
Manual conversion | Suitable for offline conversion; generates models in different formats for quantization or further fine-tuning. | link |
Related documentation has been moved to the project's >>> 📚GitHub Wiki.
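As an illustration of what the manual conversion route does, the sketch below merges the LoRA weights into the base model with 🤗PEFT and saves full-precision weights that can then be quantized or fine-tuned further. All paths are placeholders, and the conversion scripts documented in the Wiki are authoritative.

```python
# Illustrative merge sketch with 🤗PEFT; paths are placeholders and the project's
# documented conversion scripts are authoritative.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base_model_path = "path/to/original-llama-7b-hf"  # placeholder local path
lora_model_name = "ziqingyang/chinese-llama-lora-7b"
output_dir = "path/to/chinese-llama-7b-merged"    # placeholder output path

tokenizer = LlamaTokenizer.from_pretrained(lora_model_name)
model = LlamaForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
model.resize_token_embeddings(len(tokenizer))

# Attach the LoRA weights, then fold them back into the base model.
model = PeftModel.from_pretrained(model, lora_model_name)
model = model.merge_and_unload()

model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
```

The merged directory can then be quantized for llama.cpp or loaded directly with 🤗transformers.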
We mainly provide the following four ways for inference and local deployment.
Method | Features | Platform | CPU | GPU | Quantization | UI | Tutorial |
---|---|---|---|---|---|---|---|
llama.cpp | A tool for quantizing models and deploying them locally on CPU/GPU | General | ✅ | ✅ | ✅ | ❌ | link |
🤗Transformers | The native Transformers inference method, supporting CPU/GPU | General | ✅ | ✅ | ✅ | ❌ | link |
text-generation-webui | A tool for deploying the model as a web UI | General | ✅ | ✅ | ✅ | ✅ | link |
LlamaChat | A macOS app for chatting with LLaMA, Alpaca, and similar models | macOS | ✅ | ❌ | ✅ | ✅ | link |
Related documentation has been moved to the project's >>> 📚GitHub Wiki.
To quickly evaluate the actual performance of the models, this project compares the outputs of Chinese Alpaca-7B, Alpaca-13B, and Alpaca-Plus-7B on a set of common tasks given the same prompts. Replies are generated with randomness and are affected by decoding hyperparameters, random seeds, and other factors, so the following evaluation is not absolutely rigorous and the results are for reference only; you are welcome to try the models yourself. For detailed evaluation results, please see examples/README.md
Task | Samples | # | Alpaca-7B | Alpaca-13B | Alpaca-Plus-7B |
---|---|---|---|---|---|
💯 Overall | - | 200 | 65.3 | 70.9 | 👍🏻75.3 |
Question Answering | QA.md | 20 | 66 | 74 | 👍🏻80 |
Open QA | OQA.md | 20 | 👍🏻79 | 74 | 👍🏻78 |
Computation, Reasoning | REASONING.md | 20 | 31 | 👍🏻50 | 45 |
Poetry, Literature, Philosophy | LITERATURE.md | 20 | 68 | 73 | 👍🏻76 |
Music, Sports, Entertainment | ENTERTAINMENT.md | 20 | 68 | 74 | 👍🏻79 |
Letters and Articles | GENERATION.md | 20 | 76 | 👍🏻81 | 👍🏻81 |
Translation | TRANSLATION.md | 20 | 76 | 78 | 👍🏻82 |
Multi-turn Dialogue | DIALOGUE.md | 20 | 👍🏻83 | 73 | 👍🏻84 |
Coding | CODE.md | 20 | 57 | 👍🏻64 | 59 |
Ethics | ETHICS.md | 20 | 49 | 68 | 👍🏻89 |
Note: for results on 4-bit quantized models, please refer to ./examples-q4/README.md.
The entire training process includes three parts: vocabulary expansion, pre-training, and instruction fine-tuning. Please refer to merge_tokenizers.py for vocabulary expansion; refer to run_clm.py in 🤗transformers and the relevant parts of dataset processing in the Stanford Alpaca project for pre-training and self-instruct fine-tuning.
Please refer to our >>> 📚GitHub Wiki.
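As a rough illustration of the vocabulary-expansion step, the simplified sketch below appends the pieces of a Chinese sentencepiece model to the original LLaMA tokenizer. It only approximates what merge_tokenizers.py does; the paths are placeholders and the actual script is the authoritative implementation.

```python
# Simplified sketch of merging a Chinese sentencepiece vocabulary into the original
# LLaMA tokenizer; merge_tokenizers.py in this repository is the authoritative version.
from sentencepiece import sentencepiece_model_pb2 as sp_pb2_model

llama_spm = sp_pb2_model.ModelProto()
with open("path/to/llama/tokenizer.model", "rb") as f:  # placeholder path
    llama_spm.ParseFromString(f.read())

chinese_spm = sp_pb2_model.ModelProto()
with open("path/to/chinese_sp.model", "rb") as f:       # placeholder path
    chinese_spm.ParseFromString(f.read())

# Append Chinese pieces that are not already in the LLaMA vocabulary.
existing_pieces = {p.piece for p in llama_spm.pieces}
for p in chinese_spm.pieces:
    if p.piece not in existing_pieces:
        new_piece = sp_pb2_model.ModelProto().SentencePiece()
        new_piece.piece = p.piece
        new_piece.score = 0  # placeholder score
        llama_spm.pieces.append(new_piece)

with open("merged_tokenizer.model", "wb") as f:
    f.write(llama_spm.SerializeToString())

print("Merged vocabulary size:", len(llama_spm.pieces))
```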
The FAQ provides answers to frequently asked questions. Please check it before submitting an issue.
Q1: Why can't you release the complete model weights?
Q2: Will there be 33B and 65B versions in the future?
Q3: The model doesn't perform well on some tasks!
Q4: Why expand the vocabulary? Can't you just pre-train the original LLaMA with Chinese data?
Q5: The reply is very short
Q6: Under Windows, the model cannot understand Chinese, the generation speed is very slow, etc.
Q7: Chinese-LLaMA 13B model cannot be launched with llama.cpp, reporting inconsistent dimensions.
Please refer to our >>> 📚GitHub Wiki.
Although the models in this project have significantly improved Chinese understanding and generation compared to the original LLaMA and Alpaca, they still have the following limitations:
- It may produce unpredictable harmful content and content that does not conform to human preferences and values.
- Due to compute and data constraints, the models are not sufficiently trained, and their Chinese understanding still needs further improvement.
- There is no online interactive demo available for now (Note: users can still deploy it locally themselves).
If you find the models, data, or code in this project useful, please consider citing our work: https://arxiv.org/abs/2304.08177
@article{chinese-llama-alpaca,
title={Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca},
author={Cui, Yiming and Yang, Ziqing and Yao, Xin},
journal={arXiv preprint arXiv:2304.08177},
url={https://arxiv.org/abs/2304.08177},
year={2023}
}
This project builds on the following open-source projects, and we would like to express our gratitude to their developers.
Foundation Models, Codes | Quantization, Inference, Deployment | Data |
---|---|---|
LLaMA by Facebook<br/>Alpaca by Stanford<br/>alpaca-lora by @tloen | llama.cpp by @ggerganov<br/>LlamaChat by @alexrozanski<br/>text-generation-webui by @oobabooga | pCLUE and translation data by @brightmart |
Note: The Alpaca logo was generated by midjourney and automatically extracted using Preview on macOS.
The resources related to this project are for academic research purposes only and are strictly prohibited from commercial use. When using parts that involve third-party code, please strictly follow the corresponding open-source licenses. Content generated by the models is affected by factors such as model computation, randomness, and quantization precision loss, so this project cannot guarantee its accuracy. This project assumes no legal liability for any content output by the models, nor for any losses that may result from using the related resources and output results.
This project is initiated and maintained by individuals and collaborators in their spare time, so we cannot guarantee timely responses to issues.
If you have any questions, please submit them in GitHub Issues.
- Before submitting an issue, please check whether the FAQ already answers your question and consult past issues for possible solutions.
- Please use our dedicated issue template for submitting.
- Duplicate issues and issues unrelated to this project will be handled by stale-bot; please understand.
- Raise questions politely and help build a harmonious discussion community.