The fine-tuned 🦙 Vigogne models come in two types: instruction-following models and chat models. The instruction-following models are optimized to generate concise and helpful responses to user instructions, similar to text-davinci-003. Meanwhile, the chat models are designed for multi-turn dialogues, but they also perform well on instruction-following tasks, similar to gpt-3.5-turbo.
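To make the two usage patterns concrete, here is a minimal sketch; the message format follows the 🤗 Transformers chat-template convention, the prompts are illustrative, and we assume the chat model's tokenizer ships a chat template:

```python
from transformers import AutoTokenizer

# Instruction-following usage: one self-contained instruction, one response.
instruct_messages = [
    {"role": "user", "content": "Explique en une phrase ce qu'est un lama."},
]

# Chat usage: a multi-turn dialogue where earlier turns provide context.
chat_messages = [
    {"role": "user", "content": "Quelle est la capitale de la France ?"},
    {"role": "assistant", "content": "La capitale de la France est Paris."},
    {"role": "user", "content": "Et combien d'habitants compte-t-elle ?"},
]

# For a chat model, apply_chat_template renders the turns into the exact
# prompt format the model was fine-tuned on.
tokenizer = AutoTokenizer.from_pretrained("bofenghuang/vigostral-7b-chat")
print(tokenizer.apply_chat_template(chat_messages, tokenize=False, add_generation_prompt=True))
```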
You can access the weights for these models on the 🤗 Hugging Face Hub. For further details on the training data used, see vigogne/data.
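For instance, a model's weights can be pulled from the Hub with the `huggingface_hub` client; a minimal sketch, using the repository id for Vigostral-7B-Chat from the table below:

```python
from huggingface_hub import snapshot_download

# Downloads the full repository (weights, tokenizer, config) into the local
# Hugging Face cache and returns the path to the cached copy.
local_path = snapshot_download(repo_id="bofenghuang/vigostral-7b-chat")
print(local_path)
```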
Here is a list of recommended models for this project. These models have been trained on more diverse, higher-quality data with an optimized training process, and should be your first choice; a quick usage sketch follows the table. For alternative models, please refer to the Other Models section.
| Model | Type | Foundation model | Data | Description |
|---|---|---|---|---|
| Vigostral-7B-Chat | Chat | Mistral-7B-v0.1 | | |
| Vigogne-2-7B-Chat-V2.0 | Chat | Llama-2-7B | 520K chat data | Check out our blog for more details. |
| Vigogne-2-13B-Chat | Chat | Llama-2-13B | 520K chat data | Check out our blog for more details. |
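As a quick start, here is a hedged sketch of running the recommended Vigostral-7B-Chat with the 🤗 Transformers `pipeline` API; it assumes a recent transformers version with chat support and a GPU with enough memory for the 7B weights:

```python
import torch
from transformers import pipeline

# Load the recommended chat model; device_map="auto" places weights on available GPUs.
chatbot = pipeline(
    "text-generation",
    model="bofenghuang/vigostral-7b-chat",
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explique la différence entre une baguette et une ficelle."},
]

# Recent transformers versions accept chat messages directly and apply the chat template.
outputs = chatbot(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
# The returned conversation includes the new assistant turn at the end.
print(outputs[0]["generated_text"][-1]["content"])
```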
Due to performance and licensing concerns, the models below are no longer recommended for general use. However, they could still be useful in specific scenarios.
| Model | Type | Foundation model | Data | Description |
|---|---|---|---|---|
| Vigogne-2-7B-Chat-V1.0 | Chat | Llama-2-7B | 420K chat data | |
| Vigogne-7B-Chat | Chat | LLaMA-7B | 420K chat data | Research use only |
| Vigogne-13B-Chat | Chat | LLaMA-13B | 420K chat data | Research use only |
| Vigogne-falcon-7B-Chat | Chat | Falcon-7B | 420K chat data | |
| Vigogne-2-7B-Instruct | Instruction-following | Llama-2-7B | 260K instruct data | |
| Vigogne-2-13B-Instruct | Instruction-following | Llama-2-13B | 260K instruct data | |
| Vigogne-7B-Instruct | Instruction-following | LLaMA-7B | 260K instruct data | Research use only |
| Vigogne-13B-Instruct | Instruction-following | LLaMA-13B | 260K instruct data | Research use only |
| Vigogne-33B-Instruct | Instruction-following | LLaMA-33B | 260K instruct data | Research use only |
| Vigogne-Falcon-7B-Instruct | Instruction-following | Falcon-7B | 260K instruct data | |
| Vigogne-MPT-7B-Instruct | Instruction-following | MPT-7B | 260K instruct data | |
| Vigogne-Bloom-7B1-Instruct | Instruction-following | BLOOM-7B1 | 260K instruct data | |
The majority of the corpus used to train the original LLaMA model is in English. To address this, we gathered a substantial French corpus and used it to continue the pretraining process. This language-adaptive pretraining improves the model's performance on French data.
This pretraining is still ongoing, as it is a computationally expensive task that requires significant resources.
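Conceptually, this continued pretraining is standard causal-language-modeling training on raw French text. Below is a minimal, illustrative sketch with the 🤗 Transformers `Trainer`; the foundation model, dataset, and hyperparameters are placeholders, not the ones actually used for Vigogne:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder foundation model; LLaMA-style tokenizers have no pad token by default.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder French corpus; any large raw-text dataset works the same way.
dataset = load_dataset("wikimedia/wikipedia", "20231101.fr", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# mlm=False -> next-token (causal) objective, i.e. plain language-model pretraining.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="vigogne-fr-cpt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```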