FAQ
Question 1: Will you train the models with more data or release RLHF/DPO versions?
Answer: We may train our models with more data, depending on resource availability. The same applies to RLHF/DPO versions. However, we do not guarantee that we will do so.
Question 2: Why didn't you expand the vocabulary with additional Chinese tokens?
Answer: The main considerations were: 1) training efficiency; 2) the Mixtral vocabulary already contains more Chinese tokens than the LLaMA series; 3) disk space after quantization; 4) preliminary experiments showed that vocabulary expansion mainly affects encoding/decoding efficiency rather than downstream task performance.
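
A rough way to see the encoding-efficiency point is to compare how many tokens different tokenizers need for the same Chinese text. The sketch below is only illustrative; the Hugging Face model IDs are examples (both repos are gated), so substitute any locally available checkpoints:

```python
# Illustrative sketch: compare Chinese encoding efficiency of two tokenizers.
# Model IDs are examples only; replace them with checkpoints you have access to.
from transformers import AutoTokenizer

text = "人工智能是计算机科学的一个分支。"  # sample Chinese sentence

for model_id in ("mistralai/Mixtral-8x7B-v0.1", "meta-llama/Llama-2-7b-hf"):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokens = tokenizer.tokenize(text)
    # Fewer tokens per character means more efficient Chinese encoding/decoding.
    print(f"{model_id}: {len(tokens)} tokens for {len(text)} characters "
          f"({len(tokens) / len(text):.2f} tokens/char)")
```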
Question 3: Can the models be used with third-party applications and tools that support Mixtral?
Answer: Our models are exactly the same as the original Mixtral models (including the vocabulary), so it is very likely that they will also work with Mixtral-related applications and tools.
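
As a minimal sketch of this compatibility, the model should load through the same 🤗 Transformers code path as the original Mixtral. The repo ID below is an assumption for illustration; check the project README for the official model names:

```python
# Sketch (assumed repo ID): load Chinese-Mixtral exactly like the original Mixtral.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hfl/chinese-mixtral"  # assumed Hugging Face repo ID; see the README
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread the MoE layers across available GPUs (needs accelerate)
)

prompt = "请介绍一下混合专家模型。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```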