ymcui edited this page Jan 28, 2024 · 2 revisions

FAQ

Question 1: Will you use more data for training? Any RLHF/DPO aligned versions?

Answer: We may train our models on more data, depending on resource availability; the same applies to RLHF/DPO-aligned versions. However, we cannot guarantee that we will do so.

Question 2: Why not extend the vocabulary?

Answer: We chose not to extend the vocabulary for several reasons: 1) training efficiency; 2) Mixtral's vocabulary already contains more Chinese tokens than the LLaMA series; 3) a larger vocabulary increases disk space after quantization; 4) our preliminary experiments show that vocabulary expansion mainly affects encoding/decoding efficiency, not downstream task performance.
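To illustrate the encoding-efficiency point, here is a minimal sketch (not the actual Mixtral tokenizer) comparing a byte-level fallback, where a character outside the vocabulary is split into its UTF-8 bytes, against a vocabulary with dedicated single-character Chinese tokens:

```python
# Toy illustration: why vocabulary coverage affects sequence length.
# A byte-level fallback encodes each CJK character as 3 UTF-8 bytes
# (3 tokens), while a dedicated Chinese token covers it in 1 token.

text = "中文分词效率"  # 6 Chinese characters

# Byte-level fallback: one token per UTF-8 byte
byte_tokens = list(text.encode("utf-8"))
print(len(byte_tokens))  # 18 tokens (3 bytes per character)

# Vocabulary with single-character Chinese tokens
char_tokens = list(text)
print(len(char_tokens))  # 6 tokens
```

A better-covering vocabulary therefore shortens input/output sequences (faster, cheaper inference), but the model can represent the same text either way, which matches the observation that downstream task performance is largely unaffected.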

Question 3: Do your models support Mixtral-related downstream applications and tools?

Answer: Our models have exactly the same architecture and vocabulary as the original Mixtral models, so applications and tools that support Mixtral are very likely to work with our models as well.