Home
hydropix edited this page Mar 7, 2026
Last updated: 2026-03-07 16:35
This wiki contains translation quality benchmarks for various LLM models across 19 languages.
Score legend:
| Indicator | Range | Label |
|---|---|---|
| 🟢 | 9-10 | Excellent |
| 🟡 | 7-8 | Good |
| 🟠 | 5-6 | Acceptable |
| 🔴 | 3-4 | Poor |
| ⚫ | 1-2 | Failed |
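The legend's score bands can be expressed as a small helper. This is an illustrative sketch, not code from the benchmark system; the function name and thresholds are taken directly from the table above (boundaries between bands are assumed to round upward, matching how 7.6 is shown 🟡 and 6.9 is shown 🟠 in the rankings below).

```python
def score_indicator(avg_score: float) -> str:
    """Map a 1-10 average score to its indicator band from the legend."""
    if avg_score >= 9:
        return "🟢 Excellent"
    if avg_score >= 7:
        return "🟡 Good"
    if avg_score >= 5:
        return "🟠 Acceptable"
    if avg_score >= 3:
        return "🔴 Poor"
    return "⚫ Failed"
```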
Overall performance across all tested languages:
| Rank | Model | Avg Score | Accuracy | Fluency | Style | Tests |
|---|---|---|---|---|---|---|
| 1 | translategemma:27b | 🟡 7.6 | 8.0 | 8.0 | 7.0 | 190 |
| 2 | google/gemini-3-flash-preview | 🟡 7.6 | 7.9 | 7.4 | 7.3 | 95 |
| 3 | mistralai/mistral-medium-3.1 | 🟡 7.5 | 7.9 | 7.3 | 7.3 | 95 |
| 4 | google/gemini-2.0-flash-001 | 🟡 7.4 | 8.0 | 7.3 | 7.2 | 95 |
| 5 | gemma3:27b-it-qat | 🟡 7.1 | 7.6 | 7.0 | 6.8 | 95 |
| 6 | gemma3:27b | 🟡 7.1 | 7.7 | 7.1 | 6.8 | 95 |
| 7 | translategemma:27b-it-q8_0 | 🟡 7.1 | 7.5 | 7.1 | 6.6 | 95 |
| 8 | qwen3.5:35b | 🟠 6.9 | 7.4 | 7.2 | 6.4 | 95 |
| 9 | ministral-3:14b | 🟠 6.9 | 7.4 | 6.9 | 6.6 | 95 |
| 10 | translategemma:12b-it-q4_K_M | 🟠 6.8 | 7.3 | 6.8 | 6.2 | 95 |
| 11 | translategemma:12b | 🟠 6.8 | 7.2 | 6.8 | 6.1 | 95 |
| 12 | qwen3:30b | 🟠 6.7 | 7.4 | 6.7 | 6.4 | 95 |
| 13 | gemma3:12b | 🟠 6.7 | 7.3 | 6.6 | 6.4 | 95 |
| 14 | qwen3:30b-instruct | 🟠 6.6 | 7.2 | 6.6 | 6.2 | 95 |
| 15 | mistral-small:24b | 🟠 6.4 | 7.2 | 6.4 | 6.2 | 95 |
| 16 | ministral-3 | 🟠 6.3 | 7.0 | 6.2 | 5.9 | 95 |
| 17 | translategemma:4b | 🟠 6.1 | 6.5 | 6.3 | 5.3 | 95 |
| 18 | qwen3:14b | 🟠 6.0 | 6.7 | 6.0 | 5.7 | 95 |
| 19 | translategemma:4b-it-q4_K_M | 🟠 6.0 | 6.5 | 6.2 | 5.2 | 95 |
| 20 | qwen3:4b | 🟠 5.9 | 6.7 | 5.8 | 5.5 | 95 |
| 21 | gemma3:4b | 🟠 5.7 | 6.5 | 5.8 | 5.3 | 95 |
| 22 | qwen3:8b | 🟠 5.5 | 6.3 | 5.4 | 5.2 | 95 |
| 23 | qwen3.5:9b | 🟠 5.2 | 5.9 | 5.3 | 4.7 | 95 |
| 24 | llama3.1:8b | 🔴 4.2 | 4.9 | 4.2 | 3.8 | 95 |
| 25 | llama3.2 | ⚫ 2.5 | 3.4 | 2.5 | 2.3 | 95 |
Best translation quality by target language:
| Rank | Language | Native | Avg Score | Best Model | Tests |
|---|---|---|---|---|---|
| 1 | Spanish | Español | 🟡 7.4 | qwen3.5:35b | 130 |
| 2 | French | Français | 🟡 7.3 | mistralai/mistral-medium-3.1 | 130 |
| 3 | Portuguese | Português | 🟡 7.3 | translategemma:27b | 130 |
| 4 | Chinese (Simplified) | 简体中文 | 🟡 7.1 | qwen3.5:9b | 130 |
| 5 | Italian | Italiano | 🟡 7.0 | qwen3.5:35b | 130 |
| 6 | Chinese (Traditional) | 繁體中文 | 🟠 7.0 | qwen3.5:35b | 130 |
| 7 | German | Deutsch | 🟠 6.9 | translategemma:27b | 130 |
| 8 | Russian | Русский | 🟠 6.8 | mistralai/mistral-medium-3.1 | 130 |
| 9 | Vietnamese | Tiếng Việt | 🟠 6.5 | qwen3.5:35b | 130 |
| 10 | Ukrainian | Українська | 🟠 6.3 | google/gemini-3-flash-preview | 130 |
| 11 | Polish | Polski | 🟠 6.3 | google/gemini-3-flash-preview | 130 |
| 12 | Arabic | العربية | 🟠 6.3 | translategemma:27b | 130 |
| 13 | Thai | ไทย | 🟠 6.2 | translategemma:27b | 130 |
| 14 | Japanese | 日本語 | 🟠 6.0 | google/gemini-3-flash-preview | 130 |
| 15 | Korean | 한국어 | 🟠 5.8 | google/gemini-2.0-flash-001 | 130 |
- Total Models Tested: 25
- Total Languages: 19
- Total Translations: 2470
- Evaluator Model: anthropic/claude-haiku-4.5
- Source Language: English
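The statistics above are mutually consistent under one assumption: each model run translated 5 test passages into each of the 19 languages (95 translations per run), and translategemma:27b was run twice — inferred from its 190-test count in the rankings. A quick sanity check:

```python
# Sanity check of the benchmark totals. The 5-passages-per-language figure
# and the double run of translategemma:27b are inferences, not stated facts.
LANGUAGES = 19
PASSAGES_PER_LANGUAGE = 5

per_run = LANGUAGES * PASSAGES_PER_LANGUAGE   # 95 translations per model run
total = 25 * per_run + per_run                # 25 models, one run repeated
per_language = total // LANGUAGES             # translations per target language

print(per_run, total, per_language)  # 95 2470 130
```

These reproduce the reported figures: 2470 total translations and 130 tests per language.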
European languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Spanish | 🟡 7.4 | qwen3.5:35b |
| French | 🟡 7.3 | mistralai/mistral-medium-3.1 |
| Portuguese | 🟡 7.3 | translategemma:27b |
| Italian | 🟡 7.0 | qwen3.5:35b |
| German | 🟠 6.9 | translategemma:27b |
| Polish | 🟠 6.3 | google/gemini-3-flash-preview |
Asian languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Chinese (Simplified) | 🟡 7.1 | qwen3.5:9b |
| Chinese (Traditional) | 🟠 7.0 | qwen3.5:35b |
| Vietnamese | 🟠 6.5 | qwen3.5:35b |
| Thai | 🟠 6.2 | translategemma:27b |
| Japanese | 🟠 6.0 | google/gemini-3-flash-preview |
| Korean | 🟠 5.8 | google/gemini-2.0-flash-001 |
| Hindi | 🟠 5.8 | google/gemini-3-flash-preview |
| Tamil | 🟠 5.2 | translategemma:27b |
| Bengali | 🟠 5.2 | google/gemini-3-flash-preview |
Cyrillic-script languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Russian | 🟠 6.8 | mistralai/mistral-medium-3.1 |
| Ukrainian | 🟠 6.3 | google/gemini-3-flash-preview |
Middle Eastern languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Arabic | 🟠 6.3 | translategemma:27b |
| Hebrew | 🟠 5.1 | gemma3:27b-it-qat |
- By Language: All Languages
- By Model: All Models
- Benchmark Documentation: How to Run Benchmarks
- Raw Data: Download JSON
Generated by TranslateBookWithLLM benchmark system