hydropix edited this page Mar 7, 2026 · 25 revisions

Translation Quality Benchmark

Last updated: 2026-03-07 16:35

This wiki reports translation quality benchmarks for 25 LLMs, translating from English into 19 target languages.

Score Legend

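Scores are on a 1-10 scale. Each indicator covers the band from its lower bound up to the next band:

| Indicator | Range | Label |
|---|---|---|
| 🟢 | 9.0 to 10 | Excellent |
| 🟡 | 7.0 to below 9.0 | Good |
| 🟠 | 5.0 to below 7.0 | Acceptable |
| 🔴 | 3.0 to below 5.0 | Poor |
| ⚫ | below 3.0 | Failed |

Displayed scores are rounded to one decimal, and the indicator appears to follow the unrounded average, so a displayed 7.0 can carry either 🟡 (Italian) or 🟠 (Chinese (Traditional)).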
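A minimal sketch of the legend as a Python function, assuming each band starts at its lower bound and runs up to the next one (consistent with llama3.1:8b's 4.2 carrying 🔴 and llama3.2's 2.5 carrying ⚫). `score_indicator` is a hypothetical name, not part of the TranslateBookWithLLM code base. It takes the unrounded average, which would explain why two scores that both display as 7.0 can receive different indicators.

```python
# Hypothetical helper: maps an unrounded average score (1-10 scale) to the
# page's indicator, assuming each band starts at its lower bound.
def score_indicator(avg: float) -> str:
    """Return the indicator emoji for an unrounded average score."""
    if avg >= 9.0:
        return "🟢"  # Excellent
    if avg >= 7.0:
        return "🟡"  # Good
    if avg >= 5.0:
        return "🟠"  # Acceptable
    if avg >= 3.0:
        return "🔴"  # Poor
    return "⚫"  # Failed


# Averages taken from the Model Rankings table.
for model, avg in [("translategemma:27b", 7.6), ("qwen3.5:35b", 6.9),
                   ("llama3.1:8b", 4.2), ("llama3.2", 2.5)]:
    print(f"{model}: {score_indicator(avg)} {avg:.1f}")

# Rounding for display happens after the indicator is chosen.
print(score_indicator(7.04), f"{7.04:.1f}")  # 🟡 7.0
print(score_indicator(6.96), f"{6.96:.1f}")  # 🟠 7.0
```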

Model Rankings

Overall performance across all tested languages:

| Rank | Model | Avg Score | Accuracy | Fluency | Style | Tests |
|---|---|---|---|---|---|---|
| 1 | translategemma:27b | 🟡 7.6 | 8.0 | 8.0 | 7.0 | 190 |
| 2 | google/gemini-3-flash-preview | 🟡 7.6 | 7.9 | 7.4 | 7.3 | 95 |
| 3 | mistralai/mistral-medium-3.1 | 🟡 7.5 | 7.9 | 7.3 | 7.3 | 95 |
| 4 | google/gemini-2.0-flash-001 | 🟡 7.4 | 8.0 | 7.3 | 7.2 | 95 |
| 5 | gemma3:27b-it-qat | 🟡 7.1 | 7.6 | 7.0 | 6.8 | 95 |
| 6 | gemma3:27b | 🟡 7.1 | 7.7 | 7.1 | 6.8 | 95 |
| 7 | translategemma:27b-it-q8_0 | 🟡 7.1 | 7.5 | 7.1 | 6.6 | 95 |
| 8 | qwen3.5:35b | 🟠 6.9 | 7.4 | 7.2 | 6.4 | 95 |
| 9 | ministral-3:14b | 🟠 6.9 | 7.4 | 6.9 | 6.6 | 95 |
| 10 | translategemma:12b-it-q4_K_M | 🟠 6.8 | 7.3 | 6.8 | 6.2 | 95 |
| 11 | translategemma:12b | 🟠 6.8 | 7.2 | 6.8 | 6.1 | 95 |
| 12 | qwen3:30b | 🟠 6.7 | 7.4 | 6.7 | 6.4 | 95 |
| 13 | gemma3:12b | 🟠 6.7 | 7.3 | 6.6 | 6.4 | 95 |
| 14 | qwen3:30b-instruct | 🟠 6.6 | 7.2 | 6.6 | 6.2 | 95 |
| 15 | mistral-small:24b | 🟠 6.4 | 7.2 | 6.4 | 6.2 | 95 |
| 16 | ministral-3 | 🟠 6.3 | 7.0 | 6.2 | 5.9 | 95 |
| 17 | translategemma:4b | 🟠 6.1 | 6.5 | 6.3 | 5.3 | 95 |
| 18 | qwen3:14b | 🟠 6.0 | 6.7 | 6.0 | 5.7 | 95 |
| 19 | translategemma:4b-it-q4_K_M | 🟠 6.0 | 6.5 | 6.2 | 5.2 | 95 |
| 20 | qwen3:4b | 🟠 5.9 | 6.7 | 5.8 | 5.5 | 95 |
| 21 | gemma3:4b | 🟠 5.7 | 6.5 | 5.8 | 5.3 | 95 |
| 22 | qwen3:8b | 🟠 5.5 | 6.3 | 5.4 | 5.2 | 95 |
| 23 | qwen3.5:9b | 🟠 5.2 | 5.9 | 5.3 | 4.7 | 95 |
| 24 | llama3.1:8b | 🔴 4.2 | 4.9 | 4.2 | 3.8 | 95 |
| 25 | llama3.2 | ⚫ 2.5 | 3.4 | 2.5 | 2.3 | 95 |
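The page does not document how the ranking is computed. The sketch below shows one plausible aggregation, assuming each translation receives a single overall score from the evaluator and models are sorted by their unrounded mean; that assumption would explain how translategemma:27b and google/gemini-3-flash-preview both display 7.6 yet hold ranks 1 and 2. The record layout, `rank_models`, and the scores are hypothetical toy values, not benchmark data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-translation records: (model, language, overall score).
# Toy values for illustration only; they are not benchmark data.
records = [
    ("model-a", "Spanish", 7.0), ("model-a", "German", 8.0), ("model-a", "Korean", 7.8),
    ("model-b", "Spanish", 7.0), ("model-b", "German", 8.0), ("model-b", "Korean", 7.7),
    ("model-c", "Spanish", 5.0), ("model-c", "German", 4.0), ("model-c", "Korean", 4.5),
]


def rank_models(rows):
    """Return (model, mean score) pairs sorted by the unrounded mean, best first."""
    by_model = defaultdict(list)
    for model, _language, score in rows:
        by_model[model].append(score)
    averages = {model: mean(scores) for model, scores in by_model.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for rank, (model, avg) in enumerate(rank_models(records), start=1):
    # Rounding happens only for display, after the order is fixed.
    print(f"{rank} {model} {avg:.1f}")
```

Here model-a and model-b both print 7.6, but model-a stays ahead because its unrounded mean is higher.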

Language Rankings (Top 15)

Average score per target language, with the best-scoring model for each:

| Rank | Language | Native Name | Avg Score | Best Model | Tests |
|---|---|---|---|---|---|
| 1 | Spanish | Español | 🟡 7.4 | qwen3.5:35b | 130 |
| 2 | French | Français | 🟡 7.3 | mistralai/mistral-medium-3.1 | 130 |
| 3 | Portuguese | Português | 🟡 7.3 | translategemma:27b | 130 |
| 4 | Chinese (Simplified) | 简体中文 | 🟡 7.1 | qwen3.5:9b | 130 |
| 5 | Italian | Italiano | 🟡 7.0 | qwen3.5:35b | 130 |
| 6 | Chinese (Traditional) | 繁體中文 | 🟠 7.0 | qwen3.5:35b | 130 |
| 7 | German | Deutsch | 🟠 6.9 | translategemma:27b | 130 |
| 8 | Russian | Русский | 🟠 6.8 | mistralai/mistral-medium-3.1 | 130 |
| 9 | Vietnamese | Tiếng Việt | 🟠 6.5 | qwen3.5:35b | 130 |
| 10 | Ukrainian | Українська | 🟠 6.3 | google/gemini-3-flash-preview | 130 |
| 11 | Polish | Polski | 🟠 6.3 | google/gemini-3-flash-preview | 130 |
| 12 | Arabic | العربية | 🟠 6.3 | translategemma:27b | 130 |
| 13 | Thai | ไทย | 🟠 6.2 | translategemma:27b | 130 |
| 14 | Japanese | 日本語 | 🟠 6.0 | google/gemini-3-flash-preview | 130 |
| 15 | Korean | 한국어 | 🟠 5.8 | google/gemini-2.0-flash-001 | 130 |
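The page does not say how the best model per language is chosen. A minimal sketch, assuming it is the model with the highest mean score on that language's tests and that the language average is taken over every test in that language; under that reading a model can lead one language without leading overall, as qwen3.5:9b does for Chinese (Simplified) despite ranking 23rd overall. The records, `language_summary`, and scores are hypothetical toy values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-translation records: (model, language, overall score).
# Toy values for illustration only; they are not benchmark data.
records = [
    ("model-a", "French", 7.0), ("model-a", "French", 8.0),
    ("model-b", "French", 7.0), ("model-b", "French", 9.0),
    ("model-a", "Korean", 6.0), ("model-b", "Korean", 5.0),
]


def language_summary(rows):
    """Map each language to (mean over all its tests, model with the best mean)."""
    all_scores = defaultdict(list)                     # language -> every score
    by_model = defaultdict(lambda: defaultdict(list))  # language -> model -> scores
    for model, language, score in rows:
        all_scores[language].append(score)
        by_model[language][model].append(score)

    summary = {}
    for language, scores in all_scores.items():
        models = by_model[language]
        # max() keeps the first model seen if two means tie.
        best = max(models, key=lambda m: mean(models[m]))
        summary[language] = (mean(scores), best)
    return summary


# Sort languages by average score, best first.
ranked = sorted(language_summary(records).items(), key=lambda kv: kv[1][0], reverse=True)
for rank, (language, (avg, best)) in enumerate(ranked, start=1):
    print(f"{rank} {language} {avg:.1f} {best}")
```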



Quick Stats

  • Total Models Tested: 25
  • Total Languages: 19
  • Total Translations: 2470
  • Evaluator Model: anthropic/claude-haiku-4.5
  • Source Language: English
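The Quick Stats totals reconcile with both ranking tables: every language row lists 130 tests, and in the last column of the Model Rankings table 24 models list 95 each while translategemma:27b lists 190. The worked check below assumes those per-model counts are translation counts; since 95 = 19 × 5, that would correspond to five tests per language per model, and ten for translategemma:27b.

```latex
\begin{aligned}
19 \times 130 &= 2470 \\
24 \times 95 + 1 \times 190 &= 2280 + 190 = 2470
\end{aligned}
```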

Categories

By Language Category

European Major Languages

| Language | Avg Score | Best Model |
|---|---|---|
| Spanish | 🟡 7.4 | qwen3.5:35b |
| French | 🟡 7.3 | mistralai/mistral-medium-3.1 |
| Portuguese | 🟡 7.3 | translategemma:27b |
| Italian | 🟡 7.0 | qwen3.5:35b |
| German | 🟠 6.9 | translategemma:27b |
| Polish | 🟠 6.3 | google/gemini-3-flash-preview |

Asian Languages

| Language | Avg Score | Best Model |
|---|---|---|
| Chinese (Simplified) | 🟡 7.1 | qwen3.5:9b |
| Chinese (Traditional) | 🟠 7.0 | qwen3.5:35b |
| Vietnamese | 🟠 6.5 | qwen3.5:35b |
| Thai | 🟠 6.2 | translategemma:27b |
| Japanese | 🟠 6.0 | google/gemini-3-flash-preview |
| Korean | 🟠 5.8 | google/gemini-2.0-flash-001 |
| Hindi | 🟠 5.8 | google/gemini-3-flash-preview |
| Tamil | 🟠 5.2 | translategemma:27b |
| Bengali | 🟠 5.2 | google/gemini-3-flash-preview |

Cyrillic Languages

| Language | Avg Score | Best Model |
|---|---|---|
| Russian | 🟠 6.8 | mistralai/mistral-medium-3.1 |
| Ukrainian | 🟠 6.3 | google/gemini-3-flash-preview |

Semitic Languages

| Language | Avg Score | Best Model |
|---|---|---|
| Arabic | 🟠 6.3 | translategemma:27b |
| Hebrew | 🟠 5.1 | gemma3:27b-it-qat |



Generated by TranslateBookWithLLM benchmark system
