hydropix edited this page Feb 14, 2026 · 24 revisions

Translation Quality Benchmark

Last updated: 2026-02-14 10:34

This wiki contains translation quality benchmarks for various LLMs across 19 languages.

Important: These benchmarks evaluate translation quality on challenging literary texts featuring complex vocabulary, stylistic devices, and nuanced expressions. Performance on simpler content (technical documentation, news articles, or straightforward informative texts) is typically 15-25% higher.

Score Legend

| Indicator | Range | Label |
|-----------|-------|-------|
| 🟢 | 9-10 | Excellent |
| 🟡 | 7-8 | Good |
| 🟠 | 5-6 | Acceptable |
| 🔴 | 3-4 | Poor |
| ⚫ | 1-2 | Failed |
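The legend's ranges can be expressed as a small helper. Note that fractional scores such as 7.6 fall between the listed ranges, so this sketch assumes threshold semantics (>= 9 green, >= 7 yellow, and so on), which is consistent with how fractional scores are labeled in the tables below; the exact cutoffs used by the benchmark system are an assumption.

```python
def score_indicator(score: float) -> str:
    """Map a 0-10 benchmark score to the wiki's color indicator.

    Thresholds are assumed from the legend and the labeled tables
    (e.g. 6.9 is shown as 🟠, 7.1 as 🟡).
    """
    if score >= 9:
        return "🟢"  # Excellent
    if score >= 7:
        return "🟡"  # Good
    if score >= 5:
        return "🟠"  # Acceptable
    if score >= 3:
        return "🔴"  # Poor
    return "⚫"      # Failed

print(score_indicator(7.6))  # translategemma:27b -> 🟡
print(score_indicator(2.5))  # llama3.2 -> ⚫
```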

Model Rankings

Overall performance across all tested languages:

| Rank | Model | Avg Score | Accuracy | Fluency | Style | Tests |
|------|-------|-----------|----------|---------|-------|-------|
| 1 | translategemma:27b | 🟡 7.6 | 8.0 | 8.0 | 7.0 | 190 |
| 2 | google/gemini-3-flash-preview | 🟡 7.6 | 7.9 | 7.4 | 7.3 | 95 |
| 3 | mistralai/mistral-medium-3.1 | 🟡 7.5 | 7.9 | 7.3 | 7.3 | 95 |
| 4 | google/gemini-2.0-flash-001 | 🟡 7.4 | 8.0 | 7.3 | 7.2 | 95 |
| 5 | gemma3:27b-it-qat | 🟡 7.1 | 7.6 | 7.0 | 6.8 | 95 |
| 6 | gemma3:27b | 🟡 7.1 | 7.7 | 7.1 | 6.8 | 95 |
| 7 | translategemma:27b-it-q8_0 | 🟡 7.1 | 7.5 | 7.1 | 6.6 | 95 |
| 8 | ministral-3:14b | 🟠 6.9 | 7.4 | 6.9 | 6.6 | 95 |
| 9 | translategemma:12b-it-q4_K_M | 🟠 6.8 | 7.3 | 6.8 | 6.2 | 95 |
| 10 | translategemma:12b | 🟠 6.8 | 7.2 | 6.8 | 6.1 | 95 |
| 11 | qwen3:30b | 🟠 6.7 | 7.4 | 6.7 | 6.4 | 95 |
| 12 | gemma3:12b | 🟠 6.7 | 7.3 | 6.6 | 6.4 | 95 |
| 13 | qwen3:30b-instruct | 🟠 6.6 | 7.2 | 6.6 | 6.2 | 95 |
| 14 | mistral-small:24b | 🟠 6.4 | 7.2 | 6.4 | 6.2 | 95 |
| 15 | ministral-3 | 🟠 6.3 | 7.0 | 6.2 | 5.9 | 95 |
| 16 | translategemma:4b | 🟠 6.1 | 6.5 | 6.3 | 5.3 | 95 |
| 17 | qwen3:14b | 🟠 6.0 | 6.7 | 6.0 | 5.7 | 95 |
| 18 | translategemma:4b-it-q4_K_M | 🟠 6.0 | 6.5 | 6.2 | 5.2 | 95 |
| 19 | qwen3:4b | 🟠 5.9 | 6.7 | 5.8 | 5.5 | 95 |
| 20 | gemma3:4b | 🟠 5.7 | 6.5 | 5.8 | 5.3 | 95 |
| 21 | qwen3:8b | 🟠 5.5 | 6.3 | 5.4 | 5.2 | 95 |
| 22 | llama3.1:8b | 🔴 4.2 | 4.9 | 4.2 | 3.8 | 95 |
| 23 | llama3.2 | ⚫ 2.5 | 3.4 | 2.5 | 2.3 | 95 |

Language Rankings (Top 15)

Best translation quality by target language:

| Rank | Language | Native | Avg Score | Best Model | Tests |
|------|----------|--------|-----------|------------|-------|
| 1 | Spanish | Español | 🟡 7.4 | google/gemini-3-flash-preview | 120 |
| 2 | French | Français | 🟡 7.3 | mistralai/mistral-medium-3.1 | 120 |
| 3 | Portuguese | Português | 🟡 7.3 | translategemma:27b | 120 |
| 4 | Italian | Italiano | 🟡 7.1 | translategemma:27b | 120 |
| 5 | Chinese (Simplified) | 简体中文 | 🟠 6.9 | qwen3:30b-instruct | 120 |
| 6 | Chinese (Traditional) | 繁體中文 | 🟠 6.9 | qwen3:30b | 120 |
| 7 | German | Deutsch | 🟠 6.9 | translategemma:27b | 120 |
| 8 | Russian | Русский | 🟠 6.8 | mistralai/mistral-medium-3.1 | 120 |
| 9 | Vietnamese | Tiếng Việt | 🟠 6.5 | translategemma:27b | 120 |
| 10 | Ukrainian | Українська | 🟠 6.4 | google/gemini-3-flash-preview | 120 |
| 11 | Polish | Polski | 🟠 6.4 | google/gemini-3-flash-preview | 120 |
| 12 | Arabic | العربية | 🟠 6.3 | translategemma:27b | 120 |
| 13 | Thai | ไทย | 🟠 6.2 | translategemma:27b | 120 |
| 14 | Japanese | 日本語 | 🟠 6.0 | google/gemini-3-flash-preview | 120 |
| 15 | Hindi | हिन्दी | 🟠 5.9 | google/gemini-3-flash-preview | 120 |



Quick Stats

  • Total Models Tested: 23
  • Total Languages: 19
  • Total Translations: 2280
  • Evaluator Model: anthropic/claude-haiku-4.5
  • Source Language: English
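The totals above reconcile with the per-model test counts in the Model Rankings table: translategemma:27b ran 190 tests, and each of the remaining 22 models ran 95. A quick sanity-check sketch (the per-language breakdown in the comment is an assumption; the wiki does not state it):

```python
# Per-model test counts taken from the Model Rankings table:
# one model (translategemma:27b) with 190 tests, 22 models with 95 each.
per_model_tests = [190] + [95] * 22

assert len(per_model_tests) == 23       # matches "Total Models Tested"
assert sum(per_model_tests) == 2280     # matches "Total Translations"

# 95 tests per model would correspond to 19 languages x 5 texts each
# (an assumed breakdown, not stated on this page).
print(sum(per_model_tests))  # 2280
```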

Categories

By Language Category

European Major Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Spanish | 🟡 7.4 | google/gemini-3-flash-preview |
| French | 🟡 7.3 | mistralai/mistral-medium-3.1 |
| Portuguese | 🟡 7.3 | translategemma:27b |
| Italian | 🟡 7.1 | translategemma:27b |
| German | 🟠 6.9 | translategemma:27b |
| Polish | 🟠 6.4 | google/gemini-3-flash-preview |

Asian Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Chinese (Simplified) | 🟠 6.9 | qwen3:30b-instruct |
| Chinese (Traditional) | 🟠 6.9 | qwen3:30b |
| Vietnamese | 🟠 6.5 | translategemma:27b |
| Thai | 🟠 6.2 | translategemma:27b |
| Japanese | 🟠 6.0 | google/gemini-3-flash-preview |
| Hindi | 🟠 5.9 | google/gemini-3-flash-preview |
| Korean | 🟠 5.8 | google/gemini-2.0-flash-001 |
| Tamil | 🟠 5.3 | translategemma:27b |
| Bengali | 🟠 5.3 | google/gemini-3-flash-preview |

Cyrillic Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Russian | 🟠 6.8 | mistralai/mistral-medium-3.1 |
| Ukrainian | 🟠 6.4 | google/gemini-3-flash-preview |

Semitic Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Arabic | 🟠 6.3 | translategemma:27b |
| Hebrew | 🟠 5.3 | gemma3:27b-it-qat |

Generated by TranslateBookWithLLM benchmark system
