🔧 Fix committee-reports articles containing English content in non-English language versions #844

@pethers

Description

📋 Issue Type

Bug Fix - Translation Completeness

🎯 Objective

Fix the 48 committee-reports articles (4 dates × 12 languages) that contain English section headings and body paragraphs instead of content in the target language.

📊 Current State

The following committee-reports articles contain English content (headings and body text) despite being designated for non-English languages:

Affected Articles (4 dates × 12 languages = 48 articles)

| Date       | Languages Affected                             |
|------------|------------------------------------------------|
| 2026-02-16 | da, no, fi, de, fr, es, nl, ar, he, ja, ko, zh |
| 2026-02-17 | da, no, fi, de, fr, es, nl, ar, he, ja, ko, zh |
| 2026-02-18 | da, no, fi, de, fr, es, nl, ar, he, ja, ko, zh |
| 2026-02-24 | da, no, fi, de, fr, es, nl, ar, he, ja, ko, zh |

Specific English Content Found

  1. Section headings in English: <h2>What to Watch</h2>, <h2>What to Watch in the Coming Weeks</h2>
  2. Body paragraphs in English: Full analytical paragraphs about committee proceedings
  3. English phrases in body: "Chamber debate tactics", "amendment proposals from opposition parties", "Expected vote outcome"
  4. Meta keywords in English: content="committee, reports, betänkanden, Ukraine aid, data protection..."

Example (Danish article with English content)

File: news/2026-02-16-committee-reports-da.html

  • Title/description: ✅ Correctly in Danish
  • H2 headings: ❌ "What to Watch" (should be "Hvad skal man holde øje med")
  • Body paragraphs: ❌ Multiple English paragraphs
  • Keywords: ❌ English keywords

🚀 Desired State

All 48 articles fully translated into their target language:

  • Section headings translated using CONTENT_LABELS equivalents
  • Body paragraphs rewritten in the target language
  • Meta keywords translated to target language
  • data-translate markers removed if present

🔧 Implementation Approach

Recommended Strategy: Re-generate or Batch-Translate

Option A (Preferred): Use the existing scripts/generate_committee_articles.py translation system to regenerate the affected articles with proper translations.

Option B: Create a targeted fix script similar to scripts/fix-mixed-language-descriptions.py that:

  1. Scans news/2026-02-{16,17,18,24}-committee-reports-{lang}.html
  2. Identifies English headings and replaces them with CONTENT_LABELS equivalents
  3. Translates English body paragraphs to the target language
  4. Localizes meta keywords
  5. Validates the result with scripts/validate-news-translations.ts
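
As a rough illustration of steps 1 and 2 of Option B, the sketch below enumerates the 48 target paths and swaps known English `<h2>` headings for localized ones. It is only a sketch under stated assumptions: `HEADING_LABELS`, `affected_files`, and `replace_headings` are hypothetical names, and the label map is a stand-in for the real CONTENT_LABELS data (only the Danish entry is taken from this issue). Body-paragraph translation and keyword localization (steps 3 and 4) are not covered.

```python
import re
from pathlib import Path

DATES = ["2026-02-16", "2026-02-17", "2026-02-18", "2026-02-24"]
LANGS = ["da", "no", "fi", "de", "fr", "es", "nl", "ar", "he", "ja", "ko", "zh"]

# Hypothetical stand-in for CONTENT_LABELS. Only the Danish value comes from
# the issue text; the other languages would be filled from the real labels.
HEADING_LABELS = {
    "da": {"What to Watch": "Hvad skal man holde øje med"},
}

H2_RE = re.compile(r"<h2>(.*?)</h2>")


def affected_files(root="news"):
    """Return the 48 article paths (4 dates x 12 languages)."""
    return [
        Path(root) / f"{date}-committee-reports-{lang}.html"
        for date in DATES
        for lang in LANGS
    ]


def replace_headings(html, lang):
    """Replace known English <h2> headings; leave unknown headings untouched."""
    labels = HEADING_LABELS.get(lang, {})

    def swap(match):
        text = match.group(1).strip()
        return f"<h2>{labels.get(text, text)}</h2>"

    return H2_RE.sub(swap, html)
```

Headings with no entry in the map (for example "What to Watch in the Coming Weeks" here) are left as-is rather than guessed, so a later validation pass can still flag them.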

Files to Fix (48 total)

news/2026-02-16-committee-reports-{da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh}.html
news/2026-02-17-committee-reports-{da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh}.html
news/2026-02-18-committee-reports-{da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh}.html
news/2026-02-24-committee-reports-{da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh}.html

🤖 Recommended Agent

agent:news-journalist: has expertise in the article generation system and the translation pipeline, and can use MCP tools to regenerate articles with proper translations. The content-generator agent could also assist with batch translation.

✅ Acceptance Criteria

  • All 48 committee-reports articles have headings in the target language
  • All body paragraphs are translated (no English paragraphs in non-EN files)
  • Meta keywords are localized per language
  • data-translate="true" markers eliminated
  • npx tsx scripts/validate-news-translations.ts passes for all fixed files
  • HTML validation passes (htmlhint)
  • RTL languages (ar, he) maintain correct text direction
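
A quick pre-check for some of these criteria could look like the sketch below. It is not a replacement for scripts/validate-news-translations.ts or htmlhint; `check_article` and `ENGLISH_MARKERS` are hypothetical names, the phrase list is limited to the English strings quoted in this issue, and the RTL check assumes text direction is declared as `dir="rtl"` on the `<html>` element, which may not match how the articles actually set it.

```python
import re

# English strings reported in this issue; a real check would need a broader list.
ENGLISH_MARKERS = [
    "What to Watch",
    "Chamber debate tactics",
    "amendment proposals from opposition parties",
    "Expected vote outcome",
]
RTL_LANGS = {"ar", "he"}


def check_article(html, lang):
    """Return a list of human-readable problems found in one article."""
    problems = []
    for phrase in ENGLISH_MARKERS:
        if phrase in html:
            problems.append(f"English phrase remains: {phrase!r}")
    if 'data-translate="true"' in html:
        problems.append("data-translate marker remains")
    # Assumption: RTL articles declare direction on the <html> element.
    if lang in RTL_LANGS and not re.search(r'<html[^>]*\bdir="rtl"', html):
        problems.append('missing dir="rtl" on <html>')
    return problems
```

An empty list means none of the known symptoms were found; it does not prove the article is fully translated.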

📚 References

  • Content labels for heading translations: scripts/data-transformers/constants/content-labels-part1.ts, content-labels-part2.ts
  • Translation dictionary: scripts/translation-dictionary.ts
  • Existing fix script pattern: scripts/fix-mixed-language-descriptions.py
  • Committee article generator: scripts/generate_committee_articles.py

🏷️ Labels

type:bug, component:i18n, component:news, translation, priority-high, component:content
