A stateless subtitle translator powered by the Google Gemini API, built to translate dialogue while leaving subtitle metadata (timestamps, indices) untouched.
- Stateless & Browser-Based UI: Built with Streamlit, the tool runs in any modern web browser, offering a clean, intuitive interface that works on any operating system.
- Perfect Metadata Preservation: At its core, the translator operates on a "Conveyor Belt Architecture," which surgically separates subtitle text from its metadata (timestamps, indices). Only the text is sent for translation, ensuring that timing information remains untouched and perfectly synchronized.
- Robust AI Communication (ID Anchoring Protocol): We solved the critical "Count Mismatch" problem where LLMs merge or split lines arbitrarily. Every line is anchored with a unique ID, forcing the AI to maintain a 1:1 structural correspondence between the source and translated text. This guarantees that the reassembled subtitle file is never corrupted.
- Context-Aware Engine (In-Progress): The system employs a "Scout-Report-Inject" architecture to analyze the script's genre, tone, and character relationships beforehand. This generated "Context Guide" is injected into every translation request, dramatically improving consistency and tonal accuracy.
  - Note: While the framework for deep context analysis is in place, achieving perfect narrative and emotional context across an entire script is an ongoing challenge and a key area for future improvement. The current implementation provides a significant quality boost but is not yet infallible.
- Live Execution Dashboard: A visual grid displays the real-time status of each chunk (Waiting, Processing, Success, Error), complemented by a HUD showing elapsed time, average chunk speed, and an estimated time of completion (ETA).
- Advanced Control & Tuning:
  - Manual Retry & Emergency Stop: Failed chunks can be retried individually without restarting the entire process. A global stop button allows you to halt the operation at any time.
  - Reasoning Bucket: A toggle to switch the AI into "Max Reasoning" mode, instructing it to perform deeper, step-by-step analysis for higher-quality translation of nuanced dialogue, at the cost of speed.
  - Adjustable Chunk Size: A slider to control the amount of text sent per API call, allowing users to balance speed against stability.
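As an illustration of the HUD figures mentioned in the dashboard feature above, the following is a minimal sketch of how average chunk speed and ETA could be derived from elapsed time. The `Progress` class and its field names are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    """Hypothetical snapshot of a translation run (names are illustrative)."""
    total_chunks: int
    done_chunks: int
    elapsed_seconds: float


def average_chunk_seconds(p: Progress) -> float:
    """Mean wall-clock time per finished chunk (0.0 before any chunk finishes)."""
    if p.done_chunks == 0:
        return 0.0
    return p.elapsed_seconds / p.done_chunks


def eta_seconds(p: Progress) -> float:
    """Estimated remaining time: average chunk time times chunks still pending."""
    remaining = p.total_chunks - p.done_chunks
    return average_chunk_seconds(p) * remaining


snapshot = Progress(total_chunks=20, done_chunks=5, elapsed_seconds=50.0)
print(average_chunk_seconds(snapshot))  # 10.0
print(eta_seconds(snapshot))            # 150.0
```

A simple running average like this reacts slowly when chunk times vary widely; a real dashboard might weight recent chunks more heavily.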
Follow these steps to run the application in your local development environment.
- Python 3.9 or higher
- An active Google API Key with the Gemini API enabled. You can get one from Google AI Studio.
- Clone the repository:

      git clone https://github.com/your-repo/your-project.git
      cd your-project

- (Recommended) Create and activate a virtual environment:

      # For Windows
      python -m venv venv
      .\venv\Scripts\activate

      # For macOS/Linux
      python3 -m venv venv
      source venv/bin/activate

- Install the required libraries:

      pip install -r requirements.txt

- Run the application:

      streamlit run app.py

- Your web browser will automatically open with the application running. Enter your Google API Key in the sidebar to begin.
The system's data flow is designed for maximum safety and efficiency, mirroring an industrial conveyor belt.
- Deconstruction: The input SRT file is precisely disassembled into two distinct components: Metadata (timestamps) and Data (dialogue text).
- Refinement: The Metadata is securely stored locally. Only the pure text data proceeds to the next stage, preventing any possibility of metadata corruption by the AI.
- Batch Processing: The text is grouped into manageable chunks according to the user-defined size. These chunks are then formatted into a strict JSON structure using the ID Anchoring protocol.
- Reassembly: Once the AI returns the translated JSON, the system validates its integrity, re-sorts it by ID, and meticulously reassembles it with the original, untouched Metadata to produce the final, perfectly synchronized subtitle file.
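The four stages above can be sketched roughly as follows. This is a simplified illustration, not the project's implementation: the function names (`deconstruct`, `make_chunks`, `reassemble`) are hypothetical, the SRT parsing handles only well-formed cues, and assigning IDs globally across the whole file (rather than per chunk) is an assumption.

```python
import re
from typing import Dict, List, Tuple

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def deconstruct(srt: str) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split SRT content into metadata (index, timing) and dialogue text."""
    metadata, texts = [], []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or not TIMING.match(lines[1].strip()):
            continue  # skip blocks that are not well-formed cues
        metadata.append({"index": lines[0].strip(), "timing": lines[1].strip()})
        texts.append("\n".join(lines[2:]))
    return metadata, texts


def make_chunks(texts: List[str], chunk_size: int) -> List[List[Dict]]:
    """Anchor every text with an ID, then group into chunks of chunk_size."""
    anchored = [{"id": i, "text": t} for i, t in enumerate(texts)]
    return [anchored[i:i + chunk_size] for i in range(0, len(anchored), chunk_size)]


def reassemble(metadata: List[Dict[str, str]], translated: List[Dict]) -> str:
    """Re-attach translated text to the untouched metadata, matched by ID."""
    by_id = {item["id"]: item["text"] for item in translated}
    blocks = [f'{m["index"]}\n{m["timing"]}\n{by_id[i]}' for i, m in enumerate(metadata)]
    return "\n\n".join(blocks) + "\n"


SAMPLE = (
    "1\n00:00:01,000 --> 00:00:02,000\nHello.\n\n"
    "2\n00:00:03,000 --> 00:00:04,500\nHow are you?\n"
)

meta, texts = deconstruct(SAMPLE)
chunks = make_chunks(texts, chunk_size=1)
# Stand-in for the AI call: upper-case each text while keeping its ID.
translated = [{"id": it["id"], "text": it["text"].upper()} for c in chunks for it in c]
print(reassemble(meta, translated))
```

Because the timing lines never leave `metadata`, nothing the translation step returns can alter them; only the text slot of each cue is replaced.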
Our protocol neutralizes the LLM's tendency to alter text structure by combining a logical data structure with a strict API-level command.
- ID Anchoring: Enforcing Structural Invariance

  Instead of sending a simple list of strings, which the AI might interpret as a single, malleable block of text, we send an array of objects. Each object is "anchored" with a unique, sequential `id`.

  Data Structure Sent to AI:

      [
        {"id": 0, "text": "Line 1 text."},
        {"id": 1, "text": "Line 2 text."},
        {"id": 2, "text": "Line 3 text."}
      ]

  This structure acts as a logical "shackle." The AI is instructed via the prompt to preserve the `id` for each object. This simple rule has profound implications:

  - Merging cannot go unnoticed: The AI cannot merge lines 1 and 2 into a single translated object without either dropping an ID (`id: 1`) or creating an invalid structure.
  - Splitting cannot go unnoticed: The AI cannot split line 3 into two translated objects without fabricating a new ID, which violates the instruction.

  This enforces a strict 1-to-1 mapping between the input and output objects at a structural level, regardless of the text content: a response that breaks the mapping fails validation instead of silently corrupting the file. Even if the AI reorders the objects in its response, we can reliably sort them back into the correct sequence using the immutable IDs.
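A minimal sketch of the ID check and re-sort described for ID Anchoring, assuming the response has already been parsed into a list of objects. The names `AnchorMismatch` and `validate_and_sort` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List


class AnchorMismatch(ValueError):
    """Raised when the translated IDs do not match the source IDs exactly."""


def validate_and_sort(source: List[Dict], translated: List[Dict]) -> List[Dict]:
    """Check 1:1 ID correspondence, then return the items sorted by ID."""
    expected = [item["id"] for item in source]
    # Keep only well-formed entries: dicts that carry an integer "id".
    received = [
        item["id"]
        for item in translated
        if isinstance(item, dict) and isinstance(item.get("id"), int)
    ]

    missing = set(expected) - set(received)       # a merge drops an ID
    extra = set(received) - set(expected)         # a split invents an ID
    duplicated = len(received) != len(set(received))
    malformed = len(received) != len(translated)

    if missing or extra or duplicated or malformed:
        raise AnchorMismatch(
            f"missing={sorted(missing)} extra={sorted(extra)} "
            f"duplicated={duplicated} malformed={malformed}"
        )
    return sorted(translated, key=lambda item: item["id"])


source = [{"id": 0, "text": "a"}, {"id": 1, "text": "b"}, {"id": 2, "text": "c"}]
shuffled = [{"id": 2, "text": "C"}, {"id": 0, "text": "A"}, {"id": 1, "text": "B"}]
print([item["id"] for item in validate_and_sort(source, shuffled)])  # [0, 1, 2]
```

A chunk that raises `AnchorMismatch` would be a natural candidate for the manual retry described in the feature list, since the failure is detected before reassembly.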
- API-Level Forced JSON Mode: Guaranteeing Data Integrity

  While ID Anchoring solves the structural mapping problem, it doesn't prevent the AI from returning a response that isn't valid JSON (e.g., by adding conversational text like "Here is your translation: ..."). To eliminate this, we do not rely on prompt-level requests alone.

  We configure the Gemini API call to set the `response_mime_type` parameter to `application/json`. This is not a suggestion made in the prompt; it is a generation setting applied by the API server, which constrains the response to JSON with no surrounding conversational text. This removes the most common cause of `JSONDecodeError` and makes the communication pipeline far more robust.
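A sketch of what such a call might look like with the `google-generativeai` package. The model name, prompt wording, and helper names (`build_prompt`, `parse_response`, `translate_chunk`) are assumptions for illustration and are not taken from the project. Parsing is still guarded as a defensive measure, for example against a response cut off by an output-token limit.

```python
import json
from typing import Dict, List

MODEL_NAME = "gemini-1.5-flash"  # assumption: substitute a model your key can access


def build_prompt(chunk: List[Dict], target_language: str) -> str:
    """Compose the instruction plus the ID-anchored payload (wording is illustrative)."""
    return (
        f'Translate the "text" field of every object into {target_language}. '
        'Return a JSON array with the same objects and unchanged "id" values.\n'
        + json.dumps(chunk, ensure_ascii=False)
    )


def parse_response(raw: str) -> List[Dict]:
    """Parse the model output, rejecting invalid JSON and non-array results."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response was not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of {id, text} objects")
    return data


def translate_chunk(chunk: List[Dict], target_language: str, api_key: str) -> List[Dict]:
    """Send one chunk to Gemini with JSON output requested at the API level."""
    # Imported lazily so the helpers above can be used without the package installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(
        MODEL_NAME,
        generation_config=genai.GenerationConfig(response_mime_type="application/json"),
    )
    response = model.generate_content(build_prompt(chunk, target_language))
    return parse_response(response.text)


print(parse_response('[{"id": 0, "text": "Hola."}]'))
```

Calling `translate_chunk([{"id": 0, "text": "Hello."}], "Spanish", api_key)` requires a valid key and network access, so only the offline helpers are exercised in the example line above.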
- Core & Logic: Python 3.13, Streamlit 1.51.0
- AI Engine & Communication: google-generativeai, chardet
- Packaging & Deployment: PyInstaller 6.17.0, UPX 4.2.4
This project is licensed under the MIT License.