This repository provides a Python script for translating .docx files using OpenAI's GPT(-4) API.
## Features
- Supports translation between any pair of languages (configurable in the script).
- Handles large texts by splitting them into manageable chunks that fit OpenAI's token limits.
- Maintains contextual continuity by incorporating previous translations and source text into the translation process.
- Allows the use of a sample translation file to guide translation tone and style.
- The translation instructions can be modified. (The current instructions focus on formal equivalence, aiming to preserve the original meaning, style, and structure of the text.)
- Reads input from `.docx` files and writes translated output to `.docx` files while preserving paragraph structure.
- Includes error handling and retry mechanisms for API calls.
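
To illustrate how text can be split to fit a token limit, here is a minimal sketch that greedily groups paragraphs into chunks. It is not taken from the script: the names `make_token_counter` and `chunk_paragraphs` are hypothetical, and the whitespace-based fallback counter is only an assumption for environments where `tiktoken` is not installed.

```python
from typing import Callable, List


def make_token_counter(model: str = "gpt-4") -> Callable[[str], int]:
    """Return a token-counting function, preferring tiktoken when it is installed."""
    try:
        import tiktoken  # listed among this README's required libraries

        encoding = tiktoken.encoding_for_model(model)
        return lambda text: len(encoding.encode(text))
    except ImportError:
        # Rough fallback (assumption): count whitespace-separated words instead.
        return lambda text: len(text.split())


def chunk_paragraphs(
    paragraphs: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[List[str]]:
    """Greedily group consecutive paragraphs so each chunk stays within max_tokens.

    A single paragraph larger than max_tokens becomes its own chunk;
    this sketch does not split paragraphs further.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_tokens = 0
    for paragraph in paragraphs:
        n = count_tokens(paragraph)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += n
    if current:
        chunks.append(current)
    return chunks
```

Keeping paragraphs whole inside each chunk makes it straightforward to write the translated paragraphs back in their original order. A typical call would be `chunk_paragraphs(paragraphs, 2000, make_token_counter())`, where the budget of 2000 is an arbitrary illustrative value.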
## Requirements
- Python 3.7 or higher.
- Required libraries:
  - `openai`
  - `python-docx`
  - `tiktoken`
- A valid OpenAI API key.
To install the required libraries, run:
`pip install openai python-docx tiktoken`
## Setup
### Configuration File
Create a `config.json` file in the same directory as the script and add your OpenAI API key:
```json
{
  "OPENAI_API_KEY": "your-api-key-here"
}
```
- Prepare the `.docx` file you want to translate.
- Optionally, create a `.docx` file with sample translations to guide the translation style.
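
As a rough illustration of how the `config.json` file described above could be read, the following sketch loads the key and fails early if it is missing. The helper name `load_api_key` is hypothetical and the script may load its configuration differently.

```python
import json
from pathlib import Path


def load_api_key(config_path: str = "config.json") -> str:
    """Read OPENAI_API_KEY from a JSON config file, failing early if it is unusable."""
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    with path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    api_key = config.get("OPENAI_API_KEY", "")
    # Reject an empty value or the placeholder from the example config.
    if not api_key or api_key == "your-api-key-here":
        raise ValueError("OPENAI_API_KEY is missing or still set to the placeholder")
    return api_key
```

Checking for the placeholder value up front gives a clear error message before any API call is attempted.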
## Usage
- Open the script and modify the `main()` function call: `main('input.docx', 'output.docx', sample_translation_file='sample_translation.docx')`
  - Replace `input.docx` with the path to your input file.
  - Replace `output.docx` with the desired output file path.
  - Optionally, specify a sample translation file.
- Run the script, e.g. in bash: `python openai_translator.py`
- The translated document will be saved as the specified output file.
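
The feature list mentions retry mechanisms for API calls. One common way to implement this is exponential backoff with a little jitter, sketched below. This is a generic pattern, not the script's actual code: `call_with_retry` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts:
                raise
            # Delay doubles each attempt: base_delay, 2x, 4x, ... plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

A translation call could then be wrapped as `call_with_retry(lambda: translate_chunk(chunk))`, where `translate_chunk` stands in for whatever function performs the API request. In practice it is usually better to pass a narrower `retry_on` tuple (for example, only rate-limit or timeout errors) so that genuine bugs are not retried.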