A Python program with a Gooey GUI that reads and translates DOCX files and writes PO files, all in a few clicks.
Designed specifically for documents that include parameters starting with "ID*_*" or enclosed by @ , which are excluded from the translation. This is useful for translating auto-fill templates: variable names are left unchanged, so the parameters can still be easily recognized and applied.
The main steps followed by the code are:
- Import all the documents from the indicated path into a single TXT file that contains every character that appears in the documents.
- Load this TXT file and convert it to a data frame for easier data processing.
- Clean the initial data frame, deleting repeated phrases and empty or NA values.
- Identify every word that starts with "ID*_*" or is enclosed by @ (indicating that it is a variable) and add it to a new data frame.
- Assign every registered variable a randomly generated, unique 4-digit code.
- Replace every variable with its assigned code in the data frame that contains every unique phrase from the document list.
- Using the mtranslate library, translate every phrase in the data frame into the desired language.
- Referring to the data frame that contains the variables and their associated codes, replace every code with the original variable name.
- Using the Levenshtein Python library, compute the Levenshtein distance between the phrases and create a TXT file that collects the results.
  - This helps identify phrases that differ by only a few characters.
  - In this case, the code is set to flag phrases that differ by fewer than 3 characters.
- Write a TXT file with the format:

      # Comments
      msgid "original word"
      msgstr "translated word"

- Create a copy of this file and change its extension from TXT to PO.
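
The steps above can be sketched in a few standard-library functions. This is a minimal illustration, not the project's actual code: it works on a plain list of phrases instead of a data frame, it reads "ID*_*" as a glob (ID, anything, underscore, anything) and assumes @-variables look like `@name@`, it uses a pure-Python edit distance as a stand-in for the Levenshtein library, and it leaves the translation call out entirely. All function names (`assign_codes`, `mask`, `unmask`, `near_duplicates`, `po_entry`) are hypothetical.

```python
import random
import re
from itertools import combinations

# Assumption: variables look like ID_client / ID1_name or @name@.
# Adjust this pattern to match the templates actually in use.
VARIABLE_RE = re.compile(r"@[^@\s]+@|\bID\w*_\w+")


def assign_codes(phrases, seed=None):
    """Give every variable a unique random 4-digit code not already in the text."""
    text = "\n".join(phrases)
    variables = sorted(set(VARIABLE_RE.findall(text)))
    free = [str(n) for n in range(1000, 10000) if str(n) not in text]
    return dict(zip(variables, random.Random(seed).sample(free, len(variables))))


def mask(phrase, codes):
    """Replace each variable with its code so the translator leaves it alone."""
    return VARIABLE_RE.sub(lambda m: codes[m.group(0)], phrase)


def unmask(phrase, codes):
    """Replace each known 4-digit code with the original variable name."""
    by_code = {code: var for var, code in codes.items()}
    return re.sub(r"\d{4}", lambda m: by_code.get(m.group(0), m.group(0)), phrase)


def edit_distance(a, b):
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def near_duplicates(phrases, max_distance=3):
    """Return phrase pairs that differ by fewer than max_distance edits."""
    return [(a, b) for a, b in combinations(phrases, 2)
            if edit_distance(a, b) < max_distance]


def po_entry(msgid, msgstr, comment=""):
    """Format one PO entry, escaping backslashes, quotes and newlines."""
    def esc(s):
        return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    lines = [f"# {comment}"] if comment else []
    lines += [f'msgid "{esc(msgid)}"', f'msgstr "{esc(msgstr)}"']
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    phrases = ["Dear @name@, your order ID_order is ready."]
    codes = assign_codes(phrases, seed=0)
    masked = mask(phrases[0], codes)
    translated = masked.upper()  # placeholder for the real translation step
    print(po_entry(phrases[0], unmask(translated, codes), comment="demo"))
```

Skipping codes that already occur in the text avoids confusing a genuine number (a year, for example) with a masked variable when the codes are swapped back after translation.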
If the documents' variables are not unified under the "ID*" prefix, a script is provided to change the other prefixes (id*, Tx, @ ) to the desired one. There are two versions of this script, one for DOCX files and one for TXT files. They can be found as Variable_Unification_DOCX.py and Variable_Unification_TXT.py in the "Desarrollo" folder.
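
As a rough idea of what prefix unification on plain text involves, here is a small sketch. It is not the content of Variable_Unification_TXT.py: the function name `unify_prefixes` is hypothetical, and it assumes variables are written as a prefix, an underscore, and a name (for example `id_client` or `Tx_code`). Variables enclosed by @ are not handled here.

```python
import re


def unify_prefixes(text, old_prefixes=("id", "Tx"), new_prefix="ID"):
    """Rewrite tokens such as id_name or Tx_name as ID_name (assumed syntax)."""
    alternatives = "|".join(re.escape(p) for p in old_prefixes)
    pattern = re.compile(rf"\b(?:{alternatives})_(\w+)")
    # A function replacement keeps new_prefix literal, whatever it contains.
    return pattern.sub(lambda m: f"{new_prefix}_{m.group(1)}", text)


if __name__ == "__main__":
    print(unify_prefixes("Hello id_client, ref Tx_code, valid ID_date"))
```

The word boundary in the pattern keeps ordinary words that merely contain a prefix, such as "valid", from being rewritten.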