Variable-Excluding Automatic Translation

A Python program with a Gooey GUI that reads DOCX files, translates their text, and writes PO files, all in a few clicks.

Designed specifically for documents that include parameters starting with "ID_" or enclosed by @, which are excluded from the translation. This is helpful for translating auto-fill templates: variable names are left unchanged, so the parameters can still be easily recognized and applied.

The main steps followed by the code are:

1. Import all the documents from the indicated path into one large TXT file that contains every character that appears in the documents.
2. Load this TXT file and convert it to a data frame for easier data processing.
3. Clean the initial data frame, deleting repeated phrases and empty or NA values.
4. Identify every word that starts with "ID_" or is enclosed by @ (indicating that it is a variable) and add it to a new data frame.
5. Assign every registered variable a randomly generated, unique 4-digit code.
6. Replace every variable with its assigned code in the data frame that contains every unique phrase from the document list.
7. Using the mtranslate library, translate every phrase in the data frame into the desired language.
8. Referring to the data frame that contains the variables and their associated codes, replace every code with the original variable name.
9. Using the Levenshtein Python library, compute the Levenshtein distance between phrases and create a TXT file that collects the results.
   - This helps identify phrases that differ by only a few characters.
   - Here, the threshold is set to flag phrases that differ by fewer than 3 characters.
10. Write a TXT file with the format:

        # Comments
        msgid "original word"
        msgstr "translated word"

11. Create a copy of this file and change the extension from TXT to PO.
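
The variable-handling steps above (detect variables, assign codes, translate, restore, write msgid/msgstr entries) can be sketched roughly as follows. This is a minimal illustration, not the code in Translation_def.py: the regular expression, the function names, and the `fake_translate` stand-in are all assumptions made for the example. In the real program the translation call would go through mtranslate instead. Note that a 4-digit code can clash with a number that already appears in the text, so a production version would need a check for that.

```python
import random
import re

# Assumption: variables look like ID_name or @name@; adjust to the real convention.
VARIABLE_PATTERN = re.compile(r"\bID_\w+|@[^@\s]+@")


def assign_codes(phrases, rng=None):
    """Map every distinct variable found in the phrases to a unique 4-digit code."""
    rng = rng or random.Random()
    variables = []
    for phrase in phrases:
        for match in VARIABLE_PATTERN.findall(phrase):
            if match not in variables:
                variables.append(match)
    # sample() draws without replacement, so two variables never share a code.
    codes = rng.sample(range(1000, 10000), len(variables))
    return {var: str(code) for var, code in zip(variables, codes)}


def mask(phrase, codes):
    """Replace each variable with its code so the translator leaves it alone."""
    return VARIABLE_PATTERN.sub(lambda m: codes[m.group(0)], phrase)


def restore(phrase, codes):
    """Put the original variable names back in place of their codes."""
    for var, code in codes.items():
        phrase = phrase.replace(code, var)
    return phrase


def translate_phrases(phrases, translate, rng=None):
    """Return (original, translated) pairs with variables left untouched."""
    codes = assign_codes(phrases, rng)
    return [(p, restore(translate(mask(p, codes)), codes)) for p in phrases]


def po_escape(text):
    """Escape backslashes, quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def po_entry(original, translated, comment=""):
    """Format one entry in the '# comment / msgid / msgstr' layout."""
    lines = [f"# {comment}"] if comment else []
    lines.append(f'msgid "{po_escape(original)}"')
    lines.append(f'msgstr "{po_escape(translated)}"')
    return "\n".join(lines)


def fake_translate(text):
    # Hypothetical stand-in for a real translator; upper-casing makes the effect visible.
    return text.upper()


pairs = translate_phrases(
    ["Dear ID_client, your order @order_no@ is ready."], fake_translate
)
for original, translated in pairs:
    print(po_entry(original, translated, comment="demo"))
```

Passing the translator in as a function keeps the masking logic testable offline; any callable that takes a string and returns a string can be swapped in.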

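The near-duplicate check in the Levenshtein step can be illustrated with the sketch below. It uses a small pure-Python edit-distance function so the example runs without extra packages; the program itself relies on the Levenshtein library, whose `distance` function could be used in its place. The function names and the sample phrases are assumptions made for the example.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep if equal)
                )
            )
        previous = current
    return previous[-1]


def near_duplicates(phrases, max_distance=3):
    """Return (phrase_a, phrase_b, distance) for pairs that differ by fewer than max_distance edits."""
    report = []
    for a, b in combinations(phrases, 2):
        distance = levenshtein(a, b)
        if distance < max_distance:
            report.append((a, b, distance))
    return report


for a, b, distance in near_duplicates(["Save file", "Save files", "Open file"]):
    print(f"{distance}\t{a}\t{b}")
```

Comparing every pair is quadratic in the number of phrases, which is usually acceptable for a de-duplicated phrase list but may need a faster approach for very large document sets.
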
If a document's variables are not unified under the "ID_" prefix, a script is provided to change the other prefixes (id_, Tx, @) to the desired one. There are two versions of this script, one for DOCX files and one for TXT files. They can be found as Variable_Unification_DOCX.py and Variable_Unification_TXT.py in the "Desarrollo" folder.
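
As a rough idea of what prefix unification on plain text could look like, here is a minimal sketch. It is not the code in Variable_Unification_TXT.py: the function name, the default prefix list, and the rule of only rewriting prefixes at the start of a word are assumptions for illustration, and the handling of @ is left out because the source does not describe it.

```python
import re


def unify_prefixes(text, old_prefixes=("id_", "Tx"), new_prefix="ID_"):
    """Rewrite word-initial old prefixes to new_prefix (hypothetical helper)."""
    alternatives = "|".join(re.escape(prefix) for prefix in old_prefixes)
    # \b anchors the prefix at the start of a word; the lookahead requires
    # that a variable name actually follows the prefix.
    pattern = re.compile(rf"\b(?:{alternatives})(?=\w)")
    return pattern.sub(lambda match: new_prefix, text)


print(unify_prefixes("Hello id_name, your total is Txamount."))
```

A short prefix such as "Tx" can also match ordinary words that happen to start with those letters, so the result should be reviewed before the files are translated.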

asimantobar/Translation
