
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation (Official code)

Implementation of Baseline for Scene Text-to-Scene Text Translation

Release updates:

  • [September 25, 2024] First public release. Supports training and inference on the datasets used in the paper with our best-performing baseline reported in the paper. Currently, the pipeline relies on precomputed scene text detection and recognition; in a future release, we plan to integrate scene text detection and recognition as well.

Inference on the datasets used

This release only supports training and inference on the datasets used in the paper, i.e., BSTD and ICDAR 2013, using precomputed scene text detection and recognition. Please follow the instructions below for inference on our VT-Real dataset. For detailed information on specific tasks, see the Training section.

  1. Clone the repo and set up the required dependencies

    git clone https://github.com/Bhashini-IITJ/visualTranslation.git
    source ./setup.sh
  2. Download the input VT-Real images to be translated (download details are on the project page) and put them in the folders source_eng (ICDAR images) and source_hin (BSTD images) in the project directory.

  3. Download the translation checkpoints eng_hin.model and hin_eng.model and put them in a folder named model inside the project directory.

  4. We provide precomputed/oracle word-level bounding boxes as JSON files. (In a future release, we plan to integrate scene text detection and recognition into our pipeline.) Download these JSON files from the table below, rename them engBB.json and hinBB.json for the English and Hindi source-language datasets, respectively, and keep them in the project directory. A sanity-check sketch follows the table.

Source Language    Word Bounding Boxes
Eng                json file for precomputed word bounding boxes
Hin                json file for oracle word bounding boxes
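
Before running inference, a quick check that everything from steps 2-4 is in place can save a failed run. This is a minimal sketch; it only tests for the files and folders named in the steps above and makes no assumption about the JSON schema inside the bounding-box files.

    # Sanity check: confirm the inputs described in steps 2-4 exist in the
    # project directory before invoking infer.sh.
    import json
    from pathlib import Path

    expected = ["source_eng", "source_hin",
                "model/eng_hin.model", "model/hin_eng.model",
                "engBB.json", "hinBB.json"]
    for p in expected:
        print(f"{p}: {'found' if Path(p).exists() else 'MISSING'}")

    # The bounding-box files are JSON; loading them is a cheap format check.
    for f in ("engBB.json", "hinBB.json"):
        if Path(f).exists():
            with open(f) as fh:
                print(f, "->", len(json.load(fh)), "top-level entries")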
  5. Then, run the following commands to obtain visual translation using our best-performing baseline. In both cases, a new folder named output will be created and the translated images will be saved in it.

Eng → Hin

source ./infer.sh -i source_eng -o output -f engBB.json --de

Hin → Eng

Change the checkpoint path in the cfg.py file to model/hin_eng.model, then run:

source ./infer.sh -i source_hin  -o output -f hinBB.json --de --hin_eng
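
If you switch translation directions often, the cfg.py edit above can be scripted. The sketch below assumes the checkpoint is stored in cfg.py as a model/<name>.model string; verify this against your copy before relying on it.

    # Hypothetical helper: rewrite the checkpoint path in cfg.py before
    # running Hin -> Eng inference. Assumes the path appears in cfg.py as
    # model/<name>.model; adjust the pattern if your cfg.py differs.
    import re
    from pathlib import Path

    cfg = Path("cfg.py")
    cfg.write_text(re.sub(r"model/\w+\.model", "model/hin_eng.model",
                          cfg.read_text()))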

Training

Dataset generation

The dataset generation script is designed for ImageMagick v6 but can also work with ImageMagick v7, although you may encounter several warnings. The dataset can be generated for either English-to-Hindi (eng-hin) or Hindi-to-English (hin-eng) translations.

Setup Instructions:

  1. Download this folder and add it to your project directory.
  2. Unzip all the files within the folder.
  3. Install the fonts located in the devanagari.zip file.

Generating the Dataset:

To generate the dataset, run the following command:

./dataset_gen.sh [ --num_workers <number of loops> --per_worker <number of samples per loop> --hin_eng]

Command options:

  • --num_workers: Number of workers (loops) used for dataset generation. Default: 20.
  • --per_worker: Number of samples generated per loop. Default: 3000.
  • --hin_eng: Generates a Hindi-to-English (hin-eng) dataset. If not specified, an English-to-Hindi (eng-hin) dataset is generated.

With the defaults this yields roughly num_workers × per_worker = 20 × 3000 = 60,000 samples; an example invocation follows. Note: to generate a dataset for other language pairs, modify the commands in data_gen.py accordingly.
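
As an example invocation, the following sketch shells out to the provided script to generate a smaller hin-eng dataset (10 workers × 1,000 samples per worker, i.e., about 10,000 samples); the flags are exactly those documented above.

    # Hypothetical driver around dataset_gen.sh: 10 workers x 1000 samples
    # per worker yields roughly 10,000 hin-eng samples.
    import subprocess

    subprocess.run(
        ["./dataset_gen.sh",
         "--num_workers", "10",
         "--per_worker", "1000",
         "--hin_eng"],
        check=True,
    )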

Training SRNet++

SRNet++ can be trained with the following command:

conda activate srnet_plus_2
python train_o_t.py

Change the data_dir parameter in the cfg.py file if your dataset is at a different path than the default, as sketched below.
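
The intended edit is along these lines; the exact shape of cfg.py is an assumption here (only the data_dir parameter name comes from this README), so match it to your local copy.

    # cfg.py (hypothetical excerpt): point data_dir at your generated
    # dataset instead of the default location.
    data_dir = "/path/to/your/generated_dataset"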

SRNet++ inference can be run with the following commands:

conda activate srnet_plus_2
python generate_o_t.py

Please change the paths according to your use case. The inputs for inference are i_s and i_t; an example is given below.

[Example input images: i_s and i_t]

Warning and troubleshooting

  • Please make sure that ImageMagick supports the PNG format after setup; a quick check is sketched below.
  • The data generation code is written for ImageMagick v6. It will also work with ImageMagick v7, but you will see a lot of warnings.
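
To check PNG support (first bullet above), ImageMagick v6's convert -list format prints the compiled-in formats; this small wrapper just inspects that output.

    # Check whether the installed ImageMagick build supports PNG by
    # inspecting the output of `convert -list format` (ImageMagick v6 CLI).
    import subprocess

    formats = subprocess.run(["convert", "-list", "format"],
                             capture_output=True, text=True).stdout
    print("PNG supported" if "PNG" in formats.upper()
          else "PNG missing: install libpng, then reinstall ImageMagick")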

Bibtex (how to cite us)

@InProceedings{vistransICPR2024,
    author    = {Vaidya, Shreyas and Sharma, Arvind Kumar and Gatti, Prajwal and Mishra, Anand},
    title     = {Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation},
    booktitle = {ICPR},
    year      = {2024},
}

Acknowledgements

  1. SRNet
  2. Indic Scene Text Rendering
  3. Scene text eraser
  4. Facebook-m2m
  5. IndicTrans2

Contact info

In case of any issues or doubts, please raise a GitHub issue and/or write to us: Arvind Kumar Sharma - arvindji0201@gmail.com.
