Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation (Official code)
- [September 25, 2024] First public release: supports training and inference on the datasets used in the paper with our best-performing baseline. Scene text detection and recognition are currently taken as precomputed inputs; we plan to integrate them into the pipeline in a future release.
This release supports training and inference only on the datasets used in the paper, i.e., BSTD and ICDAR 2013, and relies on precomputed scene text detection and recognition. Follow the instructions below to run inference on our VT-Real dataset; for detailed information on specific tasks, see the training section.
- Clone the repo and set up the required dependencies:
  git clone https://github.com/Bhashini-IITJ/visualTranslation.git
  source ./setup.sh
- Download the input VT-Real images to be translated (download details are on the project page) and place them in the folders source_eng (ICDAR images) and source_hin (BSTD images) inside the project directory.
- Download the translation checkpoints eng_hin.model and hin_eng.model and put them in a folder named model inside the project directory.
- We provide precomputed/oracle word-level bounding boxes as JSON files (in a future release, we plan to integrate scene text detection and recognition into the pipeline). Download these JSON files from the table below, rename them to engBB.json and hinBB.json for the English and Hindi source-language datasets, respectively, and keep them in the project directory.
Source Language | Word Bounding Boxes |
---|---|
Eng | json file for precomputed word bounding boxes |
Hin | json file for oracle word bounding boxes |
- Then, run the following commands to obtain visual translations using our best-performing baseline. In both cases, a new folder named output will be created and the translated images will be saved in it. A sketch of the expected project layout is given right after this list.

  English to Hindi:
  source ./infer.sh -i source_eng -o output -f engBB.json --de

  Hindi to English (first change the checkpoint path in cfg.py to model/hin_eng.model):
  source ./infer.sh -i source_hin -o output -f hinBB.json --de --hin_eng
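As a quick reference, here is a minimal sketch of the project layout that the steps above assume (only the folders and files named in this README are shown; the tree is illustrative, not exhaustive):

```bash
# Create the input folders before copying files in
mkdir -p source_eng source_hin model
# Expected layout inside the project directory:
# ├── source_eng/        # ICDAR images to translate (English source)
# ├── source_hin/        # BSTD images to translate (Hindi source)
# ├── model/
# │   ├── eng_hin.model  # English-to-Hindi translation checkpoint
# │   └── hin_eng.model  # Hindi-to-English translation checkpoint
# ├── engBB.json         # precomputed word bounding boxes (English)
# ├── hinBB.json         # oracle word bounding boxes (Hindi)
# └── output/            # created by infer.sh; translated images are written here
```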
The dataset generation script is designed for ImageMagick v6 but can also work with ImageMagick v7, although you may encounter several warnings. The dataset can be generated for either English-to-Hindi (eng-hin) or Hindi-to-English (hin-eng) translations.
- Download this folder and add it to your project directory.
- Unzip all the files within the folder.
- Install the fonts located in the devanagari.zip file.
To generate the dataset, run the following command:
./dataset_gen.sh [--num_workers <number of workers> --per_worker <samples per worker> --hin_eng]

Command options:
- --num_workers: number of workers used for dataset generation. Default: 20.
- --per_worker: number of samples generated per worker. Default: 3000.
- --hin_eng: generate a Hindi-to-English (hin-eng) dataset. If not specified, an English-to-Hindi (eng-hin) dataset is generated.

Note: To generate a dataset for other language pairs, modify the commands in data_gen.py accordingly.
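For example (the option values below are only illustrative):

```bash
# English-to-Hindi dataset with the defaults (20 workers x 3000 samples per worker)
./dataset_gen.sh

# Smaller Hindi-to-English dataset: 10 workers, 500 samples each
./dataset_gen.sh --num_workers 10 --per_worker 500 --hin_eng
```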
SRNet++ can be trained with the following command:
conda activate srnet_plus_2
python train_o_t.py
Change the 'data_dir' parameter in cfg.py if your dataset is stored at a path other than the default; a rough sketch of such an edit is given below.
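As a sketch only, assuming data_dir is a plain top-level assignment in cfg.py (an assumption about the file's structure; editing cfg.py by hand works just as well, and /path/to/generated/dataset is a placeholder):

```bash
# Point cfg.py's data_dir at the generated dataset.
# The sed pattern assumes a line of the form: data_dir = '...'
sed -i "s|^data_dir *=.*|data_dir = '/path/to/generated/dataset'|" cfg.py
```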
SRNet++ inference can be run with the following commands:
conda activate srnet_plus_2
python generate_o_t.py
Please change the paths according to your use case. The inputs for inference are i_s and i_t; an example is given below.
i_s | i_t |
---|---|
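As a rough illustration of how the paired inputs might be organized (the folder names and pairing convention below are assumptions, following the usual SRNet naming where i_s is the styled source-text crop and i_t is the rendered target text; check cfg.py and generate_o_t.py for the paths actually expected):

```bash
# Hypothetical paired inputs for SRNet++ inference; names are placeholders
ls i_s/   # e.g. 0001.png 0002.png ...  (styled source-text crops)
ls i_t/   # files with the same names, holding the rendered target text
```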
- Please make sure that ImageMagick supports the PNG format after the setup (one way to check is shown below).
- The data generation code is written for ImageMagick v6. It will also work with ImageMagick v7, but you will see a lot of warnings.
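A quick way to verify PNG support (the v6 convert tool is what the scripts target; the v7 magick entry point is shown for completeness):

```bash
# Lists the formats the ImageMagick build supports; PNG should appear with read/write modes
convert -list format | grep -i png       # ImageMagick v6
# magick -list format | grep -i png      # ImageMagick v7 equivalent
```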
@InProceedings{vistransICPR2024,
author = {Vaidya, Shreyas and Sharma, Arvind Kumar and Gatti, Prajwal and Mishra, Anand},
title = {Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation},
booktitle = {ICPR},
year = {2024},
}
In case of any issue or doubt, please raise a GitHub issue and/or write to us: Arvind Kumar Sharma - arvindji0201@gmail.com.