
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation (Official code)

Implementation of Baseline for Scene Text-to-Scene Text Translation

Release updates:

  • [September 25, 2024] First public release. Supports training and inference on the datasets used in the paper with our best-performing baseline reported in the paper. Currently, the pipeline relies on precomputed scene text detection and recognition; in a future release, we plan to integrate scene text detection and recognition as well.

Inference on the datasets used

This release only supports training and inference on the datasets used in the paper, i.e., BSTD and ICDAR 2013, using precomputed scene text detection and recognition. Please follow the instructions below for inference on our VT-Real dataset. For detailed information on specific tasks, see the Training section.

  1. Clone the repo and set up the required dependencies

    git clone https://github.com/Bhashini-IITJ/visualTranslation.git
    source ./setup.sh
  2. Download the input VT-Real images to be translated (download details are on the project page) and put them in the folders source_eng (ICDAR images) and source_hin (BSTD images) in the project directory.

  3. Download the translation checkpoints eng_hin.model and hin_eng.model and put them in a folder named model inside the project directory.

  4. We provide precomputed/oracle word-level bounding boxes as JSON files. (In a future release, we plan to integrate scene text detection and recognition into our pipeline.) Download these JSON files from the table below, rename them engBB.json and hinBB.json for the English and Hindi source-language datasets, respectively, and keep them in the project directory. A sanity-check sketch follows the table.

Source Language    Word Bounding Boxes
Eng                json file for precomputed word bounding boxes
Hin                json file for oracle word bounding boxes
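
Before running inference, a quick check that everything from steps 2-4 is in place can save a failed run. This is a minimal sketch; it only tests for the files and folders named in the steps above and makes no assumption about the JSON schema inside the bounding-box files.

    # Sanity check: confirm the inputs described in steps 2-4 exist in the
    # project directory before invoking infer.sh.
    import json
    from pathlib import Path

    expected = ["source_eng", "source_hin",
                "model/eng_hin.model", "model/hin_eng.model",
                "engBB.json", "hinBB.json"]
    for p in expected:
        print(f"{p}: {'found' if Path(p).exists() else 'MISSING'}")

    # The bounding-box files are JSON; loading them is a cheap format check.
    for f in ("engBB.json", "hinBB.json"):
        if Path(f).exists():
            with open(f) as fh:
                print(f, "->", len(json.load(fh)), "top-level entries")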
  5. Then, run the following commands to obtain visual translation using our best-performing baseline. In both cases, a new folder named output will be created and the translated images will be saved in it.

Eng → Hin

source ./infer.sh -i source_eng -o output -f engBB.json --de

Hin → Eng

Change the checkpoint path in the cfg.py file to model/hin_eng.model, then run:

source ./infer.sh -i source_hin  -o output -f hinBB.json --de --hin_eng
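
If you switch translation directions often, the cfg.py edit above can be scripted. The sketch below assumes the checkpoint is stored in cfg.py as a model/<name>.model string; verify this against your copy before relying on it.

    # Hypothetical helper: rewrite the checkpoint path in cfg.py before
    # running Hin -> Eng inference. Assumes the path appears in cfg.py as
    # model/<name>.model; adjust the pattern if your cfg.py differs.
    import re
    from pathlib import Path

    cfg = Path("cfg.py")
    cfg.write_text(re.sub(r"model/\w+\.model", "model/hin_eng.model",
                          cfg.read_text()))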

Training

Dataset generation

The dataset generation script is designed for ImageMagick v6 but can also work with ImageMagick v7, although you may encounter several warnings. The dataset can be generated for either English-to-Hindi (eng-hin) or Hindi-to-English (hin-eng) translations.

Setup Instructions:

  1. Download this folder and add it to your project directory.
  2. Unzip all the files within the folder.
  3. Install the fonts located in the devanagari.zip file.

Generating the Dataset:

To generate the dataset, run the following command:

./dataset_gen.sh [ --num_workers <number of loops> --per_worker <number of samples per loop> --hin_eng]

Command options:

  • --num_workers: Number of workers (loops) used for dataset generation. Default: 20.
  • --per_worker: Number of samples generated per loop. Default: 3000.
  • --hin_eng: Generates a Hindi-to-English (hin-eng) dataset. If not specified, an English-to-Hindi (eng-hin) dataset is generated.

With the defaults this yields roughly num_workers × per_worker = 20 × 3000 = 60,000 samples; an example invocation follows. Note: to generate a dataset for other language pairs, modify the commands in data_gen.py accordingly.
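
As an example invocation, the following sketch shells out to the provided script to generate a smaller hin-eng dataset (10 workers × 1,000 samples per worker, i.e., about 10,000 samples); the flags are exactly those documented above.

    # Hypothetical driver around dataset_gen.sh: 10 workers x 1000 samples
    # per worker yields roughly 10,000 hin-eng samples.
    import subprocess

    subprocess.run(
        ["./dataset_gen.sh",
         "--num_workers", "10",
         "--per_worker", "1000",
         "--hin_eng"],
        check=True,
    )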

Training SRNet++

SRNet++ can be trained with the following command:

conda activate srnet_plus_2
python train_o_t.py

Change the data_dir parameter in the cfg.py file if your dataset is at a different path than the default, as sketched below.
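
The intended edit is along these lines; the exact shape of cfg.py is an assumption here (only the data_dir parameter name comes from this README), so match it to your local copy.

    # cfg.py (hypothetical excerpt): point data_dir at your generated
    # dataset instead of the default location.
    data_dir = "/path/to/your/generated_dataset"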

SRNet++ inference can be run with the following commands:

conda activate srnet_plus_2
python generate_o_t.py

Please change the paths according to your use case. The inputs for inference are i_s and i_t; an example is given below.

[Example input images: i_s and i_t]

Warning and troubleshooting

  • Please make sure that ImageMagick supports the PNG format after setup; a quick check is sketched below.
  • The data generation code is written for ImageMagick v6. It will also work with ImageMagick v7, but you will see a lot of warnings.
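
To check PNG support (first bullet above), ImageMagick v6's convert -list format prints the compiled-in formats; this small wrapper just inspects that output.

    # Check whether the installed ImageMagick build supports PNG by
    # inspecting the output of `convert -list format` (ImageMagick v6 CLI).
    import subprocess

    formats = subprocess.run(["convert", "-list", "format"],
                             capture_output=True, text=True).stdout
    print("PNG supported" if "PNG" in formats.upper()
          else "PNG missing: install libpng, then reinstall ImageMagick")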

Bibtex (how to cite us)

@InProceedings{vistransICPR2024,
    author    = {Vaidya, Shreyas and Sharma, Arvind Kumar and Gatti, Prajwal and Mishra, Anand},
    title     = {Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation},
    booktitle = {ICPR},
    year      = {2024},
}

Acknowledgements

  1. SRNet
  2. Indic Scene Text Rendering
  3. Scene text eraser
  4. Facebook-m2m
  5. IndicTrans2

Contact info

In case of any issues or doubts, please raise a GitHub issue and/or write to us: Arvind Kumar Sharma - arvindji0201@gmail.com.
