Skip to content

Latest commit

 

History

History
227 lines (188 loc) · 9.41 KB

README.md

File metadata and controls

227 lines (188 loc) · 9.41 KB

MRN: Multiplexed Routing Network
for Incremental Multilingual Text Recognition

ICCV 2023 ArXiv preprint Blog LICENSE

Method |IMLTR Dataset | Getting Started | Citation

It started as code for the paper:

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition (Accepted by ICCV 2023)

This project is a toolkit for the novel scenario of Incremental Multilingual Text Recognition (IMLTR), the project supports many incremental learning methods and proposes a more applicable method for IMLTR: Multiplexed Routing Network (MRN) and the corresponding dataset. The project provides an efficient framework to assist in developing new methods and analyzing existing ones under the IMLTR task, and we hope it will advance the IMLTR community.

image

Methods

Incremental Learning Methods

  • Base: Baseline method which simply updates parameters on new tasks.
  • Joint: Bound method: data for all tasks are trained at once, an upper bound for the method
    (Joint_mix means all tasks data mixed in batch, Joint_loader means the consistent proportion of data from each task in a batch)
  • EWC [PNAS2017]: Overcoming catastrophic forgetting in neural networks
  • LwF [ECCV2016]: Learning without Forgetting
  • WA [CVPR2020]: Maintaining Discrimination and Fairness in Class Incremental Learning
  • DER [CVPR2021]: DER: Dynamically Expandable Representation for Class Incremental Learning
  • MRN [ICCV2023]: MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition

you can change config config/crnn_mrn.py for different il methods or setting.

common=dict(
    il="mrn",  # joint_mix | joint_loader | base | lwf | wa | ewc | der  | mrn
    memory="random",  # None | random
    memory_num=2000,
    start_task = 0  # checkpoint start
)

Text Recognition Methods

  • CRNN [TPAMI2017]: An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition
  • TRBA [ICCV2019]: What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
  • SVTR [IJCAI2022]: SVTR: Scene Text Recognition with a Single Visual Model

you can change config config/crnn_mrn.py for different text recognition modules or setting.

""" Model Architecture """
common=dict(
    batch_max_length = 25,
    imgH = 32,
    imgW = 256,
)
model=dict(
    model_name="TRBA",
    Transformation = "TPS",      #None TPS
    FeatureExtraction = "ResNet",    #VGG ResNet SVTR
    SequenceModeling = "BiLSTM",  #None BiLSTM
    Prediction = "Attn",           #CTC Attn
    num_fiducial=20,
    input_channel=4,
    output_channel=512,
    hidden_size=256,
)

IMLTR Dataset

The Dataset can be downloaded from BaiduNetdisk(passwd:c07h).

dataset
├── MLT17_IL
│   ├── test_2017
│   ├── train_2017
├── MLT19_IL
│   ├── test_2019
│   ├── train_2019

Incremental MLT17: MLT17 has 68,613 training instances and 16,255 validation instances, which are from 6 scripts and 9 languages: Chinese, Japanese, Korean, Bangla, Arabic, Italian, English, French, and German. The last four use Latin script. Incremental MLT17 use the validation set for test due to the unavailability of test data. Tasks are split by scripts and modeled sequentially. Special symbols are discarded at the preprocessing step as with no linguistic meaning.

Incremental MLT19: MLT19 has 89,177 text instances coming from 7 scripts. Since the inaccessibility of test set, Incremental MLT19 randomly split the training instances to 9:1 script-by-script, for model training and test. To be consistent with Incremental MLT17 dataset, we discard the Hindi script and also special symbols. Statistics of the two datasets are shown in the following.

Dataset Categories
Task1 Task2 Task3 Task4 Task5 Task6
Chinese Latin Japanese Korean Arabic Bangla
MLT171 Train Instance 2687 47411 4609 5631 3711 3237
Test Instance 529 11073 1350 1230 983 713
Train Class 1895 325 1620 1124 73 112
MLT192 Train Instance 2897 52921 5324 6107 4230 3542
Test Instance 322 5882 590 679 470 393
Train Class 2086 220 1728 1160 73 102

Getting Started

Dependency

  • This work was tested with PyTorch 1.6.0, CUDA 10.1 and python 3.6.
conda create -n mrn python=3.7 -y
conda activate mrn
conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
  • requirements :
pip3 install lmdb pillow torchvision nltk natsort fire tensorboard tqdm opencv-python einops timm mmcv shapely scipy
pip3 install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.1/index.html

Training

python3 tiny_train.py --config=config/crnn_mrn.py --exp_name CRNN_real

Arguments

tiny_train.py (as a default, evaluate trained model on IMLTR datasets at the end of training.

  • --select_data: folder path to training lmdb datasets.
    [" ../dataset/MLT17_IL/train_2017", "../dataset/MLT19_IL/train_2019"]
  • --valid_datas: folder path to testing lmdb dataset.
    [" ../dataset/MLT17_IL/test_2017", "../dataset/MLT19_IL/test_2019"]
  • --batch_ratio: assign ratio for each selected data in the batch. default is '1 / number of datasets'.
  • --Aug: whether to use augmentation |None|Blur|Crop|Rot|

Config Detail

For detailed configuration modifications please use the config file config/crnn_mrn.py

common=dict(
    exp_name="TRBA_MRN",  # Where to store logs and models
    il="mrn",  # joint_mix | joint_loader | base | lwf | wa | ewc | der  | mrn
    memory="random",  # None | random
    memory_num=2000,
    batch_max_length = 25,
    imgH = 32,
    imgW = 256,
    manual_seed=111,
    start_task = 0
)

""" Model Architecture """
model=dict(
    model_name="TRBA",
    Transformation = "TPS",      #None TPS
    FeatureExtraction = "ResNet",    #VGG ResNet
    SequenceModeling = "BiLSTM",  #None BiLSTM
    Prediction = "Attn",           #CTC Attn
    num_fiducial=20,
    input_channel=4,
    output_channel=512,
    hidden_size=256,
)



""" Optimizer """
optimizer=dict(
    schedule="super", #default is super for super convergence, 1 for None, [0.6, 0.8] for the same setting with ASTER
    optimizer="adam",
    lr=0.0005,
    sgd_momentum=0.9,
    sgd_weight_decay=0.000001,
    milestones=[2000,4000],
    lrate_decay=0.1,
    rho=0.95,
    eps=1e-8,
    lr_drop_rate=0.1
)


""" Data processing """
train = dict(
    saved_model="",  # "path to model to continue training"
    Aug="None",  # |None|Blur|Crop|Rot|ABINet
    workers=4,
    lan_list=["Chinese","Latin","Japanese", "Korean", "Arabic", "Bangla"],
    valid_datas=[
                 "../dataset/MLT17_IL/test_2017",
                 "../dataset/MLT19_IL/test_2019"
                 ],
    select_data=[
                 "../dataset/MLT17_IL/train_2017",
                 "../dataset/MLT19_IL/train_2019"
                 ],
    batch_ratio="0.5-0.5",
    total_data_usage_ratio="1.0",
    NED=True,
    batch_size=256,
    num_iter=10000,
    val_interval=5000,
    log_multiple_test=None,
    grad_clip=5,
)

Data Analysis

The experimental results of each task are recorded in data_any.txt and can be used for analysis of the data.

Acknowledgements

This implementation has been based on these repositories:

Citation

Please consider citing this work in your publications if it helps your research.

@article{zheng2023mrn,
  title={MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition},
  author={Zheng, Tianlun and Chen, Zhineng and Huang, BingChen and Zhang, Wei and Jiang, Yu-Gang},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

License

This project is released under the Apache 2.0 license.

Footnotes

  1. Nayef, N., et al. (2017). MLT 2017.

  2. Nayef, N., et al. (2019). MLT 2019.