This repository is the official implementation of the following paper:

Xiang Zhang, Yongwen Su, Subarna Tripathi, and Zhuowen Tu, "Text Spotting Transformers", CVPR 2022
We use the following environment in our experiments. It is recommended to install the dependencies via Anaconda:
- CUDA 11.3
- Python 3.8
- PyTorch 1.10.1
- Official Pre-Built Detectron2
Please refer to the Installation section of the AdelaiDet README.md. If you have not installed Detectron2, follow the official guide: INSTALL.md.
After that, build this repository with

```
python setup.py build develop
```
Please download TotalText, CTW1500, MLT, and CurvedSynText150k according to the guide provided by the AdelaiDet README.md. The ICDAR2015 dataset can be downloaded via link.
Extract all the datasets and make sure you organize them as follows:

```
- datasets
| - CTW1500
| | - annotations
| | - ctwtest_text_image
| | - ctwtrain_text_image
| - totaltext (or icdar2015)
| | - test_images
| | - train_images
| | - test.json
| | - train.json
| - mlt2017 (or syntext1, syntext2)
| | - annotations
| | - images
```
After that, download the polygonal annotations, along with the evaluation files, and extract them under the datasets folder.
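If it helps, here is a small hypothetical helper (not part of this repository) for sanity-checking a TotalText/ICDAR2015-style dataset folder before training; the expected entries mirror the tree shown above.

```python
import os

# Hypothetical helper (not part of this repository): verify that a
# TotalText/ICDAR2015-style dataset folder matches the layout shown above.
REQUIRED_ENTRIES = ["test_images", "train_images", "test.json", "train.json"]

def missing_entries(dataset_root):
    """Return the expected files/folders that are absent under dataset_root."""
    return [name for name in REQUIRED_ENTRIES
            if not os.path.exists(os.path.join(dataset_root, name))]

if __name__ == "__main__":
    # An empty list means the layout looks correct.
    print(missing_entries("datasets/totaltext"))
```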
You can try to visualize the predictions of the network using the following command:

```
python demo/demo.py --config-file <PATH_TO_CONFIG_FILE> --input <FOLDER_TO_INPUT_IMAGES> --output <OUTPUT_FOLDER> --opts MODEL.WEIGHTS <PATH_TO_MODEL_FILE> MODEL.TRANSFORMER.INFERENCE_TH_TEST 0.3
```

You may want to adjust INFERENCE_TH_TEST to filter out predictions with lower scores.
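The effect of such a score threshold can be illustrated with a small standalone sketch (the dict fields below are illustrative stand-ins, not the actual demo output format):

```python
# Standalone sketch of score thresholding; the "text"/"score" fields are
# illustrative stand-ins, not the actual output format of demo.py.
def filter_predictions(predictions, threshold=0.3):
    """Keep only predictions whose confidence meets the threshold."""
    return [p for p in predictions if p["score"] >= threshold]

preds = [
    {"text": "STOP", "score": 0.92},
    {"text": "???", "score": 0.11},
    {"text": "EXIT", "score": 0.45},
]
# The low-confidence 0.11 prediction is dropped.
print(filter_predictions(preds))
```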
You can train from scratch or finetune the model by putting pretrained weights in the weights folder. Example command:

```
python tools/train_net.py --config-file <PATH_TO_CONFIG_FILE> --num-gpus 8
```
All configuration files can be found in configs/TESTR, excluding those named Base-xxxx.yaml. TESTR_R_50.yaml is the config for TESTR-Bezier, while TESTR_R_50_Polygon.yaml is for TESTR-Polygon.
Run evaluation with:

```
python tools/train_net.py --config-file <PATH_TO_CONFIG_FILE> --eval-only MODEL.WEIGHTS <PATH_TO_MODEL_FILE>
```
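The Det-* and E2E-* columns in the tables below are precision, recall, and F-measure; the relationship between them can be checked with a few lines (the generic harmonic-mean formula, not tied to this repository's evaluation scripts):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall, as reported in the tables."""
    return 2 * precision * recall / (precision + recall)

# E.g., the TotalText Bezier row: Det-P 92.83, Det-R 83.65
# gives an F-measure close to the 88.00 in the table.
print(f_measure(92.83, 83.65))
```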
| Dataset | Annotation Type | Lexicon | Det-P | Det-R | Det-F | E2E-P | E2E-R | E2E-F | Link |
|---|---|---|---|---|---|---|---|---|---|
| Pretrain | Bezier | None | 88.87 | 76.47 | 82.20 | 63.58 | 56.92 | 60.06 | OneDrive |
| | Polygonal | None | 88.18 | 77.51 | 82.50 | 66.19 | 61.14 | 63.57 | OneDrive |
| TotalText | Bezier | None | 92.83 | 83.65 | 88.00 | 74.26 | 69.05 | 71.56 | OneDrive |
| | | Full | - | - | - | 86.42 | 80.35 | 83.28 | |
| | Polygonal | None | 93.36 | 81.35 | 86.94 | 76.85 | 69.98 | 73.25 | OneDrive |
| | | Full | - | - | - | 88.00 | 80.13 | 83.88 | |
| CTW1500 | Bezier | None | 89.71 | 83.07 | 86.27 | 55.44 | 51.34 | 53.31 | OneDrive |
| | | Full | - | - | - | 83.05 | 76.90 | 79.85 | |
| | Polygonal | None | 92.04 | 82.63 | 87.08 | 59.14 | 53.09 | 55.95 | OneDrive |
| | | Full | - | - | - | 86.16 | 77.34 | 81.51 | |
| ICDAR15 | Polygonal | None | 90.31 | 89.70 | 90.00 | 65.49 | 65.05 | 65.27 | OneDrive |
| | | Strong | - | - | - | 87.11 | 83.29 | 85.16 | |
| | | Weak | - | - | - | 80.36 | 78.38 | 79.36 | |
| | | Generic | - | - | - | 73.82 | 73.33 | 73.57 | |
The Lite models only use the image feature from the last stage of ResNet.
| Method | Annotation Type | Lexicon | Det-P | Det-R | Det-F | E2E-P | E2E-R | E2E-F | Link |
|---|---|---|---|---|---|---|---|---|---|
| Pretrain (Lite) | Polygonal | None | 90.28 | 72.58 | 80.47 | 59.49 | 50.22 | 54.46 | OneDrive |
| TotalText (Lite) | Polygonal | None | 92.16 | 79.09 | 85.12 | 66.42 | 59.06 | 62.52 | OneDrive |
```
@InProceedings{Zhang_2022_CVPR,
    author    = {Zhang, Xiang and Su, Yongwen and Tripathi, Subarna and Tu, Zhuowen},
    title     = {Text Spotting Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {9519-9528}
}
```
This repository is released under the Apache License 2.0. The license can be found in the LICENSE file.
Thanks to AdelaiDet for a standardized training and inference framework, and Deformable-DETR for the implementation of multi-scale deformable cross-attention.