This GitHub page contains my implementation for the DIBCO challenges in document image binarization. The code uses the BCDU-Net model to learn the binarization process on the DIBCO series. The evaluation results show that BCDU-Net achieves the best performance on the DIBCO challenges. If this code helps with your research, please consider citing the following paper:
R. Azad et al., "Bi-Directional ConvLSTM U-Net with Densely Connected Convolutions", ICCV, 2019, download link.
- December 3, 2019: First release (complete implementation for the DIBCO series; datasets for the years 2009 to 2017 added). Other datasets can be added easily.
This code has been implemented in Python using the Keras library with a TensorFlow backend and tested on Ubuntu, though it should be compatible with related environments. The following environment and libraries are needed to run the code:
- Python 3
- Keras (TensorFlow backend)
To train the deep model for each DIBCO year, follow the steps below:
1- Download the DIBCO datasets from this link and extract them. We included the DIBCO datasets from 2009 to 2017. Adding DIBCO 2018, 2019, or other datasets is easy; you only need to revise the utils code.
2- Run Prepare_DIBCO.py to prepare the data and divide it into train and test sets. Please note that this code considers all samples of one particular year as the test set and the remaining years as the training set. This is the common data division used in the DIBCO challenge.
3- Run Train_DIBCO.py to train the BCDU-Net model using the training and validation (20% of the training samples) sets. The model will be trained for 100 epochs, and the weights that perform best on the validation set will be saved.
4- For performance calculation and producing the binarization results, run Evaluate.py. It will report the performance measures and save the related figures and results in the output folder.
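The year-based split described in step 2 can be sketched as follows. This is only an illustration of the leave-one-year-out protocol; the function name, the Test_year value, and the data layout are assumptions, not code taken from Prepare_DIBCO.py:

```python
def split_dibco_by_year(samples, test_year):
    """Leave-one-year-out split for the DIBCO series.

    samples: list of (year, image_path) pairs.
    Returns (train, test): all samples of test_year form the test set;
    samples from every other year form the training set.
    """
    test = [s for s in samples if s[0] == test_year]
    train = [s for s in samples if s[0] != test_year]
    return train, test

# Example: hold out DIBCO 2016 and train on the remaining years
samples = [(2009, "a.png"), (2014, "b.png"), (2016, "c.png"), (2017, "d.png")]
train, test = split_dibco_by_year(samples, 2016)
```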
We train the model using patches extracted from the training set. For test-image binarization, we also apply patch-based overlapping binarization. If you want to train and evaluate the model on any particular year, just set the test year (parameter Test_year = 2016) when running Prepare_DIBCO.py and Evaluate.py.
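Patch-based overlapping binarization can be sketched as below. This is a minimal illustration, not the actual Evaluate.py code: it assumes a predict_patch callable returning per-pixel foreground probabilities, and averages predictions where patches overlap (names, patch size, and stride are all assumptions):

```python
import numpy as np

def binarize_overlapping(image, predict_patch, patch=128, stride=64):
    """Binarize a full page by averaging overlapping patch predictions.

    image: 2-D grayscale array with h, w >= patch.
    predict_patch: callable mapping a (patch, patch) array to per-pixel
    foreground probabilities in [0, 1].
    """
    h, w = image.shape
    acc = np.zeros((h, w))  # summed probabilities per pixel
    cnt = np.zeros((h, w))  # number of patches covering each pixel
    ys = list(range(0, h - patch + 1, stride))
    xs = list(range(0, w - patch + 1, stride))
    # make sure the bottom and right borders are fully covered
    if ys[-1] != h - patch:
        ys.append(h - patch)
    if xs[-1] != w - patch:
        xs.append(w - patch)
    for y in ys:
        for x in xs:
            acc[y:y + patch, x:x + patch] += predict_patch(
                image[y:y + patch, x:x + patch])
            cnt[y:y + patch, x:x + patch] += 1
    # average overlapping predictions, then threshold at 0.5
    return (acc / cnt) > 0.5
```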
For evaluating the performance of the BCDU-Net model on the DIBCO series, we followed the setting used in SAE. In [1] the authors provided experimental results on only the DIBCO 2014 and 2016 series. To do so, they considered all samples of one particular year (for example, 2014 or 2016) as the test set and the rest of the samples from the other years as the training set. We used the same setting for reporting our results. Below, the results of the proposed approach are shown in terms of F-measure.
Methods | DIBCO 2014 | DIBCO 2016 |
---|---|---|
Otsu | 91.56 | 73.79 |
Niblack | 22.26 | 16.7 |
Sauvola | 77.08 | 82.00 |
Wolf et al. | 90.47 | 81.76 |
Gatos et al. | 91.97 | 74.97 |
Sauvola MS | 87.86 | 65.04 |
Su et al. | 95.14 | 90.27 |
Howe | 90.00 | 80.64 |
Kliger and Tal | 95.00 | 90.48 |
CNN | 81.23 | 54.58 |
SAE | 89.12 | 85.27 |
R. Azad BCDU-Net | 97.88 | 98.96 |
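For reference, the F-measure reported above combines precision and recall over the foreground (ink) pixels. A minimal computation looks like the sketch below (this is an illustration, not the official DIBCO evaluation tool):

```python
import numpy as np

def f_measure(pred, gt):
    """F-measure between predicted and ground-truth binary masks,
    where foreground (ink) pixels are 1 and background pixels are 0."""
    tp = np.sum((pred == 1) & (gt == 1))  # correctly detected ink
    fp = np.sum((pred == 1) & (gt == 0))  # background marked as ink
    fn = np.sum((pred == 0) & (gt == 1))  # ink missed by the prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```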
You can download the learned weights for the DIBCO series, which were trained on the DIBCO 2009-2017 samples (except 2016). Please note that the trained model can be used for other years too.
Test year | Learned weights |
---|---|
DIBCO 2016 | Model Weights |
All implementation was done by Reza Azad. For any query, please contact us for more information.
rezazad68@gmail.com