Skip to content

I implemented a detection algorithm with a classification data set that does not have annotation information for the bounding box. Based on resnet50 network, I implemented text detector using class activation mapping method.

Notifications You must be signed in to change notification settings

kapitsa2811/Weakly-Supervised-Text-Detection

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Weakly Supervised Text Detector

I am implementing a detection algorithm with a classification data set that does not have annotation information for the bounding box. I used the class activation mapping proposed in Learning Deep Features for Discriminative Localization.

Requirements

  • python 3.6
  • tensorflow 1.2.1
  • keras 2.1.1
  • opencv 3.3.0

Usage

The procedure to build detector is as follows:

1. Fine Tuning (1_train.py)

Pre-trained weight files are stored at https://drive.google.com/drive/folders/1kj398ZW3zwk-KMDA0_Y5tq6Gr09oFDbZ. This file allows you to skip the fine tuning process.

  • First build a binary classifier using the pretrained resnet50 structure. As with the structure in the Learning Deep Features for Discriminative Localization, I removed all layers after the activation layer with spatial resoultion of 14x14, and added a convolution layer of 3x3 size instead. The last few layers of the network have the following structure.
activation_40 (Activation)      (None, 14, 14, 1024) 0           add_13[0][0]                     
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 14, 14, 1024) 9438208     activation_40[0][0]              
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 14, 14, 1024) 4096        conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_50 (Activation)      (None, 14, 14, 1024) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
cam_average_pooling (AveragePoo (None, 1, 1, 1024)   0           activation_50[0][0]              
__________________________________________________________________________________________________
flatten_2 (Flatten)             (None, 1024)         0           cam_average_pooling[0][0]        
__________________________________________________________________________________________________
cam_cls (Dense)                 (None, 2)            2050        flatten_2[0][0]                  
  • Next, you should train a model that can distinguish between images that contain text and those that do not. The dataset is stored in this repository. To reproduce the result, just run 1_train.py.

Running the 1_train.py will do all of this.

2. Plot Class Actication Map (2_cam_plot.py)

  • Now configure the model to output the class activation map. In the previous structure, modify several layers with the following structure. Please refer to the paper for details.
activation_50 (Activation)      (None, 14, 14, 1024) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
binear_up_sampling2d_1 (BinearU (None, 224, 224, 102 0           activation_50[0][0]              
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 50176, 1024)  0           binear_up_sampling2d_1[0][0]     
__________________________________________________________________________________________________
cam_cls (Dense)                 (None, 50176, 2)     2050        reshape_1[0][0]                  
__________________________________________________________________________________________________
reshape_2 (Reshape)             (None, 224, 224, 2)  0           cam_cls[0][0]                    
  • Load the weight we learned earlier on this network. image classifiers and CAM networks have different structures. However, since the layer naming is set in advance, the weight file can be used in both networks in common.

  • Entering image on this network will generate a text activation map.

Running the 2_cam_plot.py will do all of this.

Results

References

About

I implemented a detection algorithm with a classification data set that does not have annotation information for the bounding box. Based on resnet50 network, I implemented text detector using class activation mapping method.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%