
About configurations #8

Open
hellbell opened this issue Jul 25, 2019 · 16 comments

@hellbell

First, thank you for your paper and GitHub page.
Your work is super useful for studying text detection with a Mask R-CNN baseline.
I am reproducing the results of PMTD, but my results are a little worse (Mask R-CNN baseline, 60% F-measure on the MLT dataset).
So I'm trying to figure out what is wrong with my configuration.
It would be very helpful if the config file (.yaml) could be provided, or if you could let me know the RPN.ANCHOR_STRIDE setting (currently I'm using (4, 8, 16, 32, 64)).
Thanks!

@kapness

kapness commented Aug 5, 2019

I think you may be running into the same problem I met before. You can have a look at my issue; the author gives some useful advice there.

@hellbell
Author

hellbell commented Aug 6, 2019

@kapness Thank you for the kind reply!
I followed your issue, but the results were still worse than I expected.
It would be very helpful if you could share your config file (.yaml) :)
Thank you again.

@kapness

kapness commented Aug 6, 2019 via email

@kapness

kapness commented Aug 6, 2019 via email

@hellbell
Author

hellbell commented Aug 6, 2019

@kapness
Thank you for your advice. I will try it right now!

@JingChaoLiu
Collaborator

@kapness Thanks a lot!

@kapness

kapness commented Aug 7, 2019

@hellbell Also, _C.MODEL.RPN.ASPECT_RATIOS in defaults.py should be modified as described in the paper. I forgot to mention this tip before.
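
For reference, this is a one-line edit; a sketch of it is below (not an exact diff, and the stock default shown in the comment is from memory). Equivalently, the same override can go in the experiment .yaml under MODEL.RPN.ASPECT_RATIOS, as hellbell does later in this thread.

# maskrcnn_benchmark/config/defaults.py (excerpt, sketch of the edit)
# Replace the generic anchor aspect ratios (stock default is roughly (0.5, 1.0, 2.0))
# with the text-oriented ratios from the paper:
_C.MODEL.RPN.ASPECT_RATIOS = (0.17, 0.44, 1.13, 2.90, 7.46)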

@hellbell
Author

@kapness @JingChaoLiu
Thank you for your kind replies.
I trained vanilla Mask R-CNN on ICDAR2017-MLT and got an F-score of only 62%, which is still far below the baseline.
My settings:

  • based on e2e_mask_rcnn_R_50_FPN_1x.yaml
  • changed MODEL.RPN.ASPECT_RATIOS: (0.17, 0.44, 1.13, 2.90, 7.46)
  • changed MODEL.RPN.FPN_POST_NMS_PER_BATCH = False
  • 4 GPUs with the following solver settings:
SOLVER:
  BASE_LR: 0.01
  WEIGHT_DECAY: 0.0001
  STEPS: (50000, 80000)
  MAX_ITER: 100000
  IMS_PER_BATCH: 16

My questions are:

  • At test time, the confidence score threshold for selecting valid bounding boxes is set to 0.5. Is that okay?
  • I suspect my data augmentation in transform.py might be wrong. Would you share your transform.py file or give me some tips? My code snippet is below.
import random

from torchvision.transforms import functional as F  # as in maskrcnn-benchmark's transforms.py


class RandomSampleCrop(object):
    def __init__(self, crop_size=640, min_size=640, max_size=2560):
        self.crop_size = crop_size
        self.min_size = min_size
        self.max_size = max_size

    def get_size(self):
        # Sample a random target (height, width) for the resize step.
        w_resize = random.randint(self.min_size, self.max_size)
        h_resize = random.randint(self.min_size, self.max_size)
        return (h_resize, w_resize)

    def __call__(self, image, target):
        while True:
            # Resize the image and its targets to a random size,
            # then take a random crop_size x crop_size crop.
            resized_size = self.get_size()
            image_r = F.resize(image, resized_size)
            target_r = target.resize(image_r.size)

            width, height = image_r.size
            crop_left = random.randint(0, width - self.crop_size)
            crop_top = random.randint(0, height - self.crop_size)
            crop_box = [crop_left, crop_top,
                        crop_left + self.crop_size, crop_top + self.crop_size]
            target_r_c = target_r.crop(crop_box)
            target_r_c = target_r_c.clip_to_image()
            if len(target_r_c) > 0:
                # Reject crops that leave any degenerate (sub-pixel) box.
                too_small = False
                for t in target_r_c.bbox:
                    w, h = t[2] - t[0], t[3] - t[1]
                    if w < 1 or h < 1:
                        too_small = True
                if too_small:
                    continue
                break
        image_r_c = image_r.crop(crop_box)
        return image_r_c, target_r_c

Many thanks!

@kapness

kapness commented Aug 15, 2019 via email

@hellbell
Author

@kapness
I checked the crop function with some visualizations. It seems okay.

@JingChaoLiu
Collaborator

@kapness thanks again for your reply.
@hellbell

  1. Following the previous answers and the paper, here is a configuration I just wrote. Sorry, I haven't had time to validate it, so there is no guarantee on the F-measure.
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ANCHOR_SIZES: (16, 32, 64, 128, 256)
    ASPECT_RATIOS: (0.17, 0.44, 1.13, 2.90, 7.46)
    STRADDLE_THRESH: 10 # Remove RPN anchors that go outside the image by more than STRADDLE_THRESH pixels.
      # I accidentally changed this value from 0 to 10 at an early stage and forgot to change it back, but I think it makes no difference.
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_PER_BATCH: False
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    NUM_CLASSES: 2
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "MaskRCNNC4Predictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
  MASK_ON: True
DATASETS:
  TRAIN: ("icadar_2017_mlt_train", "icdar_2017_mlt_val")
  TEST: ("icdar_2017_mlt_test",)
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  WARMUP_METHOD: 'linear' # PMTD uses 'exponential', which is not implemented in maskrcnn-benchmark
  WARMUP_ITERS: 4500 # warmup_iter = (image_num=9000 * warmup_epoch=8 / batch_size=16)
  IMS_PER_BATCH: 16
  BASE_LR: 0.02 # PMTD uses batch_size * 0.00125 with syncBN
  WEIGHT_DECAY: 0.0001
  STEPS: (49500, 76500) # warmup_iter + (iter * 0.5, iter * 0.8)
  MAX_ITER: 94500 # iter = (image_num=9000 * epoch=160 / batch_size=16) = 90000, max_iter = warmup_iter + iter
  2. Have you done a grid search over the parameters (cls_threshold, nms_threshold) of the final NMS? See Question about score threshold of Bbox Branch #4 for more details. This can make a bigger difference than some negligible training details; a rough sketch of such a search is given after this list.

  3. See Question about crop step in Data augmentation #5 for the problematic crop operation. There are two problems: first, the number of points of the cropped mask polygon may vary from 3 to 8, no longer a constant 4; second, the cropped original bounding box differs from the correct bounding box recomputed from the cropped mask.
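
For point 2, here is a minimal, unvalidated sketch of such a grid search. The evaluate callback (standing in for the ICDAR MLT F-measure evaluation) is a hypothetical placeholder, and the NMS used here is the generic torchvision operator rather than the maskrcnn-benchmark layer; for simplicity it runs on the detections of a single image.

import itertools

from torchvision.ops import nms


def filter_and_nms(boxes, scores, cls_threshold, nms_threshold):
    # Drop low-confidence detections, then apply standard IoU-based NMS.
    keep = scores >= cls_threshold
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, nms_threshold)
    return boxes[kept], scores[kept]


def grid_search_nms(boxes, scores, evaluate):
    # `evaluate(boxes, scores)` is a hypothetical callback that scores the kept
    # detections against the ground truth and returns an F-measure.
    best_params, best_f = None, -1.0
    for cls_th, nms_th in itertools.product((0.3, 0.4, 0.5, 0.6, 0.7),
                                            (0.1, 0.2, 0.3, 0.4, 0.5)):
        f_measure = evaluate(*filter_and_nms(boxes, scores, cls_th, nms_th))
        if f_measure > best_f:
            best_params, best_f = (cls_th, nms_th), f_measure
    return best_params, best_f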

@kapness

kapness commented Aug 15, 2019 via email

@JingChaoLiu
Collaborator

OHEM is done in the bbox branch, instead of RPN. Compared with the data flow of inference mentioned in #4, the data flow of training is as follows. Some details about loss are also added.

  1. image -> backbone
  2. -> RPN
    >> pred_cls, pred_reg = RPN.forward(All proposals)
    >> randomly sample sample_num = RPN.BATCH_SIZE_PER_IMAGE=256 * image_num proposals to calculate the loss (sample_num is far less than len(All proposals))
    >> postprocess All proposals to output MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN * image_num proposals, given RPN.FPN_POST_NMS_PER_BATCH = False
  3. RPN -> bbox branch
    >> pred_cls, pred_reg = bbox.forward(the proposals outputted by RPN)
    >> randomly sample ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals to calculate the loss
    >> (OHEM added here) sort all ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals by cls_loss + reg_loss, keep the loss of the top 512 proposals, and set the loss of the other proposals to 0.
  4. RPN -> mask branch
    >> pred_mask = mask.forward(the positive proposals outputted by RPN)
    >> calculate the mask loss for all predicted masks
  5. backpropagate the loss to update the parameters

my batch size is smaller than yours; for example, batch size is 16 and each GPU computes 8 images. Does it make a difference to OHEM?

batch_size = 16 is enough.

@kapness

kapness commented Aug 15, 2019 via email

@kapness

kapness commented Aug 16, 2019 via email

@JingChaoLiu
Collaborator

Yes, for the negative proposals, just set the reg loss to 0 before sorting.
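
Putting the pieces of this thread together, here is a minimal sketch of that OHEM step over the bbox-branch losses. It is only an illustration of the description above, not the PMTD implementation; in particular, normalizing the losses by keep_num is my own assumption.

import torch


def ohem_bbox_loss(cls_loss, reg_loss, is_positive, keep_num=512):
    # cls_loss, reg_loss: per-proposal losses over the sampled
    # ROI_HEADS.BATCH_SIZE_PER_IMAGE * image_num proposals.
    # is_positive: bool mask; negative proposals have no regression target,
    # so their reg loss is set to 0 before sorting (as noted above).
    reg_loss = torch.where(is_positive, reg_loss, torch.zeros_like(reg_loss))
    combined = cls_loss + reg_loss
    keep_num = min(keep_num, combined.numel())
    # Keep only the hardest keep_num proposals; zero out the rest.
    _, keep_idx = torch.topk(combined, keep_num)
    keep_mask = torch.zeros_like(combined)
    keep_mask[keep_idx] = 1.0
    cls_loss = (cls_loss * keep_mask).sum() / keep_num
    reg_loss = (reg_loss * keep_mask).sum() / keep_num
    return cls_loss, reg_loss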
