
I have implemented your PMTD ideas, but the result is not good enough. #2

Open
tsing-cv opened this issue Jun 2, 2019 · 16 comments

@tsing-cv

tsing-cv commented Jun 2, 2019

No description provided.

@JingChaoLiu
Collaborator

Could you provide your implementation details of PMTD and your evaluation score on ICDAR 2017? I'm afraid I can't help you without these details.
You may need to pay attention to these common details:

Train Stage

Train Scheduler

We train PMTD using 32 (noted as gpu_num) TITAN X 12G GPUs with SyncBatchNorm.

batch_size = gpu_num * 2 = 64
base_learning_rate = 0.00125 * batch_size = 0.08
optimizer = SGD + Multi-Step

The learning rate changes as follows:

  • [-8 epoch, 0 epoch): warmup for 8 epochs; the learning rate increases exponentially from 0.00125 to 0.08
  • [0 epoch, 80 epoch) : Trained with a fixed learning_rate 0.08
  • [80 epoch, 128 epoch): Trained with a fixed learning_rate 0.008
  • [128 epoch, 160 epoch): Trained with a fixed learning_rate 0.0008
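The schedule above can be sketched as a small helper (an illustrative sketch, not the released code; `pmtd_lr` is a hypothetical name, and epochs in [-8, 0) are taken literally as the warmup range):

```python
def pmtd_lr(epoch, base_lr=0.08, warmup_start=0.00125, warmup_epochs=8):
    """Learning rate at a (possibly fractional) epoch, per the schedule above."""
    if epoch < 0:
        # exponential warmup from warmup_start to base_lr over warmup_epochs
        t = (epoch + warmup_epochs) / warmup_epochs  # 0 -> 1
        return warmup_start * (base_lr / warmup_start) ** t
    if epoch < 80:
        return base_lr          # 0.08
    if epoch < 128:
        return base_lr * 0.1    # 0.008
    return base_lr * 0.01       # 0.0008
```

In a PyTorch training loop the same numbers could instead be expressed with `torch.optim.lr_scheduler.MultiStepLR(milestones=[80, 128], gamma=0.1)` plus a separate warmup phase.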

Loss Design

Loss = RPN Loss(cls + reg) + Bounding Box Loss(cls + reg) + Mask Loss

note: cls=classification, reg=regression

RPN Loss and Bounding Box Loss

RPN Loss and Bounding Box Loss are the same as in Mask R-CNN; only the class_num changes from 81 to 2.

Mask Loss:

mask_loss = 5 * l1_loss(predict_tensor, target_tensor)
predict_tensor = Tensor[B=positive_bbox_num, C={bg, fg}, H=28, W=28]

note: bg=background, fg=foreground
target_tensor has the same shape as predict_tensor; its bg channel is set to all zeros and its fg channel to the pyramid label
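A minimal NumPy sketch of this loss, assuming the predicted channels are already squashed to [0, 1] (`mask_l1_loss` is a hypothetical helper, not the released implementation):

```python
import numpy as np

def mask_l1_loss(predict, pyramid_label, weight=5.0):
    """predict:       float array [B, 2, 28, 28], channels = (bg, fg), values in [0, 1]
    pyramid_label: float array [B, 28, 28], per-box pyramid target
    """
    target = np.zeros_like(predict)   # bg channel stays all-zero
    target[:, 1] = pyramid_label      # fg channel holds the pyramid label
    return weight * np.abs(predict - target).mean()
```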

Data augmentation, RPN Anchor and OHEM

These details should be implemented as the paper describes. Remember to randomly resize the image without keeping the aspect ratio.

Image shape: (H, W) -> (640 * (1 + 3 * random_1), 640 * (1 + 3 * random_2))
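Assuming random_1 and random_2 are drawn independently and uniformly from [0, 1) (the thread does not state their range), the resize rule can be sketched as:

```python
import random

def random_resize_shape(seed=None):
    """Target (H, W) per the rule above; each side lands in [640, 2560)."""
    rng = random.Random(seed)
    r1, r2 = rng.random(), rng.random()
    new_h = int(round(640 * (1 + 3 * r1)))
    new_w = int(round(640 * (1 + 3 * r2)))  # drawn independently of new_h,
    return new_h, new_w                     # so the aspect ratio is not kept
```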

Pyramid Label generation:

  1. Draw the pyramid label on the cropped image to check its correctness
  2. After cropping, the text region is still a polygon, but not necessarily a quadrilateral. The number of points of this polygon may vary from 3 to 8; it is not a constant 4.
  3. When loading the provided pretrained model, the mask_loss between predict_tensor and target_tensor should be around 0.08
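For a convex polygon, one way to rasterize such a pyramid label is to take, per pixel, the minimum over all edges of the pixel's inward distance to the edge divided by the apex's distance to that edge. This is an illustrative sketch with the centroid as apex, not the authors' code:

```python
import numpy as np

def pyramid_label(poly, h, w):
    """Pyramid label over an (h, w) grid for a convex polygon.

    poly: (N, 2) array of (x, y) vertices in order; N may be 3..8 after cropping.
    Value is 1 at the apex (centroid), falls linearly to 0 on every edge,
    and is 0 outside the polygon.
    """
    poly = np.asarray(poly, dtype=np.float64)
    apex = poly.mean(axis=0)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(np.float64)  # (h, w, 2)

    value = np.full((h, w), np.inf)
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        edge = b - a
        normal = np.array([-edge[1], edge[0]])
        normal /= np.linalg.norm(normal)
        d_pix = (pts - a) @ normal           # signed distance of each pixel
        d_apex = (apex - a) @ normal         # signed distance of the apex
        if d_apex < 0:                       # orient the normal toward the apex
            d_pix, d_apex = -d_pix, -d_apex
        value = np.minimum(value, d_pix / d_apex)
    return np.clip(value, 0.0, 1.0)
```

For the 3-to-8-point polygons produced by cropping, the centroid stays inside the (convex) region, so the min-over-edges form above remains valid.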

Test Stage

Just follow the released code.

@tsing-cv
Author

tsing-cv commented Jun 5, 2019

Thank you for your response. I have a few op differences from your proposal:

  1. mask loss: binary_cross_entropy
  2. after obtaining the bbox with its mask, using rbox regression to get the rbox.

I am reimplementing my program in TF 2.0, which has left many bugs to address.

@jylins

jylins commented Jun 25, 2019

Hi @JingChaoLiu , I have another question.
Do you use the ignore data (###) in ICDAR 2017 during training?

x1,y1,x2,y2,x3,y3,x4,y4,###

@JingChaoLiu
Collaborator

Yes, we use them. The boxes with ignore=True in ICDAR 2017 are similar to those with is_crowd=True in COCO, so we follow the settings of Mask R-CNN:
Denoting intersection_area = predict_box ∩ groundtruth_ignore_box, if intersection_area / area(groundtruth_ignore_box) > 0.5, then the predict_box is set to ignore, namely neither positive nor negative.
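The rule can be sketched as follows (`mark_ignored` is a hypothetical helper; axis-aligned boxes are assumed for the intersection):

```python
def mark_ignored(predict_box, ignore_box, thresh=0.5):
    """True when the intersection covers more than `thresh` of the
    ground-truth ignore box. Boxes are (x1, y1, x2, y2), x1 < x2, y1 < y2.
    """
    ix1 = max(predict_box[0], ignore_box[0])
    iy1 = max(predict_box[1], ignore_box[1])
    ix2 = min(predict_box[2], ignore_box[2])
    iy2 = min(predict_box[3], ignore_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    ignore_area = (ignore_box[2] - ignore_box[0]) * (ignore_box[3] - ignore_box[1])
    return inter / ignore_area > thresh  # True -> neither positive nor negative
```

Note the denominator is the ignore box's own area, not the union as in IoU, so a small ### region swallowed by a large prediction is still excluded from the loss.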

@GarrettLee

Hi @tsing-cv, could you possibly share your implementation? I would appreciate it.

@JingChaoLiu JingChaoLiu mentioned this issue Aug 14, 2019
@tsing-cv
Author

@GarrettLee My work fell short of what the paper claimed.

@GarrettLee

3. When loaded the provided pretrained model, the mask_loss between `predict_tensor` and `target_tensor` should be around 0.08

Hi @JingChaoLiu, do you mean that l1_loss(predict_tensor, target_tensor) is around 0.08, or that 5 * l1_loss(predict_tensor, target_tensor) is around 0.08? Thanks in advance.

@JingChaoLiu
Collaborator

5 * l1_loss(predict_tensor, target_tensor)

@soldierofhell

Mask Loss:

mask_loss = 5 * l1_loss(predict_tensor, target_tensor)
predict_tensor = Tensor[B=positive_bbox_num, C={bg, fg}, H=28, W=28]

note: bg=background, fg=foreground
target_tensor has the same shape as predict_tensor; its bg channel is set to all zeros and its fg channel to the pyramid label

Hi @JingChaoLiu ,
I'm trying to train with the L1 loss, but the inputs quickly become zeros and the mask head stops learning.
I implemented your suggestion to zero the input for the background, but it didn't help:

import torch

# mask logits for the positive ROIs, foreground class only
input = mask_logits[positive_inds, labels_pos]
target = mask_targets
# zero the prediction wherever the target is background
input = torch.where(target > 0, input, torch.zeros_like(input))
l1_loss = torch.nn.L1Loss()
mask_loss = 5 * l1_loss(input, target)

As for the targets, everything is OK (inspected visually).
I'm basically training plain maskrcnn-benchmark + pyramid labels + L1 loss.
Pure maskrcnn-benchmark works fine and I'm just trying to improve it.

Is there anything I should tweak to make this loss converge? Thanks in advance for any suggestions

Regards,

@soldierofhell

I found that I was also using the PMTD mask predictor "MaskRCNNC4Predictor_Upsample". When I switched back to "MaskRCNNC4Predictor", the mask loss seems alright now :)
Any idea why the bilinear upsampling causes trouble?

@JingChaoLiu
Collaborator

I'm basically training plain maskrcnn-benchmark + pyramid labels + L1 loss.

In plain maskrcnn-benchmark, the mask loss is calculated by binary_cross_entropy, which is implemented as mask.sigmoid() plus a dot product. When training with the pyramid label, append mask = mask.sigmoid() explicitly after the mask prediction, then calculate the L1 loss between the sigmoid mask and the pyramid label.
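The fix described above, sketched with NumPy rather than PyTorch (`pyramid_mask_loss` is a hypothetical name; in the benchmark code the input would be the raw mask logits):

```python
import numpy as np

def pyramid_mask_loss(mask_logits, pyramid_target, weight=5.0):
    """Squash raw mask logits with a sigmoid first, then take the weighted
    L1 loss against the pyramid label instead of binary cross-entropy.
    """
    sigmoid_mask = 1.0 / (1.0 + np.exp(-mask_logits))  # explicit mask.sigmoid()
    return weight * np.abs(sigmoid_mask - pyramid_target).mean()
```

The PyTorch equivalent is `5 * F.l1_loss(mask_logits.sigmoid(), pyramid_target)`.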

@JingChaoLiu
Collaborator

Maybe we append a sigmoid() function inside the MaskRCNNC4Predictor_Upsample.

@soldierofhell

@JingChaoLiu, thank you for pointing this out. Indeed, there is a sigmoid() in your MaskRCNNC4Predictor_Upsample head, and at the moment it is the cause of the bad convergence (if I add it to the MaskRCNNC4Predictor head, I get the same effect). On the other hand, if I train without the sigmoid (which in theory should be fine?), I get really nice mask pyramids, but the spread between predictions is very low: all values are really close to 0.5, and the pyramid "generator" doesn't work even if I change some of the "constants" inside. Some kind of rescaling would probably help, but I don't want to introduce more tweaking, and I suspect the real problem is in training.
Well, I guess I will struggle with it for some more time.

@soldierofhell

My plan for now is to:

  • inspect gradients in sigmoid() version
  • train longer non-sigmoid() version
  • freeze batch norm (?)
  • try a different loss, such as MSE

Any other suggestions? :)

@kalupiu

kalupiu commented Feb 26, 2020

My plan for now is to:

  • inspect gradients in sigmoid() version
  • train longer non-sigmoid() version
  • freeze batch norm (?)
  • try a different loss, such as MSE

Any other suggestions? :)

Same issues. Any solutions?

@congjianting

Is there any training code available for reference?
