Reproducing IoU-Net #36

Open
AlexanderHustinx opened this issue Sep 16, 2019 · 11 comments
Labels
good first issue Good for newcomers

Comments

@AlexanderHustinx

AlexanderHustinx commented Sep 16, 2019

Hey there,

I find the concepts you describe in your paper very interesting. A few months back I started an attempt to reproduce it for my thesis. So far this has proved to be quite challenging.

In prior email contact with one of the authors I mentioned using Faster R-CNN as the baseline and backend of the model. We agreed that this would likely be possible.
Using Faster R-CNN, I managed to recreate Figure 2a. for COCO and other datasets.
But even after more than a hundred experiments and a lot of models with different hyperparameters, I haven't gotten the results you show in Figure 2b.

An example of Fig2a.-b. for COCO: fig2a-b
and for PASCAL: fig2a_b_pascal

The figure on the right is supposed to be similar to Figure 2b from your paper.
But seeing that the inferred localization confidence (labelled 'loc' in the figure) falls predominantly in a narrow range (not 0.5-1.0), I believe I might have made a mistake with e.g. the jittering of the RoIs.

What I have done to jitter the RoI is:

  1. Take the ground truth bbox {x_1, y_1, w, h}
  2. Take 4 random numbers {a,b,c,d} each in range [0.85, 1.15]
  3. Jitter the bounding box invariant to its absolute position in the image:
    x_1' = w * a + x_1
    y_1' = h * b + y_1
    x_2' = w * c + w + x_1'
    y_2' = h * d + h + y_1'

This results in roughly a normal distribution of IoUs: distribution_jitter

So I attempt to sample from this distribution in a way that makes it semi-uniform (sampling from bins of the distribution). This results in roughly the following distribution, and thus the input for my model during training: sampled_distribution_jitter
In practice I do not sample values with an IoU lower than 0.5; the figure merely describes the sampling method in general.

Finally, I use this sampling method to draw ~128 jittered bboxes per batch from the ~2000 samples I create per batch. These are the actual jittered RoIs I feed into my model during training.
Their target IoU, or localization confidence, is the IoU between the jittered RoI and the ground-truth bbox it is based on.
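To be concrete, here is a minimal sketch of the bin-based sampling I describe above (not my actual training code; the helper names are my own), given a pool of jittered boxes paired with their IoUs:

import random
from collections import defaultdict

def sample_semi_uniform(jittered, n_samples=128, n_bins=50, iou_min=0.5):
    """Draw boxes so the sampled IoUs are roughly uniform over [iou_min, 1).
    `jittered` is a list of (box, iou) pairs; boxes below `iou_min` are ignored."""
    # Group candidates into equal-width IoU bins.
    bins = defaultdict(list)
    width = (1.0 - iou_min) / n_bins
    for box, iou in jittered:
        if iou < iou_min:
            continue
        idx = min(int((iou - iou_min) / width), n_bins - 1)
        bins[idx].append((box, iou))

    # Repeatedly pick a random non-empty bin, then a random candidate from it,
    # so every IoU bin is equally likely regardless of how full it is.
    non_empty = [b for b in bins.values() if b]
    samples = []
    while len(samples) < n_samples and non_empty:
        samples.append(random.choice(random.choice(non_empty)))
    return samples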

  1. Do you feel my approach is correct?
  2. Could you please elaborate on how you jitter the RoIs?
  3. How do you sample from the jittered RoIs? How many do you sample per image?
  4. What IoU threshold do you use during the IoU-guided NMS? (I can't find the value for Omega_nms in the paper)

Thanks in advance!

@jbr97

jbr97 commented Sep 17, 2019

Hi Alex, I'm Borui. I'm here to share some tricks...

Firstly, the method we use to jitter the boxes.

To generate a sample (batch) during training, we take the following steps:

  1. Equally divide IoU = [0.5, 1) into many intervals, e.g. 50 intervals: [0.5, 0.51), [0.51, 0.52), ..., [0.99, 1)
  2. Randomly take a ground-truth bounding box G = (x0, y0, x1, y1) from the training data
  3. Randomly take an interval (denoted [L, R]) from step 1
  4. Jitter (x0, y0, x1, y1) to get a new box G' = (x0', y0', x1', y1'); repeat until the IoU of G' and G lies in [L, R], then add G' to the training sample
  5. Go back to step 2

So in step 4, how do we efficiently jitter G = (x0, y0, x1, y1) so that the new box falls in the given IoU interval [L, R]?
Consider the upper bound on the change of a single coordinate: assume we only move x1 to x1 + d, while x0, y0, y1 stay fixed, and let w = x1 - x0, h = y1 - y0. The allowed range of d is:

① if d > 0 (the box grows): L < wh / ((w + d)h) < R  ⇒  w(1 - R)/R < d < w(1 - L)/L
② if d < 0 (the box shrinks): by the same reasoning, w(L - 1) < d < w(R - 1)

From this we get the jittering range of x1 (w(L - 1) < d < w(1 - L)/L), and the ranges of x0, y0, y1 are obtained in the same way.
And it is clear that we can generate any box in the given IoU interval [L, R] this way.

The boxes generated by our algorithm are much closer to a uniform distribution because we divide the large IoU interval into many small intervals.
And we compute the upper-bound range in order to guarantee the speed of the algorithm.

@jbr97

jbr97 commented Sep 17, 2019

I wrote some code for the method. Here it is:

import random
import time
import numpy as np
import matplotlib.pyplot as plt
import tqdm

def IoU(a, g):
    # Intersection rectangle of boxes a and g, both in (x0, y0, x1, y1) format.
    x0 = max(a[0], g[0])
    x1 = min(a[2], g[2])
    y0 = max(a[1], g[1])
    y1 = min(a[3], g[3])

    w = x1 - x0
    h = y1 - y0

    # Sum of the two areas; the union is this sum minus the intersection.
    areas = ((a[2]-a[0]) * (a[3]-a[1])
          + (g[2]-g[0]) * (g[3]-g[1]))

    if w < 0 or h < 0:
        return 0
    else:
        return w * h / (areas - w * h)

def RandomGT():
    # A random "ground-truth" box inside an 800 x 1200 image.
    x0 = random.uniform(0, 800)
    y0 = random.uniform(0, 1200)
    x1 = random.uniform(x0, 800)
    y1 = random.uniform(y0, 1200)
    return (x0, y0, x1, y1)

def Jitter(x, L, w, image_l, image_r):
    # Upper-bound range of the shift derived above: w*(L-1) < d < w*(1-L)/L.
    r = w * (1 - L) / L
    l = w * (L - 1)

    rec = x + random.uniform(l, r)

    # Clip to the image border.
    rec = max(rec, image_l)
    rec = min(rec, image_r)

    return rec

def Get_candidate(G, L, R):
    # Jitter all four coordinates independently; the rejection sampling in the
    # main loop keeps only candidates whose IoU with G falls in [L, R].
    w = G[2] - G[0]
    h = G[3] - G[1]

    x0 = Jitter(G[0], L, w, 0, 800)
    x1 = Jitter(G[2], L, w, 0, 800)
    y0 = Jitter(G[1], L, h, 0, 1200)
    y1 = Jitter(G[3], L, h, 0, 1200)

    return (x0, y0, x1, y1)

# main:
start = time.time()
samples = []
for i in tqdm.tqdm(range(100000)):
    G = RandomGT()
    # IoU range: [0.5, 1)
    # number of intervals: 50
    L = random.randint(50, 99) / 100.
    R = L + .01

    while True:
        B = Get_candidate(G, L, R)
        iou = IoU(B, G)
        if iou >= L and iou <= R:
            break

    samples.append((B, G, iou))

print("Total time: ", time.time() - start)

IoUs = np.array([s[2] for s in samples])
plt.hist(IoUs, bins=50, density=False, edgecolor='black', alpha=0.7)
plt.show()

And the total time to generate 100k jittered samples:
Total time: 43.842833518981934

And the IoU distribution of these samples:
image

@jbr97

jbr97 commented Sep 17, 2019

3. How do you sample from the jittered RoIs? How many do you sample per image?

I'm sorry, but we don't remember the exact number of jittered RoIs used during training.
I guess the number is equal to the maximum number of RoIs used in the original R-CNN bounding-box branch.
I don't think this parameter matters much, as long as the number is not too small during training.

4. What IoU threshold do you use during the IoU-guided NMS? (I can't find the value for Omega_nms in the paper)

I'm sorry, we omitted Omega_nms out of carelessness. The Omega_nms (= 0.5) we use is the same as in traditional NMS and Soft-NMS.
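For reference, here is a minimal NumPy sketch of the IoU-guided NMS step with Omega_nms = 0.5 (illustrative only, not our original implementation; the function names are my own):

import numpy as np

def iou_with(boxes, box):
    """IoU of each row of `boxes` (N, 4) with a single `box` (4,), format (x0, y0, x1, y1)."""
    x0 = np.maximum(boxes[:, 0], box[0])
    y0 = np.maximum(boxes[:, 1], box[1])
    x1 = np.minimum(boxes[:, 2], box[2])
    y1 = np.minimum(boxes[:, 3], box[3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_b = (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area_a + area_b - inter)

def iou_guided_nms(boxes, cls_scores, loc_confs, omega_nms=0.5):
    """Rank by localization confidence; a kept box absorbs the highest
    classification score among the boxes it suppresses."""
    boxes, cls_scores, loc_confs = map(np.asarray, (boxes, cls_scores, loc_confs))
    order = np.argsort(-loc_confs)
    boxes, cls_scores, loc_confs = boxes[order], cls_scores[order], loc_confs[order]
    keep_boxes, keep_scores = [], []
    while len(boxes) > 0:
        # The remaining box with the highest localization confidence is kept.
        best = boxes[0]
        cluster = iou_with(boxes, best) > omega_nms    # boxes it suppresses (includes itself)
        keep_boxes.append(best)
        keep_scores.append(cls_scores[cluster].max())  # classification-score update
        boxes, cls_scores, loc_confs = boxes[~cluster], cls_scores[~cluster], loc_confs[~cluster]
    return np.array(keep_boxes), np.array(keep_scores)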

One extra note:
In Sec. 3.1 we use Omega_train = 0.5 when training IoU-Net. We found that using 0.3 is better than 0.5, because if we train IoU-Net with threshold = 0.5, the lack of discrimination for boxes with lower IoU causes problems: for example, IoU-Net may predict very high scores for boxes with IoU = 0.3, since it has never seen a box with IoU = 0.3 during training.
So before we left Megvii Inc. (Face++), we tried adjusting the threshold from 0.5 to 0.3 when training IoU-Net, and it works.

@AlexanderHustinx
Author

Thank you very much for your fast and detailed reply!
I'm going to implement these changes and get back to you with results.

For now I'll close the issue.
Thanks again!

@AlexanderHustinx
Author

AlexanderHustinx commented Oct 15, 2019

Hey there,

Sorry for the delay.
I worked out all the math for the other coordinates and ran multiple different training sessions and experiments.

As a result I occasionally get a Fig. 2b similar to the one you present in your paper:
fig2b_success

But usually I still get a bad spread, similar to what I had before:
fig2b_usual

These were both results from using Omega_train = 0.5. I've also experimented with Omega_train = 0.3, as you mentioned in one of your tips, but for the scope of reproducing your paper I'd like to first focus on Omega_train = 0.5.

In addition to the figures not being very similar, I have noticed something when looking at IoU-guided NMS: a higher correlation between localization confidence and IoU doesn't necessarily result in higher performance, even when used in combination with the same Faster R-CNN model.
More often than not the result (mAP) of IoU-guided NMS is almost equal to the result (mAP) of Greedy NMS; sometimes slightly worse, sometimes slightly better. 'Slightly' here is ~0.03% mAP.

Do you have any other suggestions that might fix my issue?

@jbr97

jbr97 commented Oct 15, 2019

  1. Your model seems biased toward producing higher IoU numbers. I think there may be a bug in the training of the IoU regression.

  2. In Fig. 3 we discuss the upper bound of the NMS algorithm. And even though we do not mention it in the paper, if we apply the real IoU numbers (computed against the ground-truth boxes, instead of the predicted IoU numbers) to the detected boxes, the IoU-guided NMS algorithm works well. If I have time this week, I could write some code to re-verify it (see the sketch below).
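If you want to check this before I get to it, here is a rough sketch (my variable names, not code from the paper) of the oracle substitution: for each detected box take its best IoU against the ground truth and feed that into the same IoU-guided NMS in place of the predicted localization confidence.

import numpy as np

def oracle_loc_conf(det_boxes, gt_boxes):
    """For each detected box (N, 4), return its highest IoU with any ground-truth
    box (M, 4); boxes are in (x0, y0, x1, y1) format. Using these 'oracle' values
    instead of the predicted localization confidence upper-bounds IoU-guided NMS."""
    det_boxes = np.asarray(det_boxes, dtype=float)
    gt_boxes = np.asarray(gt_boxes, dtype=float)
    x0 = np.maximum(det_boxes[:, None, 0], gt_boxes[None, :, 0])
    y0 = np.maximum(det_boxes[:, None, 1], gt_boxes[None, :, 1])
    x1 = np.minimum(det_boxes[:, None, 2], gt_boxes[None, :, 2])
    y1 = np.minimum(det_boxes[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area_d = (det_boxes[:, 2] - det_boxes[:, 0]) * (det_boxes[:, 3] - det_boxes[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    iou = inter / (area_d[:, None] + area_g[None, :] - inter)
    return iou.max(axis=1)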

@AlexanderHustinx
Author

AlexanderHustinx commented Oct 15, 2019

Intuitively the algorithm makes sense; using the actual IoU instead of the predicted localization confidence should return good results, I agree.
I feel like the problem is mainly the higher IoU numbers being predicted in my case. I'll have another look to see if I can find a problem somewhere.

In my previous post I also meant to point out that so far my results don't show that the actual correlation between loc. conf. and IoU heavily impacts the resulting mAP after IoU-guided NMS.
As an example to elaborate on what I mean:
I present three figures, each containing the graphs from Fig. 2a and Fig. 2b, their Pearson correlations, and the mAP. Note that these are all examples where the graph of loc. conf. vs. IoU seems somewhat similar to your Fig. 2b.

A:
Greedy: 72.65% mAP
IoU-guided: 73.31% mAP
image

B:
Greedy: 71.83% mAP
IoU-guided: 72.2% mAP
image

C:
Greedy: 72.46% mAP
IoU-guided: 72.69% mAP
image

Comparing cases B and C to A, we see that the correlation between loc. conf. and IoU goes up while the performance actually decreases.
Yet there is still a decent difference in performance between Greedy NMS and IoU-guided NMS.

Additionally, here are two more examples where the graphs don't seem as similar, but there is still a high correlation between loc. conf. and IoU (similar to case A).

D:
Greedy: 72.12% mAP
IoU-guided: 71.51% mAP
image

E:
Greedy: 72.49% mAP
IoU-guided: 72.44% mAP
image

In these example cases the correlation between loc. conf. and IoU is similar to that of case A, yet IoU-guided NMS is outperformed by Greedy NMS, more extremely in case D than in E.

This seems odd to me.
Partly because the paper argues that, since loc. conf. correlates with IoU more strongly than class. conf. does, using loc. conf. makes more sense.
This makes me expect that an increase in the correlation between loc. conf. and IoU should further increase performance.

Or am I maybe confusing a few things?

@jbr97

jbr97 commented Oct 15, 2019

I don't think there is any problem with your experiments. I think your question can be stated in this form:

We have 4 sets: C (cls), L (loc 1), L' (loc 2), I (IoU).
Pearson(L, I) < Pearson(L', I) → mAP(IoU-NMS(C, L), I) < mAP(IoU-NMS(C, L'), I)

This implication does not hold; counterexamples are easy to find. The relation between Pearson correlation and mAP is not that close. Obviously, the boxes with higher IoU are very important in the computation of mAP, and intuitively such boxes are more likely to be predicted with a higher classification or localization confidence, because their features are more informative. We have noticed that a well-trained IoU-Net predicts more accurate localization confidence for boxes with higher IoU (about 0.9~1.0) (see Fig. 2b).

@AlexanderHustinx
Author

AlexanderHustinx commented Oct 29, 2019

Sadly I still haven't been able to reproduce the results.
For now I'll leave it as is, so we can close the issue soon.

I do have a few other questions that would help me a lot, if you have time:

  1. To recreate Figure 3, do you use the number of boxes that are labeled as true positives (i.e. following the mAP calculation)?
    Or do you use the same bboxes that were matched with the ground truth as in Fig. 2?
  2. Have you tried this on datasets with fewer classes, e.g. face detection?
  3. For the results shown in Table 1, did you use linear Soft-NMS or exponential (Gaussian) Soft-NMS? (The two decay rules I mean are sketched below.)
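Just so it's clear which two variants I mean, a minimal sketch of the two score-decay rules from the Soft-NMS paper (my own naming, not your implementation):

import numpy as np

def soft_nms_decay(score, iou, method="linear", nt=0.3, sigma=0.5):
    """Score decay applied to a box with overlap `iou` against the currently kept box."""
    if method == "linear":
        # Linear: only penalize boxes overlapping more than the threshold nt.
        return score * (1.0 - iou) if iou > nt else score
    # Exponential (Gaussian): a continuous penalty for any overlap.
    return score * np.exp(-(iou ** 2) / sigma)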

Thanks in advance!

@tonysy

tonysy commented Apr 26, 2020

@joishbader Hi, thanks for your work. Do you have a plan to release the code for IoU-Net?

@momo666666

I don't think there is any problem with your experiments. [...] We have noticed that a well-trained IoU-Net predicts more accurate localization confidence for boxes with higher IoU (about 0.9~1.0) (see Fig. 2b).

Hi, the final error between the predicted IoU and the ground-truth IoU is not given in the paper. Can you tell me the result?
