Reproducing IoU-Net #36

Open
AlexanderHustinx opened this issue Sep 16, 2019 · 11 comments
Labels
good first issue Good for newcomers

Comments

@AlexanderHustinx

AlexanderHustinx commented Sep 16, 2019

Hey there,

I find the concepts you describe in your paper very interesting. A few months back I started an attempt to reproduce it for my thesis. So far this has proved to be quite challenging.

In prior email contact with one of the authors I mentioned using Faster R-CNN as the baseline and backend of the model. We agreed that this would likely be possible.
Using Faster R-CNN, I managed to recreate Figure 2a. for COCO and other datasets.
But even after more than a hundred experiments and a lot of models with different hyperparameters, I haven't gotten the results you show in Figure 2b.

An example of Fig2a.-b. for COCO: fig2a-b
and for PASCAL: fig2a_b_pascal

The figure on the right is supposed to be similar to Figure 2b from your paper.
But seeing that the inferred localization confidence (labelled 'loc' in the figure) falls predominantly in a narrow range (not 0.5-1.0), I believe I might have made a mistake with e.g. the jittering of the RoIs.

What I have done to jitter the RoI is:

  1. Take the ground truth bbox {x_1, y_1, w, h}
  2. Take 4 random numbers {a,b,c,d} each in range [0.85, 1.15]
  3. Jitter the bounding box invariant to its absolute position in the image:
    x_1' = w * a + x_1
    y_1' = h * b + y_1
    x_2' = w * c + w + x_1'
    y_2' = h * d + h + y_1'

This results in roughly a normal distribution of IoUs: distribution_jitter

So I attempt to sample from this distribution in a way that makes it semi-uniform (sampling from bins of the distribution). This results in roughly the following distribution, and thus the input for my model during training: sampled_distribution_jitter
In practice I do not sample values with an IoU lower than 0.5; the figure merely describes the sampling method in general.

Finally, I use this sampling method to draw ~128 jittered bboxes per batch from the ~2000 samples I create per batch. These are the actual jittered RoIs I feed into my model during training.
Their target IoU, or localization confidence, is the IoU between the jittered RoI and the ground-truth bbox it is based on.
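To be concrete, here is a minimal sketch of the bin-based sampling I describe above (not my actual training code; the helper names are my own), given a pool of jittered boxes paired with their IoUs:

import random
from collections import defaultdict

def sample_semi_uniform(jittered, n_samples=128, n_bins=50, iou_min=0.5):
    """Draw boxes so the sampled IoUs are roughly uniform over [iou_min, 1).
    `jittered` is a list of (box, iou) pairs; boxes below `iou_min` are ignored."""
    # Group candidates into equal-width IoU bins.
    bins = defaultdict(list)
    width = (1.0 - iou_min) / n_bins
    for box, iou in jittered:
        if iou < iou_min:
            continue
        idx = min(int((iou - iou_min) / width), n_bins - 1)
        bins[idx].append((box, iou))

    # Repeatedly pick a random non-empty bin, then a random candidate from it,
    # so every IoU bin is equally likely regardless of how full it is.
    non_empty = [b for b in bins.values() if b]
    samples = []
    while len(samples) < n_samples and non_empty:
        samples.append(random.choice(random.choice(non_empty)))
    return samples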

  1. Do you feel my approach is correct?
  2. Could you please elaborate on how you jitter the RoIs?
  3. How do you sample from the jittered RoIs? How many do you sample per image?
  4. What IoU threshold do you use during the IoU-guided NMS? (I can't find the value for Omega_nms in the paper)

Thanks in advance!

@jbr97

jbr97 commented Sep 17, 2019

Hi Alex, I'm Borui. I'm here to share some tricks...

Firstly, the method we use to jitter the boxes.

To generate a sample (batch) during training, we take the following steps:

  1. Equally divide IoU = [0.5, 1) into many intervals, e.g. 50 intervals: [0.5, 0.51), [0.51, 0.52), ..., [0.99, 1)
  2. Randomly take a ground-truth bounding box G = (x0, y0, x1, y1) from the training data
  3. Randomly take an interval (denoted [L, R]) from step 1
  4. Jitter (x0, y0, x1, y1) to get a new box G' = (x0', y0', x1', y1'); repeat until the IoU of G' and G lies in [L, R], then add G' to the training sample
  5. Go back to step 2

So in step 4, how do we efficiently jitter G = (x0, y0, x1, y1) so that the new box falls in the given IoU interval [L, R]?
Consider the upper bound on the change of a single coordinate: assume we only move x1 to x1 + d, while x0, y0, y1 stay fixed, and let w = x1 - x0, h = y1 - y0. The allowed range of d is:

① if d > 0 (the box grows): L < wh / ((w + d)h) < R  ⇒  w(1 - R)/R < d < w(1 - L)/L
② if d < 0 (the box shrinks): by the same reasoning, w(L - 1) < d < w(R - 1)

From this we get the jittering range of x1 (w(L - 1) < d < w(1 - L)/L), and the ranges of x0, y0, y1 are obtained in the same way.
And it is clear that we can generate any box in the given IoU interval [L, R] this way.

The boxes generated by our algorithm are much closer to a uniform distribution because we divide the large IoU interval into many small intervals.
And we compute the upper-bound range in order to guarantee the speed of the algorithm.

@jbr97

jbr97 commented Sep 17, 2019

I wrote some code for the method. Here it is:

import random
import time
import numpy as np
import matplotlib.pyplot as plt
import tqdm

def IoU(a, g):
    # Intersection rectangle of boxes a and g, both in (x0, y0, x1, y1) format.
    x0 = max(a[0], g[0])
    x1 = min(a[2], g[2])
    y0 = max(a[1], g[1])
    y1 = min(a[3], g[3])

    w = x1 - x0
    h = y1 - y0

    # Sum of the two areas; the union is this sum minus the intersection.
    areas = ((a[2]-a[0]) * (a[3]-a[1])
          + (g[2]-g[0]) * (g[3]-g[1]))

    if w < 0 or h < 0:
        return 0
    else:
        return w * h / (areas - w * h)

def RandomGT():
    # A random "ground-truth" box inside an 800 x 1200 image.
    x0 = random.uniform(0, 800)
    y0 = random.uniform(0, 1200)
    x1 = random.uniform(x0, 800)
    y1 = random.uniform(y0, 1200)
    return (x0, y0, x1, y1)

def Jitter(x, L, w, image_l, image_r):
    # Upper-bound range of the shift derived above: w*(L-1) < d < w*(1-L)/L.
    r = w * (1 - L) / L
    l = w * (L - 1)

    rec = x + random.uniform(l, r)

    # Clip to the image border.
    rec = max(rec, image_l)
    rec = min(rec, image_r)

    return rec

def Get_candidate(G, L, R):
    # Jitter all four coordinates independently; the rejection sampling in the
    # main loop keeps only candidates whose IoU with G falls in [L, R].
    w = G[2] - G[0]
    h = G[3] - G[1]

    x0 = Jitter(G[0], L, w, 0, 800)
    x1 = Jitter(G[2], L, w, 0, 800)
    y0 = Jitter(G[1], L, h, 0, 1200)
    y1 = Jitter(G[3], L, h, 0, 1200)

    return (x0, y0, x1, y1)

# main:
start = time.time()
samples = []
for i in tqdm.tqdm(range(100000)):
    G = RandomGT()
    # IoU range: [0.5, 1)
    # number of intervals: 50
    L = random.randint(50, 99) / 100.
    R = L + .01

    while True:
        B = Get_candidate(G, L, R)
        iou = IoU(B, G)
        if iou >= L and iou <= R:
            break

    samples.append((B, G, iou))

print("Total time: ", time.time() - start)

IoUs = np.array([s[2] for s in samples])
plt.hist(IoUs, bins=50, density=False, edgecolor='black', alpha=0.7)
plt.show()

And the total time to generate 100k jittered samples:
Total time: 43.842833518981934

And the IoU distribution of these samples:
image

@jbr97

jbr97 commented Sep 17, 2019

3. How do you sample from the jittered RoIs? How many do you sample per image?

I'm sorry, but we don't remember the exact number of jittered RoIs used during training.
I guess the number is equal to the maximum number of RoIs used in the original R-CNN bounding-box branch.
I don't think this parameter matters much, as long as the number is not too small during training.

4. What IoU threshold do you use during the IoU-guided NMS? (I can't find the value for Omega_nms in the paper)

I'm sorry, we omitted Omega_nms out of carelessness. The Omega_nms (= 0.5) we use is the same as in traditional NMS and Soft-NMS.
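For reference, here is a minimal NumPy sketch of the IoU-guided NMS step with Omega_nms = 0.5 (illustrative only, not our original implementation; the function names are my own):

import numpy as np

def iou_with(boxes, box):
    """IoU of each row of `boxes` (N, 4) with a single `box` (4,), format (x0, y0, x1, y1)."""
    x0 = np.maximum(boxes[:, 0], box[0])
    y0 = np.maximum(boxes[:, 1], box[1])
    x1 = np.minimum(boxes[:, 2], box[2])
    y1 = np.minimum(boxes[:, 3], box[3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_b = (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area_a + area_b - inter)

def iou_guided_nms(boxes, cls_scores, loc_confs, omega_nms=0.5):
    """Rank by localization confidence; a kept box absorbs the highest
    classification score among the boxes it suppresses."""
    boxes, cls_scores, loc_confs = map(np.asarray, (boxes, cls_scores, loc_confs))
    order = np.argsort(-loc_confs)
    boxes, cls_scores, loc_confs = boxes[order], cls_scores[order], loc_confs[order]
    keep_boxes, keep_scores = [], []
    while len(boxes) > 0:
        # The remaining box with the highest localization confidence is kept.
        best = boxes[0]
        cluster = iou_with(boxes, best) > omega_nms    # boxes it suppresses (includes itself)
        keep_boxes.append(best)
        keep_scores.append(cls_scores[cluster].max())  # classification-score update
        boxes, cls_scores, loc_confs = boxes[~cluster], cls_scores[~cluster], loc_confs[~cluster]
    return np.array(keep_boxes), np.array(keep_scores)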

One extra note:
In Sec. 3.1 we use Omega_train = 0.5 when training IoU-Net. We found that using 0.3 is better than 0.5, because if we train IoU-Net with threshold = 0.5, the lack of discrimination for boxes with lower IoU causes problems: for example, IoU-Net may predict very high scores for boxes with IoU = 0.3, since it has never seen a box with IoU = 0.3 during training.
So before we left Megvii Inc. (Face++), we tried adjusting the threshold from 0.5 to 0.3 when training IoU-Net, and it works.

@AlexanderHustinx
Author

Thank you very much for your fast and detailed reply!
I'm going to implement these changes and get back to you with results.

For now I'll close the issue.
Thanks again!

@AlexanderHustinx
Author

AlexanderHustinx commented Oct 15, 2019

Hey there,

Sorry for the delay.
I worked out all the math for the other coordinates and ran multiple different training sessions and experiments.

As a result I occasionally get a Fig. 2b similar to the one you present in your paper:
fig2b_success

But usually I still get a bad spread, similar to what I had before:
fig2b_usual

These were both results from using Omega_train = 0.5. I've also experimented with Omega_train = 0.3, as you mentioned in one of your tips, but for the scope of reproducing your paper I'd like to first focus on Omega_train = 0.5.

In addition to the figures not being very similar, I have noticed something when looking at IoU-guided NMS: a higher correlation between localization confidence and IoU doesn't necessarily result in higher performance, even when used in combination with the same Faster R-CNN model.
More often than not the result (mAP) of IoU-guided NMS is almost equal to the result (mAP) of Greedy NMS; sometimes slightly worse, sometimes slightly better. 'Slightly' here is ~0.03% mAP.

Do you have any other suggestions that might fix my issue?

@jbr97

jbr97 commented Oct 15, 2019

  1. Your model seems biased toward producing higher IoU numbers. I think there may be a bug in the training of the IoU regression.

  2. In Fig. 3 we discuss the upper bound of the NMS algorithm. And even though we do not mention it in the paper, if we apply the real IoU numbers (computed against the ground-truth boxes, instead of the predicted IoU numbers) to the detected boxes, the IoU-guided NMS algorithm works well. If I have time this week, I could write some code to re-verify it (see the sketch below).
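If you want to check this before I get to it, here is a rough sketch (my variable names, not code from the paper) of the oracle substitution: for each detected box take its best IoU against the ground truth and feed that into the same IoU-guided NMS in place of the predicted localization confidence.

import numpy as np

def oracle_loc_conf(det_boxes, gt_boxes):
    """For each detected box (N, 4), return its highest IoU with any ground-truth
    box (M, 4); boxes are in (x0, y0, x1, y1) format. Using these 'oracle' values
    instead of the predicted localization confidence upper-bounds IoU-guided NMS."""
    det_boxes = np.asarray(det_boxes, dtype=float)
    gt_boxes = np.asarray(gt_boxes, dtype=float)
    x0 = np.maximum(det_boxes[:, None, 0], gt_boxes[None, :, 0])
    y0 = np.maximum(det_boxes[:, None, 1], gt_boxes[None, :, 1])
    x1 = np.minimum(det_boxes[:, None, 2], gt_boxes[None, :, 2])
    y1 = np.minimum(det_boxes[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area_d = (det_boxes[:, 2] - det_boxes[:, 0]) * (det_boxes[:, 3] - det_boxes[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    iou = inter / (area_d[:, None] + area_g[None, :] - inter)
    return iou.max(axis=1)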

@AlexanderHustinx
Author

AlexanderHustinx commented Oct 15, 2019

Intuitively the algorithm makes sense; using the actual IoU instead of the predicted localization confidence should return good results, I agree.
I feel like the problem is mainly the higher IoU numbers being predicted in my case. I'll have another look to see if I can find a problem somewhere.

In my previous post I also meant to point out that so far my results don't show that the actual correlation between loc. conf. and IoU heavily impacts the resulting mAP after IoU-guided NMS.
As an example to elaborate on what I mean:
I present three figures, each containing the graphs from Fig. 2a and Fig. 2b, their Pearson correlations, and the mAP. Note that these are all examples where the graph of loc. conf. vs. IoU seems somewhat similar to your Fig. 2b.

A:
Greedy: 72.65% mAP
IoU-guided: 73.31% mAP
image

B:
Greedy: 71.83% mAP
IoU-guided: 72.2% mAP
image

C:
Greedy: 72.46% mAP
IoU-guided: 72.69% mAP
image

Comparing cases B and C to A, we see that the correlation between loc. conf. and IoU goes up while the performance actually decreases.
Yet there is still a decent difference in performance between Greedy NMS and IoU-guided NMS.

Additionally, here are two more examples where the graphs don't seem as similar, but there is still a high correlation between loc. conf. and IoU (similar to case A).

D:
Greedy: 72.12% mAP
IoU-guided: 71.51% mAP
image

E:
Greedy: 72.49% mAP
IoU-guided: 72.44% mAP
image

In these example cases the correlation between loc. conf. and IoU is similar to that of case A, yet IoU-guided NMS is outperformed by Greedy NMS, more extremely in case D than in E.

This seems odd to me.
Partly because the paper argues that, since loc. conf. correlates with IoU more strongly than class. conf. does, using loc. conf. makes more sense.
This makes me expect that an increase in the correlation between loc. conf. and IoU should further increase performance.

Or am I maybe confusing a few things?

@jbr97

jbr97 commented Oct 15, 2019

I don't think there is any problem with your experiments. I think your question can be stated in this form:

We have 4 sets: C (cls), L (loc 1), L' (loc 2), I (IoU).
Pearson(L, I) < Pearson(L', I) → mAP(IoU-NMS(C, L), I) < mAP(IoU-NMS(C, L'), I)

This implication does not hold; counterexamples are easy to find. The relation between Pearson correlation and mAP is not that close. Obviously, the boxes with higher IoU are very important in the computation of mAP, and intuitively such boxes are more likely to be predicted with a higher classification or localization confidence, because their features are more informative. We have noticed that a well-trained IoU-Net predicts more accurate localization confidence for boxes with higher IoU (about 0.9~1.0) (see Fig. 2b).

@AlexanderHustinx
Author

AlexanderHustinx commented Oct 29, 2019

Sadly I still haven't been able to reproduce the results.
For now I'll leave it as is, so we can close the issue soon.

I do have a few other questions that would help me a lot, if you have time:

  1. To recreate Figure 3, do you use the number of boxes that are labeled as true positives (i.e. following the mAP calculation)?
    Or do you use the same bboxes that were matched with the ground truth as in Fig. 2?
  2. Have you tried this on datasets with fewer classes, e.g. face detection?
  3. For the results shown in Table 1, did you use linear Soft-NMS or exponential (Gaussian) Soft-NMS? (The two decay rules I mean are sketched below.)
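Just so it's clear which two variants I mean, a minimal sketch of the two score-decay rules from the Soft-NMS paper (my own naming, not your implementation):

import numpy as np

def soft_nms_decay(score, iou, method="linear", nt=0.3, sigma=0.5):
    """Score decay applied to a box with overlap `iou` against the currently kept box."""
    if method == "linear":
        # Linear: only penalize boxes overlapping more than the threshold nt.
        return score * (1.0 - iou) if iou > nt else score
    # Exponential (Gaussian): a continuous penalty for any overlap.
    return score * np.exp(-(iou ** 2) / sigma)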

Thanks in advance!

@tonysy

tonysy commented Apr 26, 2020

@joishbader Hi, thanks for your work. Do you have a plan to release the code for IoU-Net?

@momo666666

I don't think there is any problem with your experiments. [...] We have noticed that a well-trained IoU-Net predicts more accurate localization confidence for boxes with higher IoU (about 0.9~1.0) (see Fig. 2b).

Hi, the final error between the predicted IoU and the ground-truth IoU is not given in the paper. Can you tell me the result?
