Hi guys, thanks for your work and for sharing the code. I have a question about the label input used to calculate the loss. My understanding is that in a multi-class detection problem with, say, 5 categories, the foreground classes would be 0, 1, 2, 3, 4 and the background would be 5. Similarly, with only 1 class, the foreground would be 0 and the background 1.
I was wondering whether this "fg-0, bg-1" convention gets flipped (to "fg-1, bg-0") when calculating the loss, because in vadacore.ops.sigmoid_focal_loss, specifically in the sigmoid_focal_loss_cuda.cu file, I saw:
```cuda
__global__ void SigmoidFocalLossForward(const int nthreads,
                                        const scalar_t *logits,
                                        const int64_t *targets,
                                        const int num_classes,
                                        const float gamma, const float alpha,
                                        const int num, scalar_t *losses) {
  CUDA_1D_KERNEL_LOOP(i, nthreads) {
    int n = i / num_classes;
    int d = i % num_classes;  // current class [0~79]
    int t = targets[n];       // target class [0~79]
    // Decide whether it is a positive or negative case.
    scalar_t c1 = (t == d);
    scalar_t c2 = (t >= 0 & t != d);
```
And I guess this `int d = i % num_classes; // current class [0~79]` is where the labels are effectively flipped (so the labels become bg-0, fg-1)?
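To make the c1/c2 logic concrete, here is a hypothetical plain-Python sketch of what one anchor's loss looks like under this kernel (my own reference reimplementation, not the actual code; the function name and the 1e-12 clamp are my assumptions). Each class slot d acts as an independent binary problem that is positive only when d equals the target index t, so no explicit label flipping is needed — the one-hot expansion is implicit:

```python
import math

def sigmoid_focal_loss_ref(logits, target, gamma=2.0, alpha=0.25):
    """Hypothetical per-anchor reference mirroring the kernel's c1/c2 logic.

    logits: list of raw scores, one per foreground class.
    target: class index in [0, num_classes-1] for foreground; any value
            >= num_classes (e.g. num_classes itself) acts as background,
            making every slot a negative; target < 0 is ignored entirely.
    """
    num_classes = len(logits)
    t = target
    losses = []
    for d in range(num_classes):
        p = 1.0 / (1.0 + math.exp(-logits[d]))  # sigmoid of this class slot
        c1 = float(t == d)                      # positive case for slot d
        c2 = float(t >= 0 and t != d)           # negative case (skips ignored anchors)
        # Standard focal loss terms; clamp avoids log(0).
        pos = -c1 * alpha * (1 - p) ** gamma * math.log(max(p, 1e-12))
        neg = -c2 * (1 - alpha) * p ** gamma * math.log(max(1 - p, 1e-12))
        losses.append(pos + neg)
    return losses

# With 5 classes and target=5 (background), every slot is a negative case;
# with target=2, only slot 2 is positive and the other four are negatives.
print(sigmoid_focal_loss_ref([0.0] * 5, 5))
print(sigmoid_focal_loss_ref([0.0] * 5, 2))
```

Under this reading, a background anchor (t = num_classes, e.g. 5 in the 5-class example) never matches any d, so it contributes only negative terms, which matches the "fg 0..C-1, bg C" convention without any flip.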
The reason I ask is that when I look at the loss, it doesn't make sense if the labels aren't flipped. Take the simplest case, binary cross-entropy loss:
loss = - [y log(p) + (1-y) log(1-p)]
Minimizing the loss is equivalent to maximizing y log(p) + (1-y) log(1-p). So when y=1 we maximize p, and when y=0 we maximize 1-p, i.e. minimize p. Therefore, if the input labels are in the "bg-1, fg-0" convention, we should convert them to "bg-0, fg-1". Is this correct?
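As a quick numeric sanity check of that direction argument (a minimal sketch with made-up probabilities):

```python
import math

def bce(y, p):
    """Binary cross-entropy for one example: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# With y = 1 the loss falls as p -> 1; with y = 0 it falls as p -> 0,
# so a foreground example must carry y = 1 for the loss to push p up.
assert bce(1, 0.9) < bce(1, 0.1)
assert bce(0, 0.1) < bce(0, 0.9)
```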
Thanks!