I used InceptionV3 as the backbone and trained for 20 epochs, reaching a mean IoU score of 0.90. However, when I predict on the test data, the results look like this:

My dataset contains images of tomato diseases (10 classes). The Gt mask shows which part of the leaf is healthy and which part is diseased, whereas the Pr mask just outlines the shape of the leaf.
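
To make the mismatch concrete, here is a minimal sketch of how per-class IoU between a Gt mask and a Pr mask could be checked. It assumes both masks are integer label arrays of the same shape; the function name `per_class_iou`, the three-label layout (background, healthy, diseased) and the toy arrays are hypothetical and not taken from my actual pipeline. Depending on how the metric is averaged, a high overall IoU might coexist with a very low score on the diseased class.

```python
import numpy as np

def per_class_iou(gt, pr, num_classes):
    """Return a list with the IoU of each class (NaN if the class is absent from both masks)."""
    ious = []
    for c in range(num_classes):
        gt_c = gt == c                      # pixels labelled c in the ground truth
        pr_c = pr == c                      # pixels labelled c in the prediction
        inter = np.logical_and(gt_c, pr_c).sum()
        union = np.logical_or(gt_c, pr_c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Toy example (hypothetical labels): 0 = background, 1 = healthy, 2 = diseased.
gt = np.array([[0, 0, 1, 1],
               [0, 1, 2, 2],
               [0, 1, 2, 2]])
# The prediction gets the leaf shape right but marks the whole leaf as healthy.
pr = np.array([[0, 0, 1, 1],
               [0, 1, 1, 1],
               [0, 1, 1, 1]])

ious = per_class_iou(gt, pr, num_classes=3)
print(ious)
```

In this toy case the leaf outline is perfect, yet the diseased class scores zero, which resembles what I see in the Pr mask.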
Thanks