 7 |  7 |   <!-- Replace the content tag with appropriate information -->
 8 |  8 |   <meta name="description" content="Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It. Accepted to ICLR 2025.">
 9 |  9 |   <meta property="og:title" content="Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It" />
10 |    | -   <meta property="og:description" content="Don't forget about majority voting when you evaluate your TTA method :)" />
   | 10 | +   <meta property="og:description" content="A lead towards explaining the disastrous effects of label smoothing on your model's selective classification abilities." />
11 | 11 |   <meta property="og:url" content="https://ensta-u2is-ai.github.io/Understanding-Label-smoothing-Selective-classification/" />
12 | 12 |   <!-- Path to banner image, should be in the path listed below. Optimal dimensions are 1200x630 -->
13 |    | -   <meta property="og:image" content="static/images/teaser.png" />
   | 13 | +   <!-- <meta property="og:image" content="static/images/teaser.png" /> -->
14 | 14 |   <meta property="og:image:width" content="1200" />
15 | 15 |   <meta property="og:image:height" content="630" />
16 | 16 |
@@ -211,97 +211,6 @@ <h2 class="title is-3">Abstract</h2>
211 | 211 |   </section>
212 | 212 |   <!-- End paper abstract -->
213 | 213 |
214 |     | -   <!-- <section class="hero teaser">
215 |     | -   <div class="container is-max-desktop">
216 |     | -     <div class="hero-body has-text-centered">
217 |     | -       <h2 class="title is-3">Takeaways</h2>
218 |     | -       <div style="display: flex; justify-content: center;">
219 |     | -         <img src="static/images/teaser.png" alt="Banner Image" height="100%" width="75%" style="margin-bottom: 55px;">
220 |     | -       </div> -->
221 |     | -   <!-- <h2 class="subtitle has-text-justified">
222 |     | -       <br><b>Background on Marginal Entropy Minimization.</b> Test-Time Adaptation aims at adapting a model to a single image at inference time.
223 |     | -       Ideally, no form of prior or external knowledge should be employed in doing so.
224 |     | -       An established paradigm for TTA is <b>M</b>arginal <b>E</b>ntropy <b>M</b>inimization, which works by augmenting the image N times,
225 |     | -       computing the so-called "marginal probability distribution" (i.e., the average probability distribution over the views), and
226 |     | -       minimizing the entropy of this distribution.<br><br>
227 |     | -
228 |     | -       <b>Findings.</b> We find that the argmax of the marginal distribution is invariant to <b>MEM</b> most of the time (and can be guaranteed to be so under certain conditions),
229 |     | -       and that this marginal distribution itself is reasonably better than standard inference, under the assumption that the model is well-calibrated.
230 |     | -       <br><br>Empirical evidence for these findings is shown below (left: invariance, right: ensemble verification).
231 |     | -     </h2>
232 |     | -     <img src="static/images/I_binned_ent_vs_invariance.png" alt="Banner Image" style="height: 18em; width: auto; padding-inline: 2rem;">
233 |     | -     <img src="static/images/ensemble_verification_over_datasets.png" alt="Banner Image" style="height: 18em; width: auto; padding-inline: 2rem">
234 |     | -
235 |     | -     <h2 class="subtitle has-text-justified">
236 |     | -       <b>Problem.</b> Calibration is missing on augmented data, but we largely observe that CLIP models are still pretty accurate in this regime.
237 |     | -       For example, here is what the reliability plots of CLIP-ViT-B-16 look like.
238 |     | -     </h2>
239 |     | -     <img src="static/images/rpI.png" alt="Banner Image" style="height: 20em; width: auto;">
240 |     | -
241 |     | -     <h2 class="subtitle has-text-justified">
242 |     | -       <b>TTA with "zero" temperature</b> is a direct consequence of these observations: since confidence information is unreliable,
243 |     | -       simply compute the marginal distribution <i>after</i> the temperature has been zeroed-out! By only adapting this parameter, we are effectively marginalizing
244 |     | -       across one-hot encoded vectors... does this remind you of something?
245 |     | -     </h2>
246 |     | -
247 |     | -     </div>
248 |     | -   </div>
249 |     | -   </section> -->
250 |     | -   <!-- End paper abstract -->
251 |     | -
252 |     | -   <!-- Method overview-->
253 |     | -   <!-- <section class="section hero is-light">
254 |     | -   <div class="container is-max-desktop">
255 |     | -     <h2 class="title is-3 has-text-centered">Implementation</h2>
256 |     | -     <h2 class="subtitle has-text-centered" style="padding: 0px; margin: 0px">ZERO is implemented in a few lines of code. You can find a PyTorch-like implementation right here :)</h2>
257 |     | -     <pre class="has-text-justified" style="width: 80rem; overflow-x: auto; padding: 0px; margin: 0px">
258 |     | -       <code class="python" style="padding: 0px; margin: 0px">
259 |     | -   def zero(image, z_txt, N, gamma, temp):
260 |     | -       """
261 |     | -       :param z_txt: pre-computed text embeddings (C,hdim)
262 |     | -       :param temp: model’s original temperature
263 |     | -       :param augment: takes (C,H,W) and returns (N,C,H,W)
264 |     | -       :param gamma: filtering percentile (e.g., 0.3)
265 |     | -       """
266 |     | -       views = augment(image, num_views=N)              # generate augmented views
267 |     | -       l = model.image_encoder(views) @ z_txt.t()       # predict (unscaled logits)
268 |     | -       l_filt = confidence_filter(l, temp, top=gamma)   # retain most confident preds
269 |     | -       zero_temp = torch.finfo(l_filt.dtype).eps        # zero temperature
270 |     | -       p_bar = (l_filt / zero_temp).softmax(dim=1).sum(dim=0)  # marginalize
271 |     | -       return p_bar.argmax()
272 |     | -       </code>
273 |     | -     </pre>
274 |     | -   </div>
275 |     | -   </section> -->
276 |     | -   <!-- End method overview -->
277 |     | -
278 |     | -   <!-- Results -->
279 |     | -   <!-- <section class="hero">
280 |     | -   <div class="container is-max-desktop">
281 |     | -     <div class="hero-body has-text-centered">
282 |     | -       <h2 class="title is-3">Results</h2>
283 |     | -       <h2 class="subtitle has-text-justified">
284 |     | -         <br>We evaluate ZERO on the standard TTA benchmarks, including robustness to Natural Distribution Shifts and Fine-grained Classification.
285 |     | -         The results below report CLIP-ViT-B-16 from OpenAI, and compare ZERO to TPT, PromptAlign and RLCF.
286 |     | -       </h2>
287 |     | -
288 |     | -       <p><b>Robustness to Natural Distribution Shifts</b></p>
289 |     | -       <img src="static/images/nds.png" alt="Banner Image" style="height: auto; width: 100em;">
290 |     | -       <br><br>
291 |     | -
292 |     | -       <p><b>Fine-grained Classification</b></p>
293 |     | -       <img src="static/images/fg.png" alt="Banner Image" style="height: auto; width: 100em;">
294 |     | -
295 |     | -       <h2 class="subtitle has-text-justified"><br>
296 |     | -         We find that ZERO, in all its simplicity, establishes a new <b>state-of-the-art</b> in TTA!
297 |     | -         Don't forget about majority voting when you evaluate your TTA method!! :)
298 |     | -       </h2>
299 |     | -
300 |     | -     </div>
301 |     | -   </div>
302 |     | -   </section> -->
303 |     | -   <!-- Results -->
304 |     | -
305 | 214 |   <!-- Acknowledgements -->
306 | 215 |   <section class="hero is-light">
307 | 216 |   <div class="container is-max-desktop">
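The deleted sections above sketch two ideas: the Marginal Entropy Minimization objective (the entropy of the average probability distribution over augmented views) and the observation that marginalizing at "zero" temperature amounts to majority voting ("does this remind you of something?"). A minimal, self-contained sketch of both, assuming standard PyTorch; `N`, `C`, `logits`, and `temp` are illustrative placeholders, not values from this commit:

```python
import torch

torch.manual_seed(0)
N, C = 8, 5                                  # augmented views, classes
logits = torch.randn(N, C)                   # per-view unscaled logits
temp = 1.0                                   # illustrative temperature

# Marginal Entropy Minimization: the objective is the entropy of the
# average ("marginal") probability distribution over the N views.
p = (logits / temp).softmax(dim=1)           # per-view distributions
p_marg = p.mean(dim=0)                       # marginal distribution
mem_objective = torch.special.entr(p_marg).sum()   # entropy MEM minimizes

# "Zero" temperature: dividing by machine epsilon collapses each view's
# softmax to a (numerically) one-hot vector, as in the zero() snippet.
eps = torch.finfo(logits.dtype).eps
p_bar = (logits / eps).softmax(dim=1).sum(dim=0)   # sum of one-hot rows

# ...which is exactly a majority vote over the views' argmax predictions.
votes = torch.bincount(logits.argmax(dim=1), minlength=C)
print(int(p_bar.argmax()), int(votes.argmax()))    # same class (barring ties)
```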