Can you provide more details about the dataset dict #5

Open
ascetic-monk opened this issue Jan 8, 2025 · 6 comments

@ascetic-monk

Thanks for your work. I found that the results are quite sensitive to the randomly selected order of the novel classes. Could you provide more details about the `online_dataset_dict.txt` of the 4 datasets for better alignment? Additionally, I would also like to confirm that the results shown in your paper are the "soft" ones rather than the "hard" ones.

@mashijie1028
Owner

Hi!

For generic datasets (e.g., C100, IN100, Tiny-ImageNet), the class order follows the natural order. For fine-grained datasets (e.g., CUB and Aircraft), we use ssb_splits (https://github.com/sgvaze/SSB).

Specifically, for C100 and IN100, the stage-0 labeled classes are 0-49, and the model then continually learns classes 50-99. For Tiny, the labeled classes are 0-99, while the unlabeled ones are 100-199. Importantly, we use np.random.seed(0) to shuffle the unlabeled classes for all datasets, but not the labeled classes (for example, here).
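
For reference, here is a minimal sketch of what that split looks like (class counts follow the comment above; the variable names are illustrative, not the repo's exact code):

```python
import numpy as np

# Toy sketch (not the repo's exact code): labeled classes keep their natural
# order, while the unlabeled (novel) classes are shuffled with a fixed seed.
# Class counts follow the comment above (C100 / IN100: 50 labeled + 50 unlabeled).
num_labeled, num_total = 50, 100

labeled_classes = list(range(num_labeled))               # 0..49, natural order
unlabeled_classes = np.arange(num_labeled, num_total)    # 50..99

np.random.seed(0)                     # fixed seed, as mentioned above
np.random.shuffle(unlabeled_classes)  # only the unlabeled class order is shuffled
print(unlabeled_classes.tolist())     # class order across the incremental stages
```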

Additionally, for IN100 we first subsample 100 classes from IN-1k, also with np.random.seed(0), as shown here.
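
The IN100 subsampling step could be sketched as follows (illustrative only; the repo's exact code is at the link above):

```python
import numpy as np

# Illustrative sketch of the IN100 construction: pick 100 of the 1000
# ImageNet-1k classes with a fixed seed (the repo's exact code may differ).
np.random.seed(0)
in100_classes = np.random.choice(range(1000), size=(100,), replace=False)
print(sorted(in100_classes.tolist()))
```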

In short, you can reproduce the reported results for C100, IN100, and Tiny using the released code. For CUB, please refer to this issue; we have released the logs there. Note that we use ssb_splits for CUB, so please adjust the released code in get_datasets.py accordingly.

@mashijie1028
Owner

By the way, what do 'soft' and 'hard' results mean?

@ascetic-monk
Author

Thank you for your reply. We are concerned that the generated random numbers may differ across hardware, so we hope to align the splits exactly. Additionally, we have run multiple reproductions and found that the fluctuations in the "all" metric are relatively small, while those in the "unseen" metric are larger. We will continue trying to adjust the random seed in PyTorch to reproduce the results under the current settings (sketched below).
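
For reference, this is the kind of seeding we are experimenting with (a sketch based on common PyTorch practice, not the repo's code):

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 0) -> None:
    """Fix the common sources of randomness (a sketch, not the repo's code)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Full determinism may additionally require:
    #   torch.backends.cudnn.deterministic = True
    #   torch.backends.cudnn.benchmark = False
    # and results can still differ across hardware / CUDA versions.
```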

In line 297 of `train_happy.py`, you listed two metrics: "hard" and "soft". According to our understanding, these correspond to calculations under two different partitions. We would like to know whether the main experimental results you reported are based on the "soft" metric.

@mashijie1028
Owner

Hi! Thanks for the clarification!

Yes, in our paper we report the "soft" metric. "Hard" treats only the initially labeled (Stage-0) classes as old. "Soft" dynamically treats the classes first seen at the current stage as new, and all previously seen classes (the initial classes plus the previously discovered new classes) as old. As a result, "soft" better reflects the dynamic nature of continual learning and is thus more reasonable.
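
A toy illustration of the two partitions at a given stage (class indices and per-stage counts are illustrative only):

```python
# Toy illustration (assumed C100-style setup: 50 labeled classes, 10 new per stage).
initial_classes = set(range(50))          # Stage-0 labeled classes
discovered = {1: set(range(50, 60)),      # classes first seen at stage 1
              2: set(range(60, 70))}      # classes first seen at stage 2

stage = 2
# "hard": only the initially labeled classes count as old.
hard_old = initial_classes
hard_new = discovered[1] | discovered[2]

# "soft": everything seen before the current stage counts as old.
soft_old = initial_classes | discovered[1]
soft_new = discovered[stage]
```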

As for the fluctuations in the "unseen" metric, we attribute the unstable results to the evaluation protocol of category discovery (clustering). Conventionally, GCD runs the Hungarian algorithm once over all classes to obtain the best All Acc. At different learning steps, the optimal correspondence produced by the Hungarian algorithm might differ, leading to fluctuations in "unseen". I think this is an inherent issue of GCD evaluation. You could run the experiments several times and report the average results.
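
For concreteness, the usual clustering-accuracy evaluation looks roughly like the sketch below (the common GCD protocol, not necessarily the exact code in this repo):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_acc(y_true, y_pred):
    """All Acc via a single Hungarian matching over all classes
    (sketch of the common GCD evaluation, not necessarily this repo's code)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    num = max(y_pred.max(), y_true.max()) + 1
    w = np.zeros((num, num), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1                                   # count cluster p vs. class t
    rows, cols = linear_sum_assignment(w.max() - w)    # maximize matched samples
    # The induced cluster -> class mapping can change between learning stages,
    # which is what makes per-subset accuracies (e.g., "unseen") fluctuate.
    return w[rows, cols].sum() / y_pred.size
```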

@mashijie1028
Owner

As for the detailed dataset dicts, there are currently some issues with the GPU server.

I will upload them later; please stay tuned.

@ascetic-monk
Author

> As for the detailed dataset dicts, there are currently some issues with the GPU server. I will upload them later; please stay tuned.

Thanks a lot. By the way, from my experiments over the past few days I have found another reason for the instability of the unseen results. As you noted, the reported results are selected according to the best "all" accuracy, but the unseen score contributes little to the "all" accuracy. Therefore, the "best" checkpoint often occurs early in training, when the new classes have not yet been sufficiently learned.
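
A toy calculation (the numbers are made up, just to illustrate the effect):

```python
# Toy numbers (illustrative only): suppose 80% of the evaluation samples
# belong to seen classes and 20% to unseen classes.
frac_seen, frac_unseen = 0.8, 0.2

epoch_a = dict(seen=0.90, unseen=0.30)   # an early epoch
epoch_b = dict(seen=0.82, unseen=0.60)   # a later epoch

all_a = frac_seen * epoch_a["seen"] + frac_unseen * epoch_a["unseen"]  # 0.780
all_b = frac_seen * epoch_b["seen"] + frac_unseen * epoch_b["unseen"]  # 0.776
# Selecting by the best "all" accuracy picks the early epoch A,
# even though its unseen accuracy is much lower.
```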
