Can you provide more details about the dataset dict #5
Hi! For generic datasets (e.g., C100, IN100, Tiny-ImageNet), the class order follows the natural order. For fine-grained datasets (e.g., CUB and Aircraft), we use Specifically, for C100 and IN100, the stage-0 labeled classes are 0-49. Then the model continually learns classes 50-99. For Tiny, the labeled classes are 0-99, while the unlabeled ones are 100-199. Importantly, we use Additionally, we first subsample 100 classes from IN-1k to obtain IN100, also with In short, you can reproduce the reported results for C100, IN100, and Tiny using the released code. For CUB, please refer to this issue. We have released the logs. Here, we use
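For concreteness, here is a minimal sketch of how the continual splits described above could be built under the natural class order. The helper name, the stage count, and the dict layout are my own illustration, not necessarily the format of the released dataset dicts:

```python
# Illustrative only: natural-order continual splits as described above.
# The number of incremental stages and the dict layout are assumptions,
# not the exact format of the repo's released dataset dicts.

def build_continual_splits(num_classes, num_labeled, num_stages):
    """Stage-0 labeled classes plus equally sized unlabeled stages."""
    labeled = list(range(num_labeled))                 # e.g., 0-49 for C100/IN100
    unlabeled = list(range(num_labeled, num_classes))  # e.g., 50-99, learned continually
    per_stage = len(unlabeled) // num_stages
    stages = [unlabeled[i * per_stage:(i + 1) * per_stage] for i in range(num_stages)]
    return {"labeled": labeled, "unlabeled_stages": stages}

# C100 / IN100: classes 0-49 labeled, 50-99 discovered over later stages.
print(build_continual_splits(num_classes=100, num_labeled=50, num_stages=5))
# Tiny-ImageNet: classes 0-99 labeled, 100-199 unlabeled.
print(build_continual_splits(num_classes=200, num_labeled=100, num_stages=5))
```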
By the way, what do "soft" and "hard" results mean?
Thank you for your reply. We are concerned that the random numbers may differ across different hardware, which is why we hope to achieve this alignment. We have also run multiple reproductions and found that the fluctuations in the "all" metric are relatively small, while the fluctuations in the "unseen" metric are larger. We will keep trying to adjust the random seed in PyTorch to reproduce the results with the current settings. In line 297 of `train_happy.py`, you list two metrics, "hard" and "soft". According to our understanding, these correspond to accuracies computed under two different partitions. We would like to know whether the main experimental results you report are based on the "soft" metric.
Hi! Thanks for the reminder! Yes, in our paper we report the "soft" metric. "Hard" only treats the initially labeled (stage-0) classes as old. "Soft" dynamically treats the classes first seen at each stage as new, and all previously seen classes (the initial classes plus previously discovered new classes) as old. As a result, "soft" reflects the dynamic nature of continual learning and is therefore more reasonable. As for the fluctuations in the "unseen" metric, we attribute the unstable results to the evaluation protocol of category discovery (clustering). Conventionally, GCD runs the Hungarian algorithm over all classes at once to obtain the best "all" accuracy. At different learning steps, the optimal correspondence produced by the Hungarian algorithm might differ, leading to fluctuations in "unseen". I think this is an inherent issue in GCD evaluation. You could run the experiments several times and report the average results.
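For reference, this is roughly what the Hungarian-matching evaluation looks like in GCD-style protocols (a generic sketch with SciPy, not the exact code in `train_happy.py`). Because the cluster-to-class assignment is re-solved over all classes at each stage, the per-group "unseen" accuracy can shift even when the overall "all" accuracy is stable:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_acc(y_true, y_pred):
    """Hungarian-matched clustering accuracy over all classes jointly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = int(max(y_true.max(), y_pred.max())) + 1
    # Contingency matrix: rows are predicted clusters, columns are true labels.
    w = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    # Maximize matched samples by minimizing (max - w) with the Hungarian algorithm.
    row_ind, col_ind = linear_sum_assignment(w.max() - w)
    # The resulting cluster-to-class mapping is what gets reused to split the
    # "seen"/"unseen" accuracies, so a different optimal matching at another
    # stage shifts the per-group numbers even if the overall accuracy is similar.
    return w[row_ind, col_ind].sum() / y_true.size
```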
As for the detailed dataset dicts, there are currently some issues with the GPU server. I will upload them later; please stay tuned.
Thanks a lot. By the way, from the experiments over the past few days I have found another reason for the instability of the unseen results. As you noted, the reported results are selected according to the best "all" accuracy, but the unseen score contributes little to "all" accuracy. Therefore, the "best" checkpoint often occurs early in training, before the new classes have been sufficiently learned.
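To make this concrete, here is a quick back-of-the-envelope illustration with made-up numbers of why selecting checkpoints by the best "all" accuracy can favor early epochs when old classes dominate the evaluation set:

```python
# Hypothetical sample counts and accuracies, only to illustrate the point above.
n_old, n_new = 9000, 1000  # old classes dominate the evaluation set

def all_acc(acc_old, acc_new):
    # "all" accuracy is a sample-weighted mix, so the unseen part carries little weight.
    return (acc_old * n_old + acc_new * n_new) / (n_old + n_new)

early = all_acc(acc_old=0.80, acc_new=0.30)  # early epoch: new classes undertrained
late = all_acc(acc_old=0.77, acc_new=0.55)   # later epoch: much better unseen accuracy
print(early, late)  # 0.75 vs 0.748 -> "best all" still picks the early checkpoint
```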
Thanks for your work. I found that the results are quite sensitive to the randomly selected order of the novel classes. Could you provide more details about the `online_dataset_dict.txt` of the 4 datasets for better alignment? Additionally, I also want to confirm that it is the "soft" result, rather than the "hard" one, that is shown in your paper.