Fail to reproduce the results #12
Comments
Thank you for your interest in our work! You are using a different number of GPUs, which results in a different batch size, so you need to adjust the learning rate accordingly.
Thanks for your timely reply, and sorry for not mentioning my training hardware. I noticed that you train the model on 8 x 3090 (in the readme), which takes about 17 hours, while I train on 4 devices. However, if I just change the Specifically, in DsHmp/configs/dshmp_swin_tiny.yaml Lines 67 to 81 in d0c3a39
So I'm not sure what "adjust the learning rate" means here, since each device only runs 1 sample, so the batch size on each device is always 1. Or do you mean that the lr is calculated from the "global batch" rather than the per-device "local batch"? Could you give me more instructions on how to fix this? Thanks again~
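For reference, a common heuristic for this situation is the linear scaling rule: the learning rate is scaled in proportion to the global batch size (number of devices times per-device batch size), and the number of iterations is scaled inversely so the model sees the same number of samples. The sketch below is only an illustration of that heuristic, not the maintainers' confirmed recipe. The function names and the example values (`1e-4`, `1000`) are hypothetical and are not taken from the DsHmp configs.

```python
def global_batch_size(num_gpus: int, per_gpu_batch: int = 1) -> int:
    """Global batch = number of devices x samples per device."""
    return num_gpus * per_gpu_batch


def scale_learning_rate(base_lr: float, base_gpus: int, new_gpus: int,
                        per_gpu_batch: int = 1) -> float:
    """Linear scaling rule: lr scales with the ratio of global batch sizes."""
    base_global = global_batch_size(base_gpus, per_gpu_batch)
    new_global = global_batch_size(new_gpus, per_gpu_batch)
    return base_lr * new_global / base_global


def scale_max_iter(base_iters: int, base_gpus: int, new_gpus: int,
                   per_gpu_batch: int = 1) -> int:
    """Scale iterations inversely so the total number of samples seen is unchanged."""
    base_global = global_batch_size(base_gpus, per_gpu_batch)
    new_global = global_batch_size(new_gpus, per_gpu_batch)
    return round(base_iters * base_global / new_global)


if __name__ == "__main__":
    # Hypothetical reference setup: 8 GPUs, 1 sample each, lr = 1e-4, 1000 iterations.
    # Moving to 4 GPUs halves the global batch, so the lr is halved
    # and the iteration count is doubled.
    print(scale_learning_rate(1e-4, base_gpus=8, new_gpus=4))
    print(scale_max_iter(1000, base_gpus=8, new_gpus=4))
```

Under this rule, a per-device batch of 1 still gives a different global batch on 4 devices than on 8, which is why the learning rate may need adjusting even though the local batch is unchanged. Any milestone steps in the schedule would presumably need the same scaling as the iteration count.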
Additionally, I just noticed the In short, if I get it right, it's better to set Some questions about "use less than 8 devices for training" in this issue may no longer be a problem However, I just add If there is anything wrong with the above, please let me know, or share any other advice on the process.
The results turn out to be:
offline:
online:
< 0.46 (reported in paper)
I tried to reproduce the results with the following command, modified from the readme:
Then I ran inference with the command:
I checked the offline score on `valid_u`:
And the online score on `valid`, which turns out here:
The official ckpt gets this online score, which is higher than that of my ckpt trained with the cmd in the readme:

I am wondering what's wrong with my training process. I just followed the instructions in the readme, but failed to reproduce the results.