
FPN logit #3

Open
Zzang-yeah opened this issue Aug 23, 2024 · 8 comments
Comments

@Zzang-yeah

When and where are the FPN logits stored?
Whether I run online or offline, tools/demo.py just runs and generates an image, but no .npy file is created, so I can't proceed with training the student.

@Zzang-yeah
Author

I fixed the above problem.
In yolo_head.py, on line 316, you need to append -f and the exp file path to yolox_command.
But I have another problem: the processes don't seem to run in order with multi-GPU.
The .npy file has not been saved yet, and I keep getting a file-not-found error when loading it.
I think I need to modify the code to synchronize the processes.
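For what it's worth, a minimal sketch of one way to synchronize over a shared filesystem (the helper name, path, and timeout below are hypothetical, not from the repo; in a PyTorch DDP setup, torch.distributed.barrier() after the writing rank would be the more idiomatic fix):

```python
import os
import time

def wait_for_file(path, timeout=60.0, poll=0.5):
    """Block until `path` exists (written by another process) or timeout.

    Returns True if the file appeared, False on timeout. A coarse
    stand-in for a proper barrier on a shared filesystem.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)
    return False

# The writing process would np.save("fpn_logits.npy", logits);
# the reading processes would first do:
# if not wait_for_file("fpn_logits.npy"):
#     raise FileNotFoundError("teacher logits were never written")
```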

@martinaub
Collaborator

Hi @Zzang-yeah, thank you for your interest in our repo.
To save images and FPN logits, make sure the parameters in both the student and teacher files are correct.
For instance, to save the teacher's FPN logits, set self.KD to True in the teacher file. For online KD, additionally set both self.KD and self.KD_online to True in the student file. With online KD, the FPN logits and images are saved at every epoch and deleted once the KD loss for the previous epoch has been computed, so training does not require too much disk space.
As for multi-GPU training, I can't say, since I am only using a single GPU.
Let me know if it works with the proper parameters :)
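To summarize the flag settings described above as code (self.KD and self.KD_online are the names from the comment; the plain class shapes here are illustrative stand-ins, not the repo's actual exp files):

```python
class TeacherExp:
    """Teacher file: enable saving of the teacher's FPN logits."""
    def __init__(self):
        self.KD = True  # save FPN teacher logits

class StudentExpOnlineKD:
    """Student file for online KD: both flags must be True."""
    def __init__(self):
        self.KD = True
        self.KD_online = True  # logits saved every epoch, deleted after use

class StudentExpOfflineKD:
    """Student file for offline KD: logits come from a one-shot
    Teacher_Inference.py run instead of per-epoch saving."""
    def __init__(self):
        self.KD = True
        self.KD_online = False
```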

@martinaub martinaub reopened this Aug 28, 2024
@Zzang-yeah
Author

When training with multi-GPU, online learning didn't seem to work well because the order between processes got mixed up:
a process tried to load the .npy file before it had been created, causing a file-not-found error that stopped training.
I have now switched to offline training, and it is working fine.

One question: as I understand it, in online learning the teacher model runs on the augmented data and saves its logits at every iteration to KD-train the student, while in offline learning the student is KD-trained after running Teacher_Inference.py with the teacher model, so augmentation and logit saving happen only once.
Doesn't this create a difference between online and offline learning?
I ask because my guess is that online learning, with KD at every iteration, would perform better, but I don't remember this being mentioned in the paper.

@martinaub
Collaborator

Thank you for raising this concern; I thought it would have been obvious, but maybe not.
Because of computational power limitations, we introduced offline KD, which drastically reduces the time of KD training. Offline KD means the model does not rely on the online data augmentation provided by the original YOLOX model; instead, it relies only on the pre-defined dataset. To highlight the difference between the two training regimes (with and without online data augmentation), our paper reports the no-Aug models, i.e., the metrics obtained without data augmentation. Comparing the models (e.g., YOLOX-L with YOLOX-L-noAug), you will see a big difference in object detection metrics: the L model is far better than L-noAug. This result highlights the utility of online data augmentation during training and suggests that online KD would perform better than offline KD.
However, online data augmentation is applied randomly while training the model. Thus, when launching the KD method, we do not know in advance what the augmented dataset will look like, which means teacher inference must be launched at every iteration for each augmented batch.
In addition, you can still train the teacher itself with online data augmentation, so the teacher model can transfer better knowledge to the student model.
Thus, offline KD does not perform as well as online KD; however, as the result metrics in the paper show, the model is still improved.
If you have access to multiple GPUs, running the online KD should be faster for you.
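A schematic contrast of the two regimes as described above (a sketch with stand-in functions; augment, teacher_logits, and the data are all illustrative, not the repo's code):

```python
import random

def augment(sample, seed):
    # stand-in for YOLOX's random online data augmentation
    rng = random.Random(seed)
    return sample + rng.random()

def teacher_logits(x):
    # stand-in for teacher inference producing FPN logits
    return x * 2.0

def online_kd_targets(dataset, epochs):
    """Online KD: augmentation is random per iteration, so the teacher
    must be re-run on each freshly augmented sample."""
    targets = []
    for epoch in range(epochs):
        for i, sample in enumerate(dataset):
            x = augment(sample, seed=epoch * len(dataset) + i)
            targets.append(teacher_logits(x))  # teacher runs every iteration
    return targets

def offline_kd_targets(dataset):
    """Offline KD: no online augmentation; the teacher runs once over the
    fixed dataset (as Teacher_Inference.py does) and the logits are reused."""
    return [teacher_logits(s) for s in dataset]
```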

@Zzang-yeah
Author

When comparing nano models trained with KD to nano models trained without it, we found the performance was not significantly better. Does the claim that the model with KD performed better mean that the FP was improved? In my experiments, the AP was higher for the nano model without KD than for the model with KD.

@zxccsssd

zxccsssd commented Oct 9, 2024

@Zzang-yeah

I fixed the above problem In yolo_head.py, on line 316, you need to add -f to yolox_command and expfile to yolox_command But I have another problem, the processes don't seem to run in order when multi-gpu. The npy file is not saved and I keep getting file not found after loading it. I think I need to modify the code to synchronize the processes.

I encountered the same issue when using a single GPU. I tried the method you provided, but I still get an error saying the npy file was not found. May I take a look at your modified code?

@xiaohongzaizhe

@Zzang-yeah

I fixed the above problem In yolo_head.py, on line 316, you need to add -f to yolox_command and expfile to yolox_command But I have another problem, the processes don't seem to run in order when multi-gpu. The npy file is not saved and I keep getting file not found after loading it. I think I need to modify the code to synchronize the processes.

I encountered the same issue when using a single GPU. I tried the method you provided, but I still get an error saying the npy file was not found. May I take a look at your modified code?

I've solved it.

@zxccsssd

@Zzang-yeah

I fixed the above problem In yolo_head.py, on line 316, you need to add -f to yolox_command and expfile to yolox_command But I have another problem, the processes don't seem to run in order when multi-gpu. The npy file is not saved and I keep getting file not found after loading it. I think I need to modify the code to synchronize the processes.

I encountered the same issue when using a single GPU. I tried the method you provided, but I still get an error saying the npy file was not found. May I take a look at your modified code?

I've solved it.

How did you solve it?

4 participants