
Paper's results cannot be reproduced #12

Open · LISI0037 opened this issue Dec 31, 2024 · 7 comments

@LISI0037

Thank you for your excellent work! However, when we tried to reproduce the results reported in your paper, we were unable to do so. Here are the details of our attempt and the problems we encountered:

1. We did not adjust any training parameters and used the exact configurations provided in the MiDiffusion/config/ YAML files, including epochs, learning rate, and other settings. Are there any additional tricks or adjustments required during training?

2. For the PointNet feature extractor, should it be a pretrained version, or is it intended to be trained from scratch?

3. For dataset preprocessing, we directly used the files from the ThreedFront dataset. Are there any specific preprocessing steps or modifications needed that are not mentioned in the paper?

4. Even when we used the pretrained weights you provided for evaluation, we were unable to replicate the results in the paper, particularly for the FID metric, where we observed a significant difference. Could you provide any suggestions?

The ATISS and DiffuScene numbers are taken from the MiDiffusion paper.
"Pretrained" refers to the weights you provide on GitHub.
"Trained by us" refers to the weights we trained ourselves.

[screenshot: results table comparing ATISS, DiffuScene, pretrained MiDiffusion, and our trained models]

Thank you for your help.

@SiyiHu (Collaborator) commented Jan 3, 2025

[Setup]
You should not modify any config or data files to reproduce the MiDiffusion results. All preprocessing steps are included in the ThreedFront repository, and we do not modify anything in ThreedFront/dataset_files. Please make sure you complete the last step, which samples the floor plan boundary, if you want to follow the default setup. Both the released model weights and the config files in /config use PointNet as the floor plan feature extractor.
The released model weights were trained with the attached config files, which are identical to those in /config. We released these files alongside the weights to make sure they can be loaded properly in case /config changes in the future.
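As a quick check that the attached config and the released weights load together, here is a minimal sketch (the paths and config keys are hypothetical placeholders, not the repository's actual layout):

```python
# Minimal sketch: confirm a released checkpoint and its attached config load
# cleanly before evaluating. Paths and config keys are hypothetical.
import torch
import yaml

with open("released_weights/bedrooms/config.yaml") as f:
    config = yaml.safe_load(f)

state_dict = torch.load("released_weights/bedrooms/model_50000", map_location="cpu")
print("config sections:", sorted(config))
print("checkpoint tensors:", len(state_dict))
```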

[Results]
The pretrained models should yield results very close to what we reported in the paper. They might not be identical due to random floor plan sampling; we also observed minor differences when evaluating the same models across different machines. However, the differences (due to sampling, library versions, etc.) are small enough that we reach the same conclusions when comparing against ATISS, DiffuScene, and the ablation studies.
For the pretrained weights, the only issue in your results seems to be FID. FID should be computed using the same library and the same sets of input images as KID. Since your KID numbers are quite close to ours, I suspect there is an issue with the number of images you use for evaluation: by design, FID is much more sensitive to the number of images than KID. You should compare 1000 synthetic images against 162/177/192 real images for the bedroom/diningroom/livingroom datasets respectively.
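As a sanity check, both metrics can be recomputed over the exact same two image folders with a single library. Here is a minimal sketch using torchmetrics (an assumption for illustration; not necessarily the library the ThreedFront evaluation script uses), with hypothetical folder names:

```python
# Minimal sketch: recompute FID and KID over the same image sets.
# torchmetrics is an assumption here; folder names are hypothetical.
from pathlib import Path

import torch
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.PILToTensor(),  # uint8 in [0, 255], as both metrics expect by default
])

def load_images(folder):
    """Load all PNGs in `folder` as a (N, 3, H, W) uint8 batch."""
    return torch.stack([to_tensor(Image.open(p).convert("RGB"))
                        for p in sorted(Path(folder).glob("*.png"))])

real = load_images("real_livingroom")   # e.g. 192 real images for living rooms
fake = load_images("synth_livingroom")  # e.g. 1000 synthetic images
print(f"{len(real)} real vs {len(fake)} synthetic images")

fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)  # subset_size must not exceed len(real)
for metric in (fid, kid):
    metric.update(real, real=True)
    metric.update(fake, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()
print(f"KID: {kid_mean.item():.5f} +/- {kid_std.item():.5f}")
```

Running both metrics from one script over identical folders rules out mismatched image sets as the source of the FID gap.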
For your trained models, you can try evaluating the last model (i.e., at 50k epochs for bedrooms and 100k epochs for dining/living rooms). These models will overfit, but we found that the weights stabilize in training. We have trained our models with different random seeds and the results are quite consistent.
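A hedged sketch of selecting that final checkpoint (the output directory layout and the model_<N> file naming are hypothetical):

```python
# Minimal sketch: pick the final checkpoint by its trailing step/epoch number
# rather than the "best" one. Directory layout and naming are hypothetical.
import re
from pathlib import Path

def last_checkpoint(run_dir):
    """Return the checkpoint file with the highest trailing number."""
    ckpts = [p for p in Path(run_dir).glob("model_*") if re.search(r"\d+$", p.name)]
    return max(ckpts, key=lambda p: int(re.search(r"\d+$", p.name).group()))

print(last_checkpoint("outputs/bedrooms"))  # e.g. model_50000 for bedrooms
```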

@LISI0037 (Author) commented Jan 5, 2025

Thank you for your reply. Yes, the only problem is FID. We found that the last model's evaluation is better than the best model's and very close to the pretrained model's, which is probably the overfitting you mentioned. I've checked the number of images used for evaluation, and it is correct. The script used to compute FID is the one from ThreedFront, which we did not change. Do you have any other advice on the FID problem? We still can't solve it.

@LISI0037 (Author) commented Jan 6, 2025

Sorry, regarding the last step which samples the floor plan boundary, is this the step?
[screenshot of the preprocessing step]

@SiyiHu (Collaborator) commented Jan 6, 2025

> Sorry, regarding the last step which samples the floor plan boundary, is this the step?

Yes.

@SiyiHu (Collaborator) commented Jan 6, 2025

I can't tell what might be wrong here. It is strange that the KID results are close but the FID results are not, given the same inputs. I can run the evaluation script on my side if you send me an example set of 1000 synthetic layout images.

@LISI0037 (Author) commented Jan 6, 2025

Thank you very much for your help and patience. I just ran it again and found that there is still a problem with the FID. The file is too large to upload to GitHub, so I sent the livingroom synthetic layout images to your email. I hope you can help me check what is wrong with my evaluation procedure.

@LISI0037 (Author) commented Jan 8, 2025

Also, we use headless rendering; I don't know whether this could cause problems.
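For reference, headless rendering usually means forcing an offscreen OpenGL backend such as EGL, which can behave differently from a windowed GL context. Here is a minimal sketch (pyrender is used purely as a stand-in; it is an assumption, not necessarily the renderer the evaluation pipeline uses):

```python
# Minimal sketch: EGL-based offscreen rendering with pyrender (an assumption;
# the actual pipeline may use a different renderer). The backend must be set
# before pyrender is imported.
import os
os.environ["PYOPENGL_PLATFORM"] = "egl"

import numpy as np
import trimesh
import pyrender

scene = pyrender.Scene(bg_color=[1.0, 1.0, 1.0])
scene.add(pyrender.Mesh.from_trimesh(trimesh.creation.box(extents=(1, 1, 1))))

cam_pose = np.eye(4)
cam_pose[2, 3] = 3.0  # pull the camera back along +z; it looks down -z
scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=cam_pose)
scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)

renderer = pyrender.OffscreenRenderer(viewport_width=256, viewport_height=256)
color, depth = renderer.render(scene)  # color: (256, 256, 3) uint8
renderer.delete()
```

Rendering the same scene headless and on a machine with a display, then diffing the images, would show whether the rendering path itself shifts the outputs enough to move FID.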
