
Paper's results cannot be reproduced #12

Open · LISI0037 opened this issue Dec 31, 2024 · 7 comments

@LISI0037

Thank you for your excellent work! However, when we tried to reproduce the results reported in your paper, we were unable to do so. Here are the details of our attempt and the problems we encountered:

1. We did not adjust any training parameters and used the exact configurations provided in the MiDiffusion/config/ YAML files, including epochs, learning rate, and other settings. Are there any additional tricks or adjustments required during training?

2. For the PointNet feature extractor, should it be a pretrained version, or is it intended to be trained from scratch?

3. For dataset preprocessing, we directly used the files from the ThreedFront dataset. Are there any specific preprocessing steps or modifications needed that are not mentioned in the paper?

4. Even when we used the pretrained weights you provided for evaluation, we were unable to replicate the results in the paper, particularly for the FID metric, where we observed a significant difference. Could you provide any suggestions?

The ATISS and DiffuScene numbers are taken from the MiDiffusion paper.
"Pretrained" refers to the weights you provide on GitHub.
"Trained by us" refers to the weights we trained ourselves.

[screenshot: results table comparing ATISS, DiffuScene, pretrained MiDiffusion, and our trained models]

Thank you for your help.

@SiyiHu (Collaborator) commented Jan 3, 2025

[Setup]
You should not modify any config or data files to reproduce the MiDiffusion results. All preprocessing steps are included in the ThreedFront repository, and we do not modify anything in ThreedFront/dataset_files. Please make sure you complete the last step, which samples the floor plan boundary, if you want to follow the default setup. Both the released model weights and the config files in /config use PointNet as the floor plan feature extractor.
The released model weights were trained with the attached config files, which are identical to those in /config. We released these files alongside the weights to make sure they can be loaded properly in case /config changes in the future.
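As a quick check that the attached config and the released weights load together, here is a minimal sketch (the paths and config keys are hypothetical placeholders, not the repository's actual layout):

```python
# Minimal sketch: confirm a released checkpoint and its attached config load
# cleanly before evaluating. Paths and config keys are hypothetical.
import torch
import yaml

with open("released_weights/bedrooms/config.yaml") as f:
    config = yaml.safe_load(f)

state_dict = torch.load("released_weights/bedrooms/model_50000", map_location="cpu")
print("config sections:", sorted(config))
print("checkpoint tensors:", len(state_dict))
```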

[Results]
The pretrained models should yield results very close to what we reported in the paper. They might not be identical due to random floor plan sampling; we also observed minor differences when evaluating the same models across different machines. However, the differences (due to sampling, library versions, etc.) are small enough that we reach the same conclusions when comparing against ATISS, DiffuScene, and the ablation studies.
For the pretrained weights, the only issue in your results seems to be FID. FID should be computed using the same library and the same sets of input images as KID. Since your KID numbers are quite close to ours, I suspect there is an issue with the number of images you use for evaluation: by design, FID is much more sensitive to the number of images than KID. You should compare 1000 synthetic images against 162/177/192 real images for the bedroom/diningroom/livingroom datasets respectively.
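As a sanity check, both metrics can be recomputed over the exact same two image folders with a single library. Here is a minimal sketch using torchmetrics (an assumption for illustration; not necessarily the library the ThreedFront evaluation script uses), with hypothetical folder names:

```python
# Minimal sketch: recompute FID and KID over the same image sets.
# torchmetrics is an assumption here; folder names are hypothetical.
from pathlib import Path

import torch
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.PILToTensor(),  # uint8 in [0, 255], as both metrics expect by default
])

def load_images(folder):
    """Load all PNGs in `folder` as a (N, 3, H, W) uint8 batch."""
    return torch.stack([to_tensor(Image.open(p).convert("RGB"))
                        for p in sorted(Path(folder).glob("*.png"))])

real = load_images("real_livingroom")   # e.g. 192 real images for living rooms
fake = load_images("synth_livingroom")  # e.g. 1000 synthetic images
print(f"{len(real)} real vs {len(fake)} synthetic images")

fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)  # subset_size must not exceed len(real)
for metric in (fid, kid):
    metric.update(real, real=True)
    metric.update(fake, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()
print(f"KID: {kid_mean.item():.5f} +/- {kid_std.item():.5f}")
```

Running both metrics from one script over identical folders rules out mismatched image sets as the source of the FID gap.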
For your trained models, you can try evaluating the last model (i.e., at 50k epochs for bedrooms and 100k epochs for dining/living rooms). These models will overfit, but we found that the weights stabilize in training. We have trained our models with different random seeds and the results are quite consistent.
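A hedged sketch of selecting that final checkpoint (the output directory layout and the model_<N> file naming are hypothetical):

```python
# Minimal sketch: pick the final checkpoint by its trailing step/epoch number
# rather than the "best" one. Directory layout and naming are hypothetical.
import re
from pathlib import Path

def last_checkpoint(run_dir):
    """Return the checkpoint file with the highest trailing number."""
    ckpts = [p for p in Path(run_dir).glob("model_*") if re.search(r"\d+$", p.name)]
    return max(ckpts, key=lambda p: int(re.search(r"\d+$", p.name).group()))

print(last_checkpoint("outputs/bedrooms"))  # e.g. model_50000 for bedrooms
```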

@LISI0037 (Author) commented Jan 5, 2025

Thank you for your reply. Yes, the only problem is FID. We found that the last model's evaluation is better than the best model's and very close to the pretrained model's, which is probably the overfitting you mentioned. I've checked the number of images used for evaluation, and it is correct. The script used to compute FID is the one from ThreedFront, which we did not change. Do you have any other advice on the FID problem? We still can't solve it.

@LISI0037 (Author) commented Jan 6, 2025

Sorry, regarding the last step which samples the floor plan boundary, is this the step?
[screenshot of the preprocessing step]

@SiyiHu (Collaborator) commented Jan 6, 2025

> Sorry, regarding the last step which samples the floor plan boundary, is this the step?

Yes.

@SiyiHu (Collaborator) commented Jan 6, 2025

I can't tell what might be wrong here. It is strange that the KID results are close but the FID results are not, given the same inputs. I can run the evaluation script on my side if you send me an example set of 1000 synthetic layout images.

@LISI0037 (Author) commented Jan 6, 2025

Thank you very much for your help and patience. I just ran it again and found that there is still a problem with the FID. The file is too large to upload to GitHub, so I sent the livingroom synthetic layout images to your email. I hope you can help me check what is wrong with my evaluation procedure.

@LISI0037 (Author) commented Jan 8, 2025

Also, we use headless rendering; I don't know whether this could cause problems.
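For reference, headless rendering usually means forcing an offscreen OpenGL backend such as EGL, which can behave differently from a windowed GL context. Here is a minimal sketch (pyrender is used purely as a stand-in; it is an assumption, not necessarily the renderer the evaluation pipeline uses):

```python
# Minimal sketch: EGL-based offscreen rendering with pyrender (an assumption;
# the actual pipeline may use a different renderer). The backend must be set
# before pyrender is imported.
import os
os.environ["PYOPENGL_PLATFORM"] = "egl"

import numpy as np
import trimesh
import pyrender

scene = pyrender.Scene(bg_color=[1.0, 1.0, 1.0])
scene.add(pyrender.Mesh.from_trimesh(trimesh.creation.box(extents=(1, 1, 1))))

cam_pose = np.eye(4)
cam_pose[2, 3] = 3.0  # pull the camera back along +z; it looks down -z
scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=cam_pose)
scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)

renderer = pyrender.OffscreenRenderer(viewport_width=256, viewport_height=256)
color, depth = renderer.render(scene)  # color: (256, 256, 3) uint8
renderer.delete()
```

Rendering the same scene headless and on a machine with a display, then diffing the images, would show whether the rendering path itself shifts the outputs enough to move FID.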
