Hi @ZENGXH,
Thanks a lot for your very interesting work, and for making it open source.
I’m trying to re-train the stage-1 all-class VAE and had a few questions.
NaNs during stage-1 all-class training
I’m running into NaNs after 29 epochs (the loss starts to diverge after 16 epochs) when training the stage-1 all-class VAE (log file here).
I used the command suggested in the config, with all default settings (default config file, batch size 32), on 4×A100 GPUs, using the ShapeNetCore15K data from PointFlow. I’m currently experimenting with some of the hyperparameter modifications suggested in #47 (NaN loss while training stage 1 VAE), but I’m a bit surprised to see NaNs even with the default setup. Do you have any other suggestions about this?
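For concreteness, the kind of modification I’m trying is standard numerical-stability tweaks such as a lower learning rate and gradient-norm clipping. A minimal PyTorch sketch, where the model, loss, and learning-rate value are placeholders rather than LION’s actual trainer:

```python
import torch

# Minimal sketch of generic NaN mitigations (placeholder model and loss, not LION's trainer):
# a lower learning rate plus gradient-norm clipping before each optimizer step.
model = torch.nn.Linear(3, 3)                              # stands in for the stage-1 VAE
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # hypothetical lower learning rate

def training_step(batch):
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()                      # stands in for the VAE loss
    loss.backward()
    # Clip the global gradient norm so a single bad batch cannot blow up the weights.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()

print(training_step(torch.randn(32, 3)))
```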
Motivation behind normalize_shape_global=False and normalize_shape_box=True
I’m curious about the design choice of using normalize_shape_global=False and normalize_shape_box=True.
My eventual goal is to obtain size-and-shape-aware embeddings, so I’m wondering what happens if I instead train with normalize_shape_global=True and normalize_shape_box=False. Does training still work in this case (perhaps with slightly worse reconstruction quality), or does it tend to break down entirely? Any intuition here would be very helpful.
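To make sure I’m asking the right question, here is how I currently read the two modes, as a standalone sketch of my own understanding rather than LION’s actual data-loading code; please correct me if this interpretation is off:

```python
import torch

def normalize_global(points: torch.Tensor) -> torch.Tensor:
    # How I read "global" normalization: center the whole shape and divide by a single
    # scalar scale, so overall object size is removed from the input.
    center = points.mean(dim=0, keepdim=True)
    scale = (points - center).norm(dim=1).max()
    return (points - center) / scale

def normalize_per_box(points: torch.Tensor) -> torch.Tensor:
    # How I read "box" normalization: rescale each axis by its bounding-box extent,
    # so every shape lands in a unit cube regardless of its original size and aspect ratio.
    mins, maxs = points.min(dim=0).values, points.max(dim=0).values
    center = (mins + maxs) / 2
    extent = (maxs - mins).clamp(min=1e-8)
    return (points - center) / extent

pts = torch.randn(2048, 3) * torch.tensor([2.0, 1.0, 0.5])  # toy point cloud
print(normalize_global(pts).abs().max(), normalize_per_box(pts).abs().max())
```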
Training time for the all-class model
I’m currently seeing ~20 epochs/hour on 4×A100s. With the default config specifying ~8000 epochs, that works out to roughly 400 hours, i.e. about 16-17 days of training.
In the supplementary material, I could only find training-time details for the single-class model, not the all-class model.
Could you share roughly how long the all-class stage-1 model took to train in your setup?
Viewing training / validation curves
Is there a built-in way to view training/validation plots (e.g., loss curves)?
At the moment, I only see checkpoints and log files being saved, but no plots. Is there a flag or logger option (e.g., TensorBoard/W&B) that I might be missing?
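If there isn’t one built in, I’d probably fall back to something like the generic TensorBoard logging below in my own fork (the loss values are placeholders, not hooks into LION’s training loop):

```python
from torch.utils.tensorboard import SummaryWriter

# Generic fallback logging I would add myself if no logger flag exists
# (placeholder scalars; not wired into LION's trainer).
writer = SummaryWriter(log_dir="runs/stage1_allclass")

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)   # placeholder scalar
    val_loss = 1.2 / (epoch + 1)     # placeholder scalar
    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("loss/val", val_loss, epoch)

writer.close()
# Then view with: tensorboard --logdir runs
```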
Follow-up works
I was also wondering whether there are any interesting follow-up works to LION, either by you or by other groups, that you’re aware of. In particular, I’m interested in approaches involving a multi-class VAE trained with a reconstruction objective that also incorporates size information. My goal is to obtain pretrained multi-class, size-and-shape-aware embeddings that I can then use for a downstream task.
Thanks again for releasing the code and for any guidance you can share.