@ZENGXH
While working with the VAE model, we observed that the z_local latent appears to encode the full point cloud geometry directly. Because the decoder receives the original point coordinates (x) via a skip connection, the network can simply copy the input through to the output, bypassing the latent. As a result, the latent space does not generalize: decoding a random z_local yields no meaningful shape.

To confirm this, we removed the skip connection (+x) in the decoder and reconstruction collapsed unless z_local itself retained the original coordinates. This shortcut makes it difficult to evaluate or use z_local as a meaningful latent representation.

Was this handled somehow during the original training, or is this design intentional? Any insights would be appreciated.
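To make the failure mode concrete, here is a minimal numpy sketch (hypothetical function names, not the actual model code) of why a `+ x` skip connection lets the decoder ignore z_local entirely. If the learned branch converges to outputting zeros, reconstruction is perfect for any latent, so the reconstruction loss no longer forces z_local to carry shape information:

```python
import numpy as np

def decoder_with_skip(z_local, x, f):
    # Hypothetical decoder: output is a learned offset added to the
    # original input coordinates x (the "+x" skip connection).
    return f(z_local, x) + x

# Degenerate solution the network can converge to: ignore the latent
# and emit zeros, so the skip connection alone reproduces the input.
f_degenerate = lambda z, x: np.zeros_like(x)

x = np.random.randn(1024, 3)        # input point cloud (N points, xyz)
z_enc = np.random.randn(1024, 16)   # latent produced by the encoder
z_rand = np.random.randn(1024, 16)  # random latent, no shape information

rec_enc = decoder_with_skip(z_enc, x, f_degenerate)
rec_rand = decoder_with_skip(z_rand, x, f_degenerate)

# Reconstruction is exact for BOTH latents, so z_local is uninformative:
# the reconstruction loss cannot distinguish a good latent from noise.
print(np.abs(rec_enc - x).max(), np.abs(rec_rand - x).max())
```

This also matches the observed collapse when the skip is removed: without `+ x`, the zero-offset shortcut is gone, and the decoder can only reconstruct if z_local actually encodes the coordinates.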