Description
The current implementation in extract_feature.py extracts DINOv2 features from all views and averages them without checking whether each 3D point is actually visible in a given view. As a result, features sampled from views where a point is occluded or lies on a back-facing surface get mixed into that point's averaged feature. Won't this contamination have a significant impact on the VAE's performance? Or is there some other processing step that handles this which I haven't noticed? Looking forward to your response, thank you.
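
For reference, a visibility-aware average could look roughly like the sketch below. It is only an illustration under my own assumptions, not the code in extract_feature.py: the function names, the array shapes, the depth tolerance `tol`, and the choice to return a zero feature for points seen in no view are all hypothetical. It assumes per-view features have already been sampled at each point's projection, and that a rendered depth value is available at the same pixel.

```python
import numpy as np


def visibility_from_depth(point_depth, rendered_depth, tol=1e-2):
    """Hypothetical depth test: a point counts as visible in a view when it
    is not behind the rendered surface at its projected pixel.

    point_depth, rendered_depth: arrays of shape (V, N)
    returns: boolean array of shape (V, N)
    """
    return point_depth <= rendered_depth + tol


def masked_mean_features(features, visible):
    """Average per-view features over the views where each point is visible.

    features: array of shape (V, N, C), one feature per view and point
    visible:  boolean array of shape (V, N)
    returns:  array of shape (N, C); points visible in no view get zeros
              (an assumption; another fallback may be preferable)
    """
    weights = visible.astype(features.dtype)[..., None]  # (V, N, 1)
    counts = weights.sum(axis=0)                         # (N, 1)
    summed = (features * weights).sum(axis=0)            # (N, C)
    # Clamp the count to 1 to avoid division by zero for unseen points.
    return summed / np.maximum(counts, 1.0)
```

With a mask like this, occluded or back-facing views simply contribute zero weight instead of being averaged in.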