
Skip the data preparation

  • Chat-Scene has provided all the prepared data on HuggingFace. Simply download the files and place them in the annotations/ directory; you'll then be ready to run and test the code.

  • We've provided preprocessed VL-SAT features for semantic relations between objects, as well as additional text annotations, on Yandex Disk.

  • We've also provided VL-SAT features for fully connected graphs with semantic relations between objects on Yandex Disk (output_vlsat.zip).
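After downloading, a quick sanity check that everything actually landed in annotations/ can save a failed run later. A minimal sketch (the helper is ours, not part of the repo; pass whatever file names the HuggingFace listing shows):

```python
from pathlib import Path

def missing_annotations(annot_dir: str, required: list[str]) -> list[str]:
    """Return the names from `required` that are not present under annot_dir."""
    root = Path(annot_dir)
    return [name for name in required if not (root / name).exists()]
```

An empty result means the directory is complete and you are ready to run the code.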

Prepare data

  • Download the ScanNet dataset by following the ScanNet instructions.

  • Extract object masks using a pretrained 3D detector:

    • Use Mask3D for instance segmentation. We used the checkpoint pretrained on ScanNet200.
    • The complete predicted results (especially the masks) for the train/validation sets are too large to share (~40GB). We’ve shared the post-processed results:
      • Extract the mask3d_inst_seg.tar.gz archive (e.g., tar -xzf mask3d_inst_seg.tar.gz).
      • Each file under mask3d_inst_seg contains the predicted results for a single scene, including a list of segmented instances with their labels and segmented indices.
  • Process object masks and prepare annotations:

    • If you use Mask3D for instance segmentation, set the segment_result_dir in run_prepare.sh to the output directory of Mask3D.
    • If you use the downloaded mask3d_inst_seg directly, set segment_result_dir to None and set inst_seg_dir to the path of mask3d_inst_seg.
    • Run: bash preprocess/run_prepare.sh
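The two configuration modes above boil down to: point segment_result_dir at raw Mask3D output, or point inst_seg_dir at the downloaded post-processed masks. A small sketch of that decision logic (the variable names mirror run_prepare.sh, but the helper itself is hypothetical):

```python
def resolve_mask_source(segment_result_dir, inst_seg_dir):
    """Pick the instance-mask source the way run_prepare.sh expects.

    Exactly one of the two directories should be set; the other is None.
    """
    if segment_result_dir is not None:
        # Raw Mask3D predictions: run_prepare.sh will post-process them.
        return ("mask3d_raw", segment_result_dir)
    if inst_seg_dir is not None:
        # Already post-processed masks (the downloaded mask3d_inst_seg).
        return ("mask3d_inst_seg", inst_seg_dir)
    raise ValueError("set either segment_result_dir or inst_seg_dir")
```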
  • Extract 3D features using a pretrained 3D encoder:

    • Follow Uni3D to extract 3D features for each instance. We used the pretrained model uni3d-g.
    • We've also provided modified code for feature extraction in this forked repository. Set the data_dir here to the path to ${processed_data_dir}/pcd_all (processed_data_dir is an intermediate directory set in run_prepare.sh). After preparing the environment, run bash scripts/inference.sh.
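Point-cloud encoders such as Uni3D are typically fed per-instance clouds normalized to a canonical pose. A numpy sketch of the usual centering-and-scaling step (the exact preprocessing in the forked inference script may differ):

```python
import numpy as np

def normalize_instance(points: np.ndarray) -> np.ndarray:
    """Center an (N, 3) instance point cloud and scale it into the unit sphere."""
    centered = points - points.mean(axis=0, keepdims=True)
    radius = np.linalg.norm(centered, axis=1).max()
    if radius > 0:
        centered = centered / radius
    return centered
```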
  • Extract 2D features using a pretrained 2D encoder:

    • We followed OpenScene's code to calculate the mapping between 3D points and 2D image pixels. This allows each object to be projected onto multi-view images. Based on the projected masks on the images, we extract and merge DINOv2 features from multi-view images for each object.

    • [TODO] Detailed implementation will be released.
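Until the detailed implementation is released, the merging step can be pictured as a visibility-weighted average: each view contributes its DINOv2 feature for the object, weighted by how many of the object's projected pixels are visible in that view. A hedged numpy sketch (the released code may merge differently):

```python
import numpy as np

def merge_multiview_features(view_feats: np.ndarray, visible_pixels: np.ndarray) -> np.ndarray:
    """Average (V, D) per-view features, weighting each view by its visible pixel count."""
    weights = visible_pixels.astype(np.float64)
    total = weights.sum()
    if total == 0:
        # Object not visible in any view: fall back to a plain mean.
        return view_feats.mean(axis=0)
    return (view_feats * weights[:, None]).sum(axis=0) / total
```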

  • Obtain connections from the N nearest neighbors of each object and filter the fully connected graphs with VL-SAT features for Mask3D segmentation: run the prepare_filtered_mask3d_gnn_data.py script after updating the paths to the directories containing the per-scene fully connected graphs, the object attributes, and the ScanNet splits. The number of nearest neighbors can be adjusted via the KNN parameter at the beginning of the script.

  • Obtain connections from the N nearest neighbors of each object and filter the fully connected graphs with VL-SAT features for GT segmentation: run the prepare_gnn_data.py script after updating the same paths. The KNN parameter at the beginning of that script likewise controls the number of nearest neighbors.
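The filtering both scripts perform can be sketched as: compute pairwise distances between object centers, then keep only the edges of the fully connected graph whose target is among the source's KNN nearest neighbors. A minimal numpy version (the actual scripts also carry the VL-SAT edge features along; KNN mirrors the parameter at the top of those scripts):

```python
import numpy as np

KNN = 2  # number of nearest neighbors to keep, as in the prepare_*_gnn_data.py scripts

def knn_edges(centers: np.ndarray, k: int = KNN) -> list[tuple[int, int]]:
    """Return directed (source, target) edges to each object's k nearest neighbors."""
    n = len(centers)
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude self-loops
    edges = []
    for i in range(n):
        for j in np.argsort(dists[i])[: min(k, n - 1)]:
            edges.append((i, int(j)))
    return edges
```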