We also formulate our data collection algorithm here.

![data_collection_algo](https://github.com/pairlab/actaim2-eccv24/assets/30140814/3032710a-ac79-400e-99c9-8d23ea881806)

## Unsupervised Mode Selector Learning

In this part, we show how we train the mode selector and infer from it to extract discrete task embeddings for action predictor training. Our mode selector is a VAE-style generative model that replaces the simple Gaussian prior with a mixture of Gaussians.
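
A minimal sketch of such a mixture-of-Gaussians VAE is shown below; the layer sizes, module names, and unit-variance prior components are illustrative assumptions, not the repository's actual implementation.

```python
# Sketch of a VAE-style mode selector with a mixture-of-Gaussians prior.
# All dimensions and names are placeholders chosen for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMVAEModeSelector(nn.Module):
    def __init__(self, obs_dim=256, latent_dim=16, num_modes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.cluster_logits = nn.Linear(128, num_modes)   # q(c | x): soft interaction-mode assignment
        self.mu = nn.Linear(128, latent_dim)              # posterior mean of the task embedding
        self.logvar = nn.Linear(128, latent_dim)          # posterior log-variance
        # One learnable prior mean per interaction mode (unit variance assumed for brevity).
        self.prior_mu = nn.Parameter(torch.randn(num_modes, latent_dim))

    def forward(self, x):
        h = self.encoder(x)
        c_probs = F.softmax(self.cluster_logits(h), dim=-1)          # which mixture component
        mu, logvar = self.mu(h), self.logvar(h)
        eps = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterized task embedding
        return eps, c_probs, mu, logvar
```

Training would follow the usual VAE recipe: a reconstruction term plus KL terms that pull the posterior toward the mixture prior and the cluster assignment toward a categorical prior over modes.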

In the inference phase, the agent discretely samples a cluster from the trained mode selector.
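
Continuing the sketch above, inference could look like the following; picking a component uniformly at random and assuming unit-variance prior components are simplifications for illustration.

```python
# Sketch of discrete inference sampling from the mode selector sketched above.
import torch

@torch.no_grad()
def sample_task_embedding(model, num_modes=10):
    c = torch.randint(num_modes, (1,))        # discretely pick an interaction-mode cluster
    mu_c = model.prior_mu[c]                  # mean of the chosen mixture component
    eps = mu_c + torch.randn_like(mu_c)       # sample the task embedding epsilon
    return c, eps
```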

This disentanglement visualization illustrates the efficacy of the Conditional Gaussian Mixture Variational Autoencoder (CGMVAE) in disentangling interaction modes for the "single drawer" object (ID: 20411), using a t-SNE plot. Task embeddings $\epsilon_j$, defined by the variance between initial and final object states, are shown in distinct colors to denote different interaction modes and clusters. The sequence of figures demonstrates the CGMVAE's precision in clustering and aligning data points with their respective interaction modes: (1) generated clusters from the CGMVAE mode selector reveal distinct groupings; (2) ground-truth task embeddings confirm the model's capacity for accurate interaction-mode classification; (3) a combined visualization underscores the alignment between generated clusters and ground truth, showcasing the model's ability to consistently categorize tasks within identical interaction modes.
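
A plot of this kind can be produced with a few lines of t-SNE; the array and file names below are placeholders rather than the repository's actual data format.

```python
# Sketch of the disentanglement visualization: embed task embeddings epsilon_j
# in 2D with t-SNE and color them by interaction-mode cluster.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.load("task_embeddings.npy")   # (N, latent_dim) task embeddings, placeholder file
clusters = np.load("cluster_ids.npy")         # (N,) generated or ground-truth mode labels, placeholder file

xy = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)
plt.scatter(xy[:, 0], xy[:, 1], c=clusters, cmap="tab10", s=8)
plt.title("CGMVAE task-embedding clusters (t-SNE)")
plt.show()
```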

## Supervised Action Predictor Learning

[fig3.pdf](https://github.com/pairlab/actaim2-eccv24/files/14842055/fig3.pdf)

The interaction mode $\epsilon$ is sampled from the latent space of the mode selector. Multiview RGBD observations are back-projected and fused into a colored point cloud. Novel views are rendered by projecting the point cloud onto orthogonal image planes. The rendered image tokens and the interaction-mode token are concatenated and fed through the multiview transformer. The output consists of a global feature for rotation $\mathbf{R}$ and gripper state $\mathbf{q}$ estimation and a 2D per-view heatmap for position $\mathbf{p}$ prediction.
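
The token flow can be summarized with a small sketch; the patch size, number of views, transformer depth, and output-head layouts are assumptions made for illustration, and point-cloud fusion and view rendering are omitted.

```python
# Sketch of the multiview-transformer action predictor described above.
import torch
import torch.nn as nn

class MultiviewActionPredictor(nn.Module):
    def __init__(self, dim=256, latent_dim=16, patch_dim=3 * 14 * 14):
        super().__init__()
        self.mode_proj = nn.Linear(latent_dim, dim)      # interaction-mode token
        self.patch_proj = nn.Linear(patch_dim, dim)      # flattened rendered-image patch -> token
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.rot_head = nn.Linear(dim, 4)                # rotation R as a quaternion
        self.grip_head = nn.Linear(dim, 1)               # gripper state q
        self.heatmap_head = nn.Linear(dim, 1)            # per-patch logit -> per-view 2D heatmap

    def forward(self, patches, eps):
        # patches: (B, V, P, patch_dim) patches from the V rendered orthogonal views
        # eps:     (B, latent_dim) interaction mode sampled from the mode selector
        B, V, P, _ = patches.shape
        img_tokens = self.patch_proj(patches).flatten(1, 2)      # (B, V*P, dim)
        mode_token = self.mode_proj(eps).unsqueeze(1)            # (B, 1, dim)
        tokens = torch.cat([mode_token, img_tokens], dim=1)      # concatenate mode + image tokens
        out = self.transformer(tokens)
        global_feat = out[:, 0]                                  # global feature read from the mode token
        R, q = self.rot_head(global_feat), self.grip_head(global_feat)
        heatmaps = self.heatmap_head(out[:, 1:]).view(B, V, P)   # per-view heatmaps for position p
        return R, q, heatmaps
```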



