48 changes: 48 additions & 0 deletions roboverse_learn/il/README.md
@@ -0,0 +1,48 @@
# RoboVerse Imitation Learning (IL) Policies

## Example Usage

Pick a policy folder and follow its README for setup and usage.

Example:

```bash
# From the repo root
cd roboverse_learn/il/dp # or fm/, vita/ depending on the policy
pip install -r requirements.txt
cd ../../..

# Run policy training and evaluation (example: diffusion policy, DiT backbone)
bash roboverse_learn/il/il_run.sh --task_name_set close_box --algo_choose ddpm_dit
```

We keep each policy as self-contained as possible (code, dependencies, docs) and only share the minimum common abstractions.

## Troubleshooting

```bash
# Fix potential package version issues
bash roboverse_learn/il/il_setup.sh
```

## Supported Algorithms

| Name | Policy | Backbone | Model Config | Ref |
| --- | --- | --- | --- | --- |
| `ddpm_dit` | Diffusion Policy (DDPM) | DiT | `model_config/ddpm_dit_model.yaml` | [1], [5] |
| `fm_dit` | Flow Matching | DiT | `model_config/fm_dit_model.yaml` | [6], [5] |
| `vita` | VITA Policy | MLP | `model_config/vita_model.yaml` | [7] |
| `ddpm_unet` | Diffusion Policy (DDPM) | UNet | `model_config/ddpm_model.yaml` | [1], [4] |
| `ddim_unet` | Diffusion Policy (DDIM) | UNet | `model_config/ddim_model.yaml` | [2], [4] |
| `fm_unet` | Flow Matching | UNet | `model_config/fm_unet_model.yaml` | [6] |
| `score_unet` | Score-Based Model | UNet | `model_config/score_model.yaml` | [3], [4] |
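
The table can also be read as a lookup from algorithm name to model config. The snippet below is a minimal sketch of that lookup, with the entries copied from the table; the names `ALGO_MODEL_CONFIGS` and `model_config_name` are hypothetical and are not part of the repository, and how `il_run.sh` actually resolves `--algo_choose` is not shown in this README.

```python
from pathlib import PurePosixPath

# Name -> model config path, copied from the "Supported Algorithms" table.
# The dict and the helper below are illustrative only (hypothetical names).
ALGO_MODEL_CONFIGS = {
    "ddpm_dit": "model_config/ddpm_dit_model.yaml",
    "fm_dit": "model_config/fm_dit_model.yaml",
    "vita": "model_config/vita_model.yaml",
    "ddpm_unet": "model_config/ddpm_model.yaml",
    "ddim_unet": "model_config/ddim_model.yaml",
    "fm_unet": "model_config/fm_unet_model.yaml",
    "score_unet": "model_config/score_model.yaml",
}


def model_config_name(algo: str) -> str:
    """Return the config file stem (e.g. 'ddpm_dit_model') for an algorithm name."""
    try:
        path = ALGO_MODEL_CONFIGS[algo]
    except KeyError:
        known = ", ".join(sorted(ALGO_MODEL_CONFIGS))
        raise ValueError(f"unknown algorithm {algo!r}; expected one of: {known}") from None
    return PurePosixPath(path).stem
```

Note that the algorithm name and the config stem do not always match: per the table, `ddpm_unet` uses `ddpm_model.yaml`, not `ddpm_unet_model.yaml`.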

### References

1. Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising Diffusion Probabilistic Models." (2020).
2. Song, Jiaming, Chenlin Meng, and Stefano Ermon. "Denoising Diffusion Implicit Models." (2021).
3. Song, Yang, et al. "Score-Based Generative Modeling through Stochastic Differential Equations." (2021).
4. Chi, Cheng, et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." (2023).
5. Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." (2023).
6. Lipman, Yaron, et al. "Flow Matching for Generative Modeling." (2023).
7. Gao, Dechen, et al. "VITA: Vision-to-Action Flow Matching Policy." (2025).
2 changes: 1 addition & 1 deletion roboverse_learn/il/act/utils.py
@@ -6,7 +6,7 @@
import h5py
import json
from torch.utils.data import TensorDataset, DataLoader
from roboverse_learn.il.utils.common.replay_buffer import ReplayBuffer
from roboverse_learn.il.utils.replay_buffer import ReplayBuffer

import IPython
e = IPython.embed
Empty file.
@@ -2,7 +2,8 @@

import torch
import torch.nn
from diffusion_policy.model.common.normalizer import LinearNormalizer

from roboverse_learn.il.utils.normalizer import LinearNormalizer


class BaseLowdimDataset(torch.utils.data.Dataset):
@@ -1,4 +1,4 @@
from dp.runner.base_policy import BasePolicyCfg
from roboverse_learn.il.runner.base_policy import BasePolicyCfg

try:
from curobo.types.math import Pose
@@ -12,7 +12,7 @@
import torch
from loguru import logger as log
from metasim.scenario.scenario import ScenarioCfg
from roboverse_learn.il.utils.common.pytorch_util import dict_apply
from roboverse_learn.il.utils.pytorch_util import dict_apply


class BaseEvalRunner:
@@ -1,8 +1,8 @@
from typing import Dict

import torch
from diffusion_policy.model.common.module_attr_mixin import ModuleAttrMixin
from diffusion_policy.model.common.normalizer import LinearNormalizer
from roboverse_learn.il.utils.module_attr_mixin import ModuleAttrMixin
from roboverse_learn.il.utils.normalizer import LinearNormalizer


class BaseImagePolicy(ModuleAttrMixin):
Empty file.
@@ -1,4 +1,4 @@
_target_: dp.datasets.robot_image_dataset.RobotImageDataset
_target_: roboverse_learn.il.datasets.robot_image_dataset.RobotImageDataset
zarr_path: data_policy/useless.zarr
horizon: ${horizon}
pad_before: ${eval:'${n_obs_steps}-1'}
@@ -1,13 +1,13 @@
defaults:
- _self_
- dataset_config: robot_image_dataset
- model_config: ${oc.env:algo_model,ddpm_model} # diffusion_policy_model/fm_model/DDIM_model
- model_config: ${oc.env:algo_model,ddpm_dit_model}
- eval_config: diffusion_policy_eval
- train_config: diffusion_policy_train

task_name: placeholder
name: robot_${task_name}
_target_: dp.runner.dp_runner.DPRunner
_target_: roboverse_learn.il.runner.dp_runner.DPRunner


image_shape: &image_shape [3, 256, 256]
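
The `model_config` default in this hunk uses OmegaConf's `oc.env` resolver: `${oc.env:algo_model,ddpm_dit_model}` reads the `algo_model` environment variable and falls back to `ddpm_dit_model` when it is unset. The snippet below is a standard-library sketch of that fallback only; `resolve_model_config` is a hypothetical helper for illustration, not code from this PR, and it does not go through OmegaConf or Hydra.

```python
import os
from typing import Mapping, Optional


def resolve_model_config(
    env: Optional[Mapping[str, str]] = None,
    default: str = "ddpm_dit_model",
) -> str:
    """Mimic `${oc.env:algo_model,ddpm_dit_model}`: env var if set, else the default."""
    # Hypothetical helper: takes an explicit mapping so it can be exercised
    # without touching the real process environment.
    env = os.environ if env is None else env
    return env.get("algo_model", default)
```

With an empty environment this yields `ddpm_dit_model`, the new default introduced by this change; setting `algo_model=fm_dit_model` before launching would select the flow-matching DiT config instead.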
@@ -1,5 +1,5 @@
eval_args:
_target_: roboverse_learn.il.utils.common.eval_args.Args
_target_: roboverse_learn.il.utils.eval_args.Args
# random:
# _target_: metasim.cfg.randomization.RandomizationCfg
# level: 0
@@ -1,4 +1,4 @@
_target_: dp.models.ddim_unet_image_policy.DiffusionUnetImagePolicy
_target_: roboverse_learn.il.dp.policies.ddim_unet_image_policy.DiffusionUnetImagePolicy

shape_meta: ${shape_meta}

@@ -13,10 +13,10 @@ noise_scheduler:
prediction_type: epsilon # or sample

obs_encoder:
_target_: diffusion_policy.model.vision.multi_image_obs_encoder.MultiImageObsEncoder
_target_: roboverse_learn.il.dp.models.vision.multi_image_obs_encoder.MultiImageObsEncoder
shape_meta: ${shape_meta}
rgb_model:
_target_: diffusion_policy.model.vision.model_getter.get_resnet
_target_: roboverse_learn.il.dp.models.vision.model_getter.get_resnet
name: resnet18
weights: null
resize_shape: null
@@ -1,4 +1,4 @@
_target_: dp.models.ddpm_dit_image_policy.DiffusionDiTImagePolicy
_target_: roboverse_learn.il.dp.policies.ddpm_dit_image_policy.DiffusionDiTImagePolicy

shape_meta: ${shape_meta}

@@ -13,10 +13,10 @@ noise_scheduler:
prediction_type: epsilon # or sample

obs_encoder:
_target_: diffusion_policy.model.vision.multi_image_obs_encoder.MultiImageObsEncoder
_target_: roboverse_learn.il.dp.models.vision.multi_image_obs_encoder.MultiImageObsEncoder
shape_meta: ${shape_meta}
rgb_model:
_target_: diffusion_policy.model.vision.model_getter.get_resnet
_target_: roboverse_learn.il.dp.models.vision.model_getter.get_resnet
name: resnet18
weights: null
resize_shape: null
@@ -1,4 +1,4 @@
_target_: dp.models.ddpm_unet_image_policy.DiffusionUnetImagePolicy
_target_: roboverse_learn.il.dp.policies.ddpm_unet_image_policy.DiffusionUnetImagePolicy

shape_meta: ${shape_meta}

@@ -13,10 +13,10 @@ noise_scheduler:
prediction_type: epsilon # or sample

obs_encoder:
_target_: diffusion_policy.model.vision.multi_image_obs_encoder.MultiImageObsEncoder
_target_: roboverse_learn.il.dp.models.vision.multi_image_obs_encoder.MultiImageObsEncoder
shape_meta: ${shape_meta}
rgb_model:
_target_: diffusion_policy.model.vision.model_getter.get_resnet
_target_: roboverse_learn.il.dp.models.vision.model_getter.get_resnet
name: resnet18
weights: null
resize_shape: null
@@ -1,13 +1,13 @@
_target_: dp.models.fm_dit_image_policy.FlowMatchingDiTImagePolicy
_target_: roboverse_learn.il.fm.policies.fm_dit_image_policy.FlowMatchingDiTImagePolicy


shape_meta: ${shape_meta}

obs_encoder:
_target_: diffusion_policy.model.vision.multi_image_obs_encoder.MultiImageObsEncoder
_target_: roboverse_learn.il.dp.models.vision.multi_image_obs_encoder.MultiImageObsEncoder
shape_meta: ${shape_meta}
rgb_model:
_target_: diffusion_policy.model.vision.model_getter.get_resnet
_target_: roboverse_learn.il.dp.models.vision.model_getter.get_resnet
name: resnet18
weights: null
resize_shape: null
@@ -1,13 +1,13 @@
_target_: dp.models.fm_unet_image_policy.FlowMatchingUnetImagePolicy
_target_: roboverse_learn.il.fm.policies.fm_unet_image_policy.FlowMatchingUnetImagePolicy


shape_meta: ${shape_meta}

obs_encoder:
_target_: diffusion_policy.model.vision.multi_image_obs_encoder.MultiImageObsEncoder
_target_: roboverse_learn.il.dp.models.vision.multi_image_obs_encoder.MultiImageObsEncoder
shape_meta: ${shape_meta}
rgb_model:
_target_: diffusion_policy.model.vision.model_getter.get_resnet
_target_: roboverse_learn.il.dp.models.vision.model_getter.get_resnet
name: resnet18
weights: null
resize_shape: null
@@ -1,4 +1,4 @@
_target_: dp.models.score_unet_image_policy.ScoreMatchingUnetImagePolicy
_target_: roboverse_learn.il.dp.policies.score_unet_image_policy.ScoreMatchingUnetImagePolicy

shape_meta: ${shape_meta}

@@ -13,10 +13,10 @@ noise_scheduler:
prediction_type: epsilon # or sample

obs_encoder:
_target_: diffusion_policy.model.vision.multi_image_obs_encoder.MultiImageObsEncoder
_target_: roboverse_learn.il.dp.models.vision.multi_image_obs_encoder.MultiImageObsEncoder
shape_meta: ${shape_meta}
rgb_model:
_target_: diffusion_policy.model.vision.model_getter.get_resnet
_target_: roboverse_learn.il.dp.models.vision.model_getter.get_resnet
name: resnet18
weights: null
resize_shape: null
@@ -1,13 +1,13 @@
_target_: dp.models.vita_policy.VITAImagePolicy
_target_: roboverse_learn.il.vita.policies.vita_policy.VITAImagePolicy


shape_meta: ${shape_meta}

obs_encoder:
_target_: diffusion_policy.model.vision.multi_image_obs_encoder.MultiImageObsEncoder
_target_: roboverse_learn.il.dp.models.vision.multi_image_obs_encoder.MultiImageObsEncoder
shape_meta: ${shape_meta}
rgb_model:
_target_: diffusion_policy.model.vision.model_getter.get_resnet
_target_: roboverse_learn.il.dp.models.vision.model_getter.get_resnet
name: resnet18
weights: null
resize_shape: null
@@ -31,7 +31,7 @@ latent_dim: 512

# Flow matcher parameters
flow_matcher:
_target_: diffusion_policy.common.flow_matchers.ExactOptimalTransportConditionalFlowMatcher
_target_: roboverse_learn.il.utils.flow_matchers.ExactOptimalTransportConditionalFlowMatcher
sigma: 0.0
num_sampling_steps: 6

@@ -46,7 +46,7 @@ val_dataloader:
persistent_workers: False

ema:
_target_: diffusion_policy.model.diffusion.ema_model.EMAModel
_target_: roboverse_learn.il.dp.models.diffusion.ema_model.EMAModel
update_after_step: 0
inv_gamma: 1.0
power: 0.75
Empty file.
@@ -2,8 +2,7 @@

import torch
import torch.nn

from roboverse_learn.il.utils.common.normalizer import LinearNormalizer
from roboverse_learn.il.utils.normalizer import LinearNormalizer


class BaseLowdimDataset(torch.utils.data.Dataset):
@@ -4,18 +4,17 @@
import numba
import numpy as np
import torch
from termcolor import cprint

from dp.base.base_dataset import BaseImageDataset
from roboverse_learn.il.utils.common.normalize_util import get_image_range_normalizer
from roboverse_learn.il.utils.common.normalizer import LinearNormalizer
from roboverse_learn.il.utils.common.pytorch_util import dict_apply
from roboverse_learn.il.utils.common.replay_buffer import ReplayBuffer
from roboverse_learn.il.utils.common.sampler import (
from roboverse_learn.il.utils.normalize_util import get_image_range_normalizer
from roboverse_learn.il.utils.pytorch_util import dict_apply
from roboverse_learn.il.utils.replay_buffer import ReplayBuffer
from roboverse_learn.il.utils.sampler import (
SequenceSampler,
downsample_mask,
get_val_mask,
)
from roboverse_learn.il.base.base_dataset import BaseImageDataset
from roboverse_learn.il.utils.normalizer import LinearNormalizer
from termcolor import cprint


class RobotImageDataset(BaseImageDataset):
@@ -30,6 +29,7 @@ def __init__(
batch_size=64,
max_train_episodes=None,
):

super().__init__()

self.replay_buffer = ReplayBuffer.copy_from_path(
33 changes: 3 additions & 30 deletions roboverse_learn/il/dp/README.md
@@ -1,15 +1,10 @@
# Flow Matching and Diffusion Based IL Policies
# Diffusion Policy

## 1. Install

```bash
cd roboverse_learn/il/utils/diffusion_policy

pip install -e .

cd ../../../../

pip install pandas wandb
cd roboverse_learn/il/dp
pip install -r requirements.txt
```

Register for a Weights & Biases (wandb) account to obtain an API key.
@@ -39,25 +34,3 @@ eval_enable=False
train_enable=False
eval_enable=True
```

## Supported Algorithms

| Algorithm | Backbone | Model Config | Ref |
| --- | --- | --- | --- |
| Diffusion Policy (DDPM) | DiT | `model_config/ddpm_dit_model.yaml` | [1], [5] |
| Flow Matching | DiT | `model_config/fm_dit_model.yaml` | [6], [5] |
| VITA Policy | MLP | `model_config/vita_model.yaml` | [7] |
| Diffusion Policy (DDPM) | UNet | `model_config/ddpm_model.yaml` | [1], [4] |
| Diffusion Policy (DDIM) | UNet | `model_config/ddim_model.yaml` | [2], [4] |
| Flow Matching | UNet | `model_config/fm_unet_model.yaml` | [6] |
| Score-Based Model | UNet | `model_config/score_model.yaml` | [3], [4] |

### References

1. Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising Diffusion Probabilistic Models." (2020).
2. Song, Jiaming, Chenlin Meng, and Stefano Ermon. "Denoising Diffusion Implicit Models." (2021).
3. Song, Yang, et al. "Score-Based Generative Modeling through Stochastic Differential Equations." (2021).
4. Chi, Cheng, et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." (2023).
5. Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." (2023).
6. Lipman, Yaron, et al. "Flow Matching for Generative Modeling." (2023).
7. Gao, Dechen, et al. "VITA: Vision-to-Action Flow Matching Policy." (2025).
Empty file.
@@ -1,7 +1,7 @@
import abc
from typing import Optional, Union

import diffusion_policy.model.bet.utils as utils
import roboverse_learn.il.dp.models.bet.utils as utils
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
@@ -3,7 +3,7 @@
import numpy as np
import torch
import tqdm
from diffusion_policy.model.common.dict_of_tensor_mixin import DictOfTensorMixin
from roboverse_learn.il.utils.dict_of_tensor_mixin import DictOfTensorMixin


class KMeansDiscretizer(DictOfTensorMixin):
@@ -1,7 +1,7 @@
import abc
from typing import Optional, Tuple

import diffusion_policy.model.bet.utils as utils
import roboverse_learn.il.dp.models.bet.utils as utils
import torch

