Hello, and thank you for sharing this impressive work on the Dexterous World Model (DWM).
The idea of action-conditioned video diffusion for interactive digital twins is very exciting.
I have a few questions and a small feature request that I believe could benefit the community:
Pretrained Models
Are there plans to release pretrained checkpoints for DWM?
If so, will they include models trained on both the synthetic egocentric interaction data and the real-world fixed-camera videos?
Inference Pipeline
Could you clarify the exact inference pipeline for generating interaction videos from:
- a static 3D scene rendering sequence (camera trajectory), and
- an egocentric hand motion / mesh sequence?
Any example scripts or configuration files for end-to-end inference would be very helpful; I have sketched the kind of interface I am imagining below.
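To make the request concrete, here is a minimal sketch of the interface I have in mind. Everything in it is hypothetical on my side: the `dwm` module, `load_pretrained`, `generate`, and all argument names and tensor shapes are placeholders I made up to illustrate the two conditioning inputs and the output I am asking about, not your actual API.

```python
# Hypothetical sketch of the inference interface I am imagining.
# The `dwm` module, `load_pretrained`, `generate`, and every argument
# name below are my own placeholders, not the actual DWM API.
import numpy as np

# (1) Scene conditioning: T rendered RGB frames of the static 3D scene
#     along a fixed camera trajectory (here T=16 at 256x256 resolution).
scene_frames = np.zeros((16, 256, 256, 3), dtype=np.uint8)

# (2) Action conditioning: per-frame egocentric hand parameters,
#     e.g. MANO-style 45 articulation + 3 global rotation + 3 translation.
hand_motion = np.zeros((16, 51), dtype=np.float32)

# model = dwm.load_pretrained("dwm-egocentric")          # placeholder name
# video = model.generate(
#     scene=scene_frames,       # camera-trajectory rendering
#     hand_motion=hand_motion,  # hand motion / mesh sequence
#     num_diffusion_steps=50,
#     seed=0,
# )
# `video` would be a (16, 256, 256, 3) array of synthesized interaction frames.
```

Even pointing to the config fields where these two conditioning streams are specified would answer most of this question.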
Scene Generalization
Have you evaluated how well DWM generalizes to unseen 3D scenes or object layouts at inference time?
Are there recommended constraints on scene scale, object categories, or camera trajectories?
Future Extensions
Do you see DWM being extended to:
- multi-hand or bimanual interactions,
- non-rigid objects,
- or physics-aware feedback loops?
I believe releasing pretrained models or a minimal demo would significantly accelerate adoption and follow-up research on interactive digital twins and embodied simulation.
Thanks again for the great work, and congratulations on the project!
Best regards