Hello, and thank you for sharing this impressive work on the Dexterous World Model (DWM).
The idea of action-conditioned video diffusion for interactive digital twins is very exciting.
I have a few questions and a small feature request that I believe could benefit the community:
Pretrained Models
Are there plans to release pretrained checkpoints for DWM?
If so, will they include models trained on both the synthetic egocentric interaction data and the real-world fixed-camera videos?
Inference Pipeline
Could you clarify the exact inference pipeline for generating interaction videos from:
- a static 3D scene rendering sequence (camera trajectory), and
- an egocentric hand motion / mesh sequence?
Any example scripts or configuration files for end-to-end inference would be very helpful; I have sketched the kind of interface I am imagining below.
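To make the request concrete, here is a minimal sketch of the interface I have in mind. Everything in it is hypothetical on my side: the `dwm` module, `load_pretrained`, `generate`, and all argument names and tensor shapes are placeholders I made up to illustrate the two conditioning inputs and the output I am asking about, not your actual API.

```python
# Hypothetical sketch of the inference interface I am imagining.
# The `dwm` module, `load_pretrained`, `generate`, and every argument
# name below are my own placeholders, not the actual DWM API.
import numpy as np

# (1) Scene conditioning: T rendered RGB frames of the static 3D scene
#     along a fixed camera trajectory (here T=16 at 256x256 resolution).
scene_frames = np.zeros((16, 256, 256, 3), dtype=np.uint8)

# (2) Action conditioning: per-frame egocentric hand parameters,
#     e.g. MANO-style 45 articulation + 3 global rotation + 3 translation.
hand_motion = np.zeros((16, 51), dtype=np.float32)

# model = dwm.load_pretrained("dwm-egocentric")          # placeholder name
# video = model.generate(
#     scene=scene_frames,       # camera-trajectory rendering
#     hand_motion=hand_motion,  # hand motion / mesh sequence
#     num_diffusion_steps=50,
#     seed=0,
# )
# `video` would be a (16, 256, 256, 3) array of synthesized interaction frames.
```

Even pointing to the config fields where these two conditioning streams are specified would answer most of this question.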
Scene Generalization
Have you evaluated how well DWM generalizes to unseen 3D scenes or object layouts at inference time?
Are there recommended constraints on scene scale, object categories, or camera trajectories?
Future Extensions
Do you see DWM being extended to:
- multi-hand or bimanual interactions,
- non-rigid objects,
- or physics-aware feedback loops?
I believe releasing pretrained models or a minimal demo would significantly accelerate adoption and follow-up research on interactive digital twins and embodied simulation.
Thanks again for the great work, and congratulations on the project!
Best regards