Description
Hi @nicklashansen, thanks a lot for releasing Newt, MMBench, and the checkpoints.
I’m trying to run a sanity‑check evaluation of one of the per‑task checkpoints from the HF repo (https://huggingface.co/nicklashansen/newt/blob/main/walker-walk.pt) using the current code in this repo (commit d0cea95).
I ran:

```
cd tdmpc2
python train.py \
    task=walker-walk \
    model_size=B \
    checkpoint=/path/to/hf_checkpoints/walker-walk.pt \
    steps=1 \
    num_envs=2 \
    use_demos=False \
    tasks_fp=/path/to/newt/tasks.json \
    exp_name=eval_hf_walker_walk \
    save_video=True \
    env_mode=sync \
    compile=False
```

This fails with:
```
AssertionError: pad should be positive
...
  File "tdmpc2/common/layers.py", line 190, in api_model_conversion
    assert pad > 0, 'pad should be positive'
```
I inspected the HF checkpoint:

```python
import torch

state = torch.load("hf_checkpoints/walker-walk.pt", map_location="cpu", weights_only=False)
state = state["model"]
print("_task_emb.weight", state["_task_emb.weight"].shape)                # torch.Size([10, 512])
print("_action_masks", state["_action_masks"].shape)                      # torch.Size([10, 7])
print("_encoder.state.0.weight", state["_encoder.state.0.weight"].shape)  # torch.Size([256, 554])
print("_dynamics.0.weight", state["_dynamics.0.weight"].shape)            # torch.Size([512, 1031])
```

From this I infer that the HF `walker-walk.pt` was trained with something like:

- `model_size = B` (`latent_dim=512`, `enc_dim=256`, `mlp_dim=512`)
- `task_dim = 512` (10 tasks × 512-dim embedding)
- `obs_state_dim = 42` (since 554 = 42 + 512)
- `action_dim = 7` (since 1031 = 512 (latent) + 7 (action) + 512 (task))
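To make the inference concrete, here is the dimension arithmetic spelled out. The variable names are my own labels for the inferred quantities; the decomposition of the two input widths into state/latent/action/task components is my assumption, not something confirmed by the repo:

```python
# Hypothetical labels; values taken from the checkpoint shapes printed above
latent_dim = 512    # output width of _dynamics.0.weight
task_dim = 512      # embedding width of _task_emb.weight
obs_state_dim = 42  # assumed raw walker-walk state dimension
action_dim = 7      # column count of _action_masks

# _encoder.state.0.weight input width: raw state + task embedding
assert obs_state_dim + task_dim == 554
# _dynamics.0.weight input width: latent + action + task embedding
assert latent_dim + action_dim + task_dim == 1031
print("dimension arithmetic checks out")
```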
However, the current single-task path in this repo builds a model with:

- `task != "soup"` → `task_dim = 0` in `parse_cfg`,
- padded state: `obs_shape['state'] = (128,)` via `VecWrapper`,
- padded actions: `(16,)` via `VecWrapper`.

So the local `_encoder.state.0.weight` is `[256, 128]`, which is smaller than the HF `[256, 554]`. This violates `api_model_conversion`'s assumption ("target has more input channels than source, pad source"), hence `pad < 0` and the assertion failure.
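A minimal sketch of the mismatch as I understand it (this is my paraphrase of the padding arithmetic, not the actual code in `layers.py`):

```python
# Weight shapes only; no tensors needed for the arithmetic.
# "source" = checkpoint weight, "target" = locally built model weight.
source_shape = (256, 554)  # HF checkpoint: state (42) + task embedding (512)
target_shape = (256, 128)  # local single-task model: padded state (128), task_dim = 0

# The conversion assumes the target is wider and pads the source up to it;
# here the target is narrower, so the computed pad is negative.
pad = target_shape[-1] - source_shape[-1]
print(pad)  # -426, which trips `assert pad > 0, 'pad should be positive'`
```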
Environment: Docker image built from this repo’s docker/Dockerfile.
Questions

1. What is the exact config used to train the per-task HF checkpoints, e.g. `walker-walk.pt`? In particular: `model_size`? `obs` (`state` vs `rgb` vs `state+rgb`)? `task_dim`? And how many tasks / which tasks correspond to the 10 rows in `_task_emb.weight` and `_action_masks`?
2. What is the recommended way to evaluate these HF per-task checkpoints with this implementation?
   - Is there a matching config/script you use internally (e.g. a specific `train.py` invocation or a separate eval script)?
   - Should we be using a multitask (`task="soup"` or similar) config with `task_dim=512` and then fix the task index at eval time, rather than the current `task_dim=0` single-task path?
3. More generally: are the HF single-task `.pt` files intended to be evaluated with this early code release as-is, or should we wait for a dedicated evaluation script / config that matches those checkpoints?
Any guidance or an example eval command for `walker-walk.pt` (or any other per-task checkpoint) would be greatly appreciated.