Thank you for providing this awesome repo!
I am trying to make results consistent between different runs via `seeding(seed, torch_deterministic=True)`. It is known that torch has a broadcasting issue with deterministic algorithms: pytorch/pytorch#79987
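For context, a seeding helper of this kind typically looks something like the sketch below. This is an assumption about what `seeding` does, not the repo's actual implementation; the function name and signature are taken from the call above, everything else is a common pattern:

```python
import os
import random

import numpy as np
import torch


def seeding(seed: int, torch_deterministic: bool = False) -> None:
    # Seed every RNG the training loop may draw from.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    if torch_deterministic:
        # Required for deterministic cuBLAS kernels (CUDA >= 10.2).
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # Raise an error whenever a nondeterministic op is hit,
        # which is what surfaces the index_put broadcasting issue.
        torch.use_deterministic_algorithms(True)
```

With `torch_deterministic=True`, any op without a deterministic implementation raises a `RuntimeError` instead of silently running nondeterministically.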
So I manually fixed the broadcasting in each environment. For example, in `envs/ant.py` (lines 204-206) I changed the code to:

```python
self.state.joint_q.view(self.num_envs, -1)[env_ids, 3:7] = self.start_rotation.clone().unsqueeze(0).expand(len(env_ids), -1)
self.state.joint_q.view(self.num_envs, -1)[env_ids, 7:] = self.start_joint_q.clone().unsqueeze(0).expand(len(env_ids), -1)
self.state.joint_qd.view(self.num_envs, -1)[env_ids, :] = torch.zeros(size=(len(env_ids), self.num_joint_qd), device=self.device)
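The pattern behind this fix, expanding the right-hand side to the indexed shape instead of letting the indexed assignment broadcast, can be illustrated in isolation. The tensor names and shapes below are made up for the example and are not from the repo:

```python
import torch

num_envs, dof = 4, 8
joint_q = torch.zeros(num_envs, dof)
start_joint_q = torch.arange(dof, dtype=torch.float32)  # shape (dof,)
env_ids = torch.tensor([0, 2])

# Broadcasting form: the (dof,) RHS is broadcast against the
# (len(env_ids), dof) indexed slice, which routes through an index_put
# kernel that deterministic mode can reject on CUDA:
#   joint_q[env_ids, :] = start_joint_q

# Explicit form: expand the RHS to the indexed shape first, so no
# broadcasting happens inside the indexed assignment.
joint_q[env_ids, :] = start_joint_q.clone().unsqueeze(0).expand(len(env_ids), -1)
```

`expand` creates a view without copying memory, so the `clone()` before it is what guards against the expanded rows aliasing each other when written through.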
After these changes, I ran the experiments with and without `torch_deterministic=True`. For example, below is the ant test, where the blue curve is without `torch_deterministic=True` and the orange one is with `torch_deterministic=True`:
The non-deterministic run is similar to the paper results; in the deterministic setting, however, the rewards remain unchanged.
Does anyone have ideas about what other issues `torch_deterministic=True` may bring? Thank you very much!