Description
I trained dexbotic/playground/benchmarks/libero/libero-pi0.py on the latest version of the dexbotic codebase using the libero_pi0_all data, then evaluated with dexbotic-benchmark. The results are as follows:
| Run | Spatial | Object | Goal | Long | Avg. |
| --- | --- | --- | --- | --- | --- |
| db_pi0_all_ft paper | 97 | 98.2 | 94 | 86.4 | 93.9 |
| db_pi0_all_ft test | 96.6 | 95.6 | 93.4 | 87.2 | 93.2 |
| my_db_pi0_all_ft | 93.4 | 96.8 | 86.6 | 73.2 | 87.5 |
The "db_pi0_all_ft paper" is the official result provided. The "db_pi0_all_ft test" is the result I obtained using the official provided model (libero-pi0) for testing. The "my_db_pi0_all_ft" is the result of my retraining and testing as required.
There is a gap between my evaluation of the official model and the published numbers, and the gap from retraining is even larger. My initial guess is that this comes from a different code version: the config.json shipped with the official model differs from the config.json produced by my retraining. The retrained config.json contains many additional fields on top of the original one, and the transformers version also differs:
- Official model: `"transformers_version": "4.38.0.dev0"`
- My retrained model: `"transformers_version": "4.51.0"`
- The pyproject.toml in the official code pins transformers to 4.53.2 for Pi0.
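A minimal sketch for comparing the two config.json files (the checkpoint paths are placeholders for local copies, not paths from the repo):

```python
import json

# Placeholder paths to local copies of the two checkpoints.
with open("official_libero_pi0/config.json") as f:
    official = json.load(f)
with open("my_db_pi0_all_ft/config.json") as f:
    retrained = json.load(f)

# Top-level fields present only in the retrained config.
print("extra fields:", sorted(set(retrained) - set(official)))

# Fields present in both but with different values (e.g. transformers_version).
for key in sorted(set(official) & set(retrained)):
    if official[key] != retrained[key]:
        print(f"{key}: {official[key]!r} -> {retrained[key]!r}")
```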
Could this transformers version mismatch explain the gap? My training parameters are fully consistent with the provided code, and my evaluation parameters are consistent with dexbotic-benchmark/evaluation/config/libero/sample_pi0ulibero.yaml.
I also ran several experiments varying the replan_step parameter, and the results were inconsistent: replan_step=15 performed better than 5. Why might that be?
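My understanding of replan_step (an assumption on my part, since I have not traced the dexbotic-benchmark internals; `predict_chunk` below is a hypothetical name, not the actual API) is that it controls how many actions from each predicted chunk are executed open-loop before the policy is queried again, roughly like this:

```python
def rollout(policy, env, max_steps=520, replan_step=15):
    """Sketch of a chunked, open-loop rollout, assuming replan_step is
    the number of actions executed per policy query."""
    obs = env.reset()
    done, t, info = False, 0, {}
    while not done and t < max_steps:
        # Hypothetical API: the policy returns an action chunk,
        # e.g. an array of shape (horizon, action_dim).
        chunk = policy.predict_chunk(obs)
        # Execute the first replan_step actions without re-querying the model.
        for action in chunk[:replan_step]:
            obs, reward, done, info = env.step(action)  # gym-style step
            t += 1
            if done or t >= max_steps:
                break
    return info
```

If this reading is right, a larger replan_step means longer open-loop execution and fewer inference calls, so the closed-loop behavior can genuinely differ between 5 and 15. It would also help to know which value the official numbers were produced with.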