Conversation

@yawen-d yawen-d commented Aug 11, 2022

Description

Closes #523.

Problem

  • I have personally found that logging videos during training is really useful as another dimension for explaining experiment results.
  • Concretely, this issue advocates adding support for saving videos of policies on an environment for evaluation during and after training, in scripts.train_rl, scripts.train_preference_comparisons, scripts.train_adversarial and scripts.train_bc.
  • It would also be nice to support uploading the saved videos to Weights & Biases during and after training.

Solution

  • Write a record_and_save_video() function in imitation.util.video_wrapper that takes a policy, an eval_venv, and a logger, and saves a video of the policy evaluated on the environment to a designated path (a standalone sketch of the general idea follows this list):
from typing import Any, Mapping, Optional

from stable_baselines3.common import logger as sb_logger
from stable_baselines3.common import policies, vec_env


def record_and_save_video(
    output_dir: str,
    policy: policies.BasePolicy,
    eval_venv: vec_env.VecEnv,
    video_kwargs: Mapping[str, Any],
    logger: Optional[sb_logger.Logger] = None,
) -> None:
    ...
  • Upload the video to Weights & Biases within WandbOutputFormat.write() by adding the following:
if key != "video":
    self.wandb_module.log({key: value}, step=step)
else:
    self.wandb_module.log({"video": self.wandb_module.Video(value)})

Testing

  • Add video_saving tests in tests/scripts/test_scripts.py
  • Add video uploading test in tests/util/test_wb_logger.py
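  • For the uploading test, a rough sketch of the kind of check intended, using a mocked wandb module against the write() dispatch shown above (the FakeWandbOutputFormat stand-in and test name are illustrative and do not match the real class or test):
from unittest import mock


class FakeWandbOutputFormat:
    """Minimal stand-in mirroring the write() dispatch shown in the Solution above."""

    def __init__(self, wandb_module):
        self.wandb_module = wandb_module

    def write(self, key, value, step):
        if key != "video":
            self.wandb_module.log({key: value}, step=step)
        else:
            self.wandb_module.log({"video": self.wandb_module.Video(value)})


def test_video_values_are_wrapped_in_wandb_video():
    wandb_module = mock.MagicMock()
    fmt = FakeWandbOutputFormat(wandb_module)

    fmt.write("loss", 0.5, step=1)
    wandb_module.log.assert_called_with({"loss": 0.5}, step=1)

    fmt.write("video", "video.000000.mp4", step=2)
    wandb_module.Video.assert_called_once_with("video.000000.mp4")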

@yawen-d changed the title from "add video saving and uploading support to add train_* scripts" to "add video saving and uploading support to train_* scripts" on Aug 11, 2022

@AdamGleave (Member) left a comment:

Took a quick look, only skimmed as it's still in draft mode. Seems like a useful feature; a couple of suggestions.

codecov bot commented Aug 23, 2022

Codecov Report

Merging #524 (9b9ea3d) into master (de36306) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head 9b9ea3d differs from pull request most recent head 5ca705f. Consider uploading reports for the commit 5ca705f to get more accurate results

@@            Coverage Diff             @@
##           master     #524      +/-   ##
==========================================
- Coverage   96.95%   96.93%   -0.03%     
==========================================
  Files          84       84              
  Lines        7460     7369      -91     
==========================================
- Hits         7233     7143      -90     
+ Misses        227      226       -1     
Impacted Files Coverage Δ
src/imitation/algorithms/preference_comparisons.py 98.98% <100.00%> (-0.19%) ⬇️
src/imitation/scripts/common/common.py 97.22% <100.00%> (ø)
src/imitation/scripts/common/train.py 100.00% <100.00%> (ø)
...ion/scripts/config/train_preference_comparisons.py 84.72% <100.00%> (-0.62%) ⬇️
src/imitation/scripts/train_adversarial.py 96.29% <100.00%> (+1.62%) ⬆️
src/imitation/scripts/train_imitation.py 94.11% <100.00%> (+0.17%) ⬆️
.../imitation/scripts/train_preference_comparisons.py 98.38% <100.00%> (+0.02%) ⬆️
src/imitation/scripts/train_rl.py 100.00% <100.00%> (ø)
src/imitation/util/logger.py 100.00% <100.00%> (ø)
src/imitation/util/video_wrapper.py 100.00% <100.00%> (ø)
... and 10 more


@yawen-d yawen-d marked this pull request as ready for review August 23, 2022 07:11
@yawen-d yawen-d requested a review from AdamGleave August 23, 2022 07:11
)
callback_objs.append(save_policy_callback)

if _config["train"]["videos"]:

@yawen-d (Contributor, Author) commented Aug 23, 2022:

Here we need to initialize a video_wrapper.SaveVideoCallback instead of using train.save_video like the other scripts do, which is a bit unsatisfying.

An alternative could be passing a save_video partial function into the callback (rough sketch below).
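
Roughly what the partial-function alternative could look like (illustrative only: it reuses names from the diff above and assumes SaveVideoCallback could accept a pre-bound callable):
import functools
import os.path as osp

# Bind everything save_video needs up front, then hand the resulting callable
# to the callback, so train_rl never touches video-saving details directly.
save_video_fn = functools.partial(
    train.save_video,
    output_dir=osp.join(log_dir, "videos"),
    eval_venv=eval_venv,
)
callback_objs.append(video_wrapper.SaveVideoCallback(save_video_fn))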

Member:

Yes, this is strange; why is that the case? I would advocate for using the callback class everywhere, or using a partial / closure + wrapper defined in this file for this specific instance. Currently the existence of the class is confusing and not documented.

rl_algo.set_logger(custom_logger)
rl_algo.learn(total_timesteps, callback=callback)

with common.make_venv(num_vec=1, log_dir=None) as eval_venv:

Contributor (Author):

Create an eval_venv

  • with num_vec=1.
  • without creating monitors from here, by setting log_dir=None.

@AdamGleave AdamGleave requested a review from Rocamonde September 2, 2022 05:44
@AdamGleave (Member):

I'm still a bit backlogged; @Rocamonde, could you review this please?

total_timesteps = int(1e6) # total number of environment timesteps
total_comparisons = 5000 # total number of comparisons to elicit
num_iterations = 5 # Arbitrary, should be tuned for the task
num_iterations = 50 # Arbitrary, should be tuned for the task

Member:

Apologies if this has been discussed, but why are you doing this?

cross_entropy_loss_kwargs = {}
reward_trainer_kwargs = {
"epochs": 3,
"weight_decay": 0.0,

Member:

I'll have to remember to change this, as I have a PR that replaces weight decay with a general regularization API (#481). @AdamGleave, what do you think: should we merge my PR or this one first?

Member:

Probably best to merge your PR first, though it really depends on which one is ready earlier.

Member:

#481 is ready and passing all the tests AFAIK.

Contributor (Author):

Thanks for proposing #481; it seems to be the feature we want. I'll make changes accordingly.

total_timesteps: int,
total_comparisons: int,
callback: Optional[Callable[[int], None]] = None,
callback: Optional[Callable[[int, int], None]] = None,

Member:

Probably should add in the docstring what the callback type signature represents.
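
Something along these lines, perhaps (the meaning of the two integers here is only a guess for illustration; substitute whatever they actually represent):

    Args:
        callback: if provided, called after each iteration of preference
            comparisons with two integers, e.g. the current iteration index and
            the number of environment timesteps trained so far. Defaults to None.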



@train_ingredient.capture
def save_video(

Member:

When you call this function it self-documents as if the video were always saved (but a flag indicating whether this should happen is magically injected through a decorator). I don't have an immediately better alternative, but perhaps a more explanatory function name could help.
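
For context, a minimal sketch of how the Sacred capture mechanism injects the flag (the config values and parameter names here are illustrative, not the PR's actual config):
from sacred import Ingredient

train_ingredient = Ingredient("train")


@train_ingredient.config
def config():
    videos = False  # illustrative: whether to record evaluation videos
    video_kwargs = {}  # illustrative: forwarded to the video wrapper


@train_ingredient.capture
def save_video(output_dir, policy, eval_venv, videos, video_kwargs):
    # When invoked inside a Sacred run, `videos` and `video_kwargs` are filled
    # in from the ingredient config if omitted at the call site, which is why a
    # call like save_video(...) reads as if a video were always saved even when
    # the flag turns it off.
    if not videos:
        return
    ...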

round_str: str,
) -> None:
"""Save discriminator and generator."""
save_path = os.path.join(log_dir, "checkpoints", round_str)

Member:

I have a PR for replacing os.path with pathlib in most places, but might as well keep it consistent for now until that's merged.
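
(For reference, the pathlib spelling of the line above would simply be:)
import pathlib

save_path = pathlib.Path(log_dir) / "checkpoints" / round_str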

"""
super().__init__(env)
self.episode_id = 0
self._episode_id = 0

Member:

Why make it private?

directory=video_dir,
**(video_kwargs or dict()),
)
sample_until = rollout.make_sample_until(min_timesteps=None, min_episodes=1)

Member:

I understand where the name of this function is coming from ("make the function called sample_until"), but how it actually reads IMO is "make the sample (until...?)". I think that refactoring this to something like "get_stopping_conditions_callback" or "get_sampling_termination_fn" would be much more readable.

sample_until = rollout.make_sample_until(min_timesteps=None, min_episodes=1)
# "video.{:06}.mp4".format(VideoWrapper.episode_id) will be saved within
# rollout.generate_trajectories()
rollout.generate_trajectories(policy, video_venv, sample_until)

Member:

For some reason I was expecting that the saved video would be one of the real training trajectories instead of a newly sampled one.

@AdamGleave (Member):

Closing in favor of #597

@AdamGleave AdamGleave closed this Oct 28, 2022
@AdamGleave AdamGleave deleted the yawen-d/feature/video-saving-during-training branch November 3, 2022 22:59

Development

Successfully merging this pull request may close these issues:

Add support for saving videos of policies on an environment for evaluation during and after training