ModelCheckpoint Callback not working unless save_on_train_epoch_end is set to True
#20195
Unanswered · snknitin asked this question in code help: RL / MetaLearning
I'm using the pytorch-lightning + hydra template for my custom RL project. The model runs very fast on its own, but if I enable checkpointing it becomes very slow, almost 4x the time. There must be something wrong in my settings or in the way I am doing this; after wracking my brain for a whole day and running 30 trial-and-error experiments with different combinations, I am lost.

The loggers are set up appropriately to show me env_step and episode-level metrics based on the on_step and on_epoch parameters. Everything works perfectly except checkpointing, and early stopping never triggers either.

HELP NEEDED: Figure out how to make this work with just the save_last checkpoint flag, i.e. create the model checkpoint directory and save the last checkpoint of the run, instead of checking every step, delaying training, and then picking the best checkpoint. Since it is RL, I don't expect the monitored reward to show any degradation after convergence.
Context
There is a caveat here. My module only implements training_step and no other hooks except on_train_start. When I start my trainer and set max_epochs=1000, that also means trainer/global_step will go up to 1000 and training_step is called 1000 times; on_train_start is called just once. So each training_step is one epoch: 1000 epochs = 1000 steps = 1000 buffer updates = 10 episodes = 1000 batches sampled for training.
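Roughly, the trainer setup looks like this (a sketch only; everything besides max_epochs is a placeholder to illustrate the point above):

```python
import pytorch_lightning as pl

# Sketch of the loop described above: no validation loop, one training_step per
# "epoch", so current_epoch and global_step advance together up to max_epochs.
trainer = pl.Trainer(
    max_epochs=1000,       # -> 1000 calls to training_step, global_step goes to 1000
    limit_val_batches=0,   # placeholder: no validation loop in this RL setup
    log_every_n_steps=1,   # placeholder: log the step/episode metrics every step
)
trainer.fit(model)         # model = the RL LightningModule sketched further below
```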
Problem
- The checkpoint only gets saved when I set `save_on_train_epoch_end: True`.
- With just `save_last: True` it never triggers.
- If I set `every_n_epochs` and match it with the trainer's `check_val_every_n_epoch`, it still goes just as slow (same time), but gives me the 99th/499th/999th checkpoint depending on whether I choose 100/500/1000, rather than monitoring the metric I chose and giving me the best one, which I think is expected.

Basically, I do not know how to get it to trigger and save the checkpoint without the save_on_train_epoch_end flag, and how to get it to be fast and not check every epoch, which in my case is every time step, because I need to run this for 100000 epochs/steps. If I can at least save a checkpoint every 10000 steps, even without monitoring and getting the best model with the "moving avg reward across 10 episodes", that is fine, because it eventually converges and there's not much difference.
BEST CASE: If I can get it to create a checkpoint directory and just save the last epoch, without having to use the epoch-end flag and slow down the whole training and experimentation.
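Something along these lines is the behavior I'm describing (a sketch only, not my actual config; the dirpath, filename, and 10000-step interval are placeholders):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Sketch: save periodically by global step and keep a last.ckpt, without
# monitoring a metric and without hooking into every epoch end.
step_checkpoint = ModelCheckpoint(
    dirpath="checkpoints/",        # placeholder directory
    filename="step-{step}",
    save_last=True,                # always keep last.ckpt
    every_n_train_steps=10000,     # trigger by global step, not every epoch
    save_on_train_epoch_end=False,
)
```

This would then be passed to the trainer via `callbacks=[step_checkpoint]` (or the equivalent Hydra callbacks config).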
Code Snippets
This is the callback config I use.
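Roughly, it amounts to this (a sketch with placeholder metric names, paths, and intervals; shown as the Python equivalent of the Hydra YAML):

```python
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# Sketch of the current callback setup: checkpoint and early-stop on the
# moving-average reward, with the flags discussed above. All names/values are placeholders.
checkpoint_callback = ModelCheckpoint(
    dirpath="checkpoints/",
    filename="epoch-{epoch}",
    monitor="episode/avg_reward_10",   # placeholder for the moving-avg-reward metric
    mode="max",
    save_last=True,
    save_top_k=1,
    every_n_epochs=100,                # placeholder interval
    save_on_train_epoch_end=True,      # the flag that makes it trigger at all
)

early_stopping = EarlyStopping(
    monitor="episode/avg_reward_10",   # same placeholder metric
    mode="max",
    patience=50,                       # placeholder
    check_on_train_epoch_end=True,
)
```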
and my LightningModule is basically along the lines of
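the following skeleton (the environment, policy, and replay buffer names are placeholders, not my actual classes):

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class RLModule(pl.LightningModule):
    """Skeleton of the RL module described above; env/policy/buffer are placeholders."""

    def __init__(self, env, policy, buffer, batch_size=256, lr=1e-3):
        super().__init__()
        self.env = env
        self.policy = policy
        self.buffer = buffer
        self.batch_size = batch_size
        self.lr = lr

    def on_train_start(self):
        # Called once at the start of training, e.g. to warm up the replay buffer.
        self.buffer.warm_up(self.env)

    def train_dataloader(self):
        # Dummy single-item loader so that one epoch == one training_step.
        return DataLoader(TensorDataset(torch.zeros(1)), batch_size=1)

    def training_step(self, batch, batch_idx):
        # One training_step per epoch: step the env, update the buffer,
        # sample a training batch, compute the loss (the `batch` arg is the dummy).
        transition = self.env.step(self.policy)
        self.buffer.add(transition)
        sample = self.buffer.sample(self.batch_size)
        loss = self.policy.loss(sample)

        # Step-level and episode-level metrics via on_step / on_epoch.
        self.log("env_step", float(self.global_step), on_step=True, on_epoch=False)
        self.log("episode/avg_reward_10", self.buffer.moving_avg_reward(10),
                 on_step=False, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.policy.parameters(), lr=self.lr)
```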