
Checkpointing the best weights of the model with reference to issue #23 #25

Open · wants to merge 2 commits into master
Conversation

deadskull7

Hey Guillaume Chevalier! It's a great repo that has helped many people, including me. Thanks for sharing it!
I just wanted to add some changes for model checkpointing and restoring weights, which was raised as issue #23 by JunqiZhao. I have made a separate directory named checkpoints to save the model's best weights by monitoring the validation (test) loss. The weights are saved at a step only if the test loss of that step is the lowest of all the steps done before. This is also explained with comments in the LSTM.ipynb file, so the added code is clearly documented.
The corresponding changes have also been made to the readme.md file to describe the new directory.

The lines of code added at different places are:

```python
import os

# Create the checkpoints/ directory if it does not already exist.
if not os.path.exists('checkpoints/'):
    os.makedirs('checkpoints/')
```

```python
# prev_loss is the loss of the previous best step, initialized with a big number
# so that the saving condition holds on the first step (the test loss of the
# first step will not be greater than 100000).
prev_loss = 100000
```

```python
# Save the model weights only when the test loss of a step is the lowest of all
# the steps done before. The lowest test loss seen so far is stored in prev_loss;
# 'loss' is the test loss of the step being done right now.
if prev_loss > loss:
    prev_loss = loss
    saver.save(sess, './checkpoints/my_model')
    print("Model is saved")
```

The following image shows that when the batch loss increased from 1.10 to 1.14 between iter #60000 and iter #90000, the model did not save the weights: the lowest batch loss seen so far was still lower than the current batch loss, so the checkpoint was not overwritten with worse weights. In all other cases, whenever the batch loss improved, the weights were saved.
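For completeness on the restoring side of issue #23, here is a minimal sketch of how the saved checkpoint could be loaded back, assuming the TensorFlow 1.x API used in LSTM.ipynb and the same checkpoint path as above. The evaluation tensors mentioned in the comment (`x`, `y`, `accuracy`, `X_test`, `y_test`) are placeholders for whatever names the rebuilt graph actually uses.

```python
import tensorflow as tf

# Rebuild (or reuse) the same graph that was used for training, so that the
# variables to restore already exist, then create a Saver over them.
saver = tf.train.Saver()

with tf.Session() as sess:
    # Restore the variables from the most recent checkpoint written to
    # ./checkpoints (i.e. the best weights saved during training).
    saver.restore(sess, tf.train.latest_checkpoint('./checkpoints'))

    # The restored session can now be used for evaluation, for example
    # (tensor and array names are assumptions, not part of this PR):
    # acc = sess.run(accuracy, feed_dict={x: X_test, y: one_hot(y_test)})
```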

I hope you like it and merge it.
(Image: LSTM training log)
