[ADD] PyTorch: Tolstoi Char RNN#40

Open
schaefertim wants to merge 17 commits into fsschneider:develop from schaefertim:develop-tolstoi

Conversation

@schaefertim

Introduces a PyTorch version of the Tolstoi Char RNN (previously available only in TensorFlow).

Comment on lines +24 to +28
Working training parameters are:

- batch size ``50``
- ``200`` epochs
- SGD with a learning rate of :math:`\\approx 0.1` works
Author

This is copied from TensorFlow and not verified.
In TensorFlow, the CrossEntropyLoss takes the mean across the time axis and the sum across the batch axis.
Such an option does not exist in PyTorch; the only reductions are "sum" or "mean" across both axes. Currently, "mean" is chosen.
In that case, the learning rate should be larger by a factor of batch_size, because the gradients are smaller by a factor of batch_size.
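The discrepancy can be sketched as follows (shapes are illustrative, not the actual Tolstoi dimensions): TensorFlow's mean-over-time / sum-over-batch reduction equals the summed loss divided by the sequence length, which is a factor of batch_size larger than PyTorch's "mean" reduction.

```python
import torch
import torch.nn as nn

# Hypothetical shapes standing in for the char-RNN outputs.
batch_size, seq_length, vocab_size = 50, 4, 10
logits = torch.randn(batch_size * seq_length, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size * seq_length,))

# PyTorch applies a single reduction over all elements at once.
loss_mean = nn.CrossEntropyLoss(reduction="mean")(logits, targets)
loss_sum = nn.CrossEntropyLoss(reduction="sum")(logits, targets)

# TensorFlow's convention (mean over time, sum over batch) equals
# sum / seq_length, i.e. batch_size times the "mean" loss.
loss_tf_style = loss_sum / seq_length
print(torch.allclose(loss_tf_style, loss_mean * batch_size))
```

Since gradients scale with the loss, matching TensorFlow's training behavior under "mean" requires scaling the learning rate up by batch_size.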

Owner

Could you instead use "sum" and divide by seq_length (or whatever the variable name for the width of the time axis is)?
It would be great if running, e.g. SGD with lr=0.1 produced similar results in PyTorch and TensorFlow.
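A minimal sketch of this suggestion, with the division applied in the training step rather than inside the model (names here are illustrative, not the DeepOBS API):

```python
import torch
import torch.nn as nn

batch_size, seq_length, vocab_size = 50, 4, 10
model = nn.Linear(vocab_size, vocab_size)  # stand-in for the char RNN
loss_fn = nn.CrossEntropyLoss(reduction="sum")
opt = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(batch_size * seq_length, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size * seq_length,))

opt.zero_grad()
# Divide the summed loss by the time-axis width to mimic TensorFlow's
# mean-over-time / sum-over-batch reduction.
loss = loss_fn(model(inputs), targets) / seq_length
loss.backward()
opt.step()
```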

Author

This is exactly the idea we discussed in person. However, it turns out that this doesn't work:
the division by seq_length must happen after the CrossEntropyLoss, so it cannot be part of the model.
I see two possibilities:

  • introduce a custom CrossEntropyLoss that divides after applying CrossEntropyLoss,
  • leave it as it is.

In my opinion, both options are quite bad.
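For reference, the first option could look like this wrapper module (the class name and shapes are hypothetical, not part of DeepOBS):

```python
import torch
import torch.nn as nn

class TimeMeanCrossEntropyLoss(nn.Module):
    """Apply CrossEntropyLoss with reduction="sum", then divide by
    seq_length, reproducing TensorFlow's mean-over-time /
    sum-over-batch reduction."""

    def __init__(self, seq_length):
        super().__init__()
        self.seq_length = seq_length
        self._ce = nn.CrossEntropyLoss(reduction="sum")

    def forward(self, logits, targets):
        return self._ce(logits, targets) / self.seq_length

# Sanity check against the built-in "mean" reduction.
batch_size, seq_length, vocab_size = 50, 4, 10
logits = torch.randn(batch_size * seq_length, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size * seq_length,))
custom = TimeMeanCrossEntropyLoss(seq_length)(logits, targets)
builtin_mean = nn.CrossEntropyLoss()(logits, targets)
```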

Owner

The easiest fix would be to change the definition of the loss in the TensorFlow version to something compatible with PyTorch.
Let me think about this; I will address and merge it once I find time for DeepOBS again.
