[ADD] PyTorch: Tolstoi Char RNN #40
Conversation
Working training parameters are:

- batch size ``50``
- ``200`` epochs
- SGD with a learning rate of :math:`\approx 0.1` works
This is copied from TensorFlow and not verified.
In TensorFlow, the CrossEntropyLoss takes the mean across the time axis and the sum across the batch axis.
Such an option does not exist in PyTorch: the only reductions are ``"sum"`` or ``"mean"``, and they apply to both axes at once. Currently ``"mean"`` is chosen.
In this case, the learning rate should be a factor of ``batch_size`` larger, because the gradients are a factor of ``batch_size`` smaller.
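The gradient scaling described above can be checked directly. This is a minimal sketch (shapes and seed are arbitrary, chosen only for illustration) showing that ``reduction="mean"`` yields gradients exactly a factor of ``batch_size`` smaller than ``reduction="sum"``:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, num_classes = 4, 5  # illustrative sizes, not from the benchmark
logits = torch.randn(batch_size, num_classes, requires_grad=True)
targets = torch.tensor([1, 0, 3, 2])

# "mean" averages the per-element losses, "sum" adds them up,
# so the "mean" gradient is smaller by a factor of batch_size.
loss_mean = nn.CrossEntropyLoss(reduction="mean")(logits, targets)
(grad_mean,) = torch.autograd.grad(loss_mean, logits)

logits2 = logits.detach().clone().requires_grad_(True)
loss_sum = nn.CrossEntropyLoss(reduction="sum")(logits2, targets)
(grad_sum,) = torch.autograd.grad(loss_sum, logits2)

print(torch.allclose(grad_mean * batch_size, grad_sum))  # True
```

This is why a run with ``reduction="mean"`` needs a learning rate scaled up by ``batch_size`` to match a ``reduction="sum"`` run.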
Could you instead use ``"sum"`` and divide by ``seq_length`` (or whatever the variable for the width of the time axis is called)?
It would be great if running, e.g., SGD with ``lr=0.1`` produced similar results in PyTorch and TensorFlow.
This is exactly the idea we discussed in person. However, it turns out that this doesn't work:
the division by ``seq_length`` must happen only after the CrossEntropyLoss, so it cannot be part of the model.
I see two possibilities:

- introduce a custom CrossEntropyLoss that divides after applying the standard CrossEntropyLoss
- leave it as it is

In my opinion, both options are quite bad.
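The first option could be sketched roughly as follows. This is a hypothetical wrapper (the class name ``SeqCrossEntropyLoss`` and its interface are assumptions, not part of the PR), summing the per-element losses and dividing by the sequence length afterwards, which mimics TensorFlow's mean-over-time / sum-over-batch reduction:

```python
import torch
import torch.nn as nn


class SeqCrossEntropyLoss(nn.Module):
    """Hypothetical loss: sum over all elements, then divide by
    seq_length, i.e. mean across the time axis and sum across
    the batch axis (TensorFlow-style reduction)."""

    def __init__(self, seq_length):
        super().__init__()
        self.seq_length = seq_length
        self.loss = nn.CrossEntropyLoss(reduction="sum")

    def forward(self, logits, targets):
        # logits: (batch * seq_length, num_classes)
        # targets: (batch * seq_length,)
        return self.loss(logits, targets) / self.seq_length


# Usage with arbitrary illustrative shapes:
criterion = SeqCrossEntropyLoss(seq_length=10)
logits = torch.randn(3 * 10, 5)  # batch of 3, 10 time steps, 5 classes
targets = torch.randint(0, 5, (3 * 10,))
loss = criterion(logits, targets)
```

With this reduction the gradients no longer shrink with ``batch_size``, so the same learning rate should behave comparably in both frameworks.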
The easiest solution would be to change the definition of the loss in the TensorFlow version to something compatible with PyTorch.
Let me think about this; I will address and merge it once I find time for DeepOBS again.
Introduces a PyTorch version of the Tolstoi Char RNN (previously only available in TensorFlow).