I've been observing that for models that take a large number of steps to reach the early stopping criteria (~20k+ steps), increasing the learning rate significantly (5e-5 --> 2e-4) often cuts the number of steps needed in half, which in turn cuts the training time in half. For models that take fewer steps to begin with, an increased learning rate can also reduce the number of steps needed, but that is less often the case. The score metrics do not seem to be significantly affected by the learning rate.
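For context, the change itself is just the optimizer learning rate passed to the trainer. A minimal sketch, assuming a standard Hugging Face Transformers setup (not our actual training script; the checkpoint, batch size, and step cap below are placeholders):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Seq2SeqTrainingArguments

# Assumed NLLB checkpoint, for illustration only.
model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = Seq2SeqTrainingArguments(
    output_dir="nllb-ft",
    learning_rate=2e-4,              # raised from the 5e-5 default discussed above
    per_device_train_batch_size=16,  # placeholder
    max_steps=50_000,                # upper bound; early stopping normally halts training sooner
)
# The model and args would then go to Seq2SeqTrainer (with an EarlyStoppingCallback)
# exactly as before; only learning_rate changes.
```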
To do:
- Use a hyperparameter optimization tool (ClearML or Weights & Biases?) to see whether there is a learning rate that consistently reduces training time for Scripture projects using NLLB (see the sweep sketch after this list)
- Experiment with different learning rate schedules
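Something like the following is what I have in mind for the sweep. This is only a sketch assuming Weights & Biases; the project name, metric name, search space, and the body of `train()` are placeholders, not our real configuration:

```python
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "eval/chrf", "goal": "maximize"},  # assumed metric name
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 5e-5,
            "max": 3e-4,
        },
        "lr_scheduler_type": {"values": ["linear", "cosine", "constant_with_warmup"]},
    },
}

def train():
    run = wandb.init()
    lr = wandb.config.learning_rate
    schedule = wandb.config.lr_scheduler_type
    # ... build the training arguments with these values, fine-tune, and log
    # steps-to-early-stop and the score metrics back to the run, e.g.:
    # wandb.log({"eval/chrf": score, "steps_to_stop": steps})

sweep_id = wandb.sweep(sweep=sweep_config, project="nllb-lr-sweep")  # assumed project name
wandb.agent(sweep_id, function=train, count=20)
```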
I've noticed it for both, but I've run many more experiments with LoRA and other model reduction methods than without, so I will need to gather more data points before I'm more confident about the types of scenarios that benefit from a higher learning rate. This issue is meant to focus on fully fine-tuned models, since the default learning rate for LoRA models has already been raised.
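To make the distinction concrete, here's a hedged sketch (assuming Hugging Face PEFT; the rank, alpha, target modules, and rates are illustrative, not our actual defaults) of a LoRA run, which already uses a higher rate, versus a full fine-tune, which is what this issue is about:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments

base = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

# LoRA run: only adapter weights train, and the default learning rate is already higher.
lora_model = get_peft_model(
    base,
    LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
               target_modules=["q_proj", "v_proj"]),  # assumed target modules
)
lora_args = Seq2SeqTrainingArguments(output_dir="nllb-lora", learning_rate=2e-4)

# Full fine-tune: all weights train, still at the 5e-5 default this issue is revisiting.
full_args = Seq2SeqTrainingArguments(output_dir="nllb-full", learning_rate=5e-5)
```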