Training Transformer on Small Dataset #16
Overview
This PR trains the Transformer model on a small dataset (50-100 words) to validate the forward and backward passes, the loss function, and the optimization step. The goal is to observe the initial learning behavior and to check whether the model overfits, which is expected on a dataset this small.
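One common way to sanity-check a loss function before training is to compare the first loss value against the uniform-prediction baseline: a model that assigns equal probability to every token has a cross-entropy of ln(V) for vocabulary size V. The sketch below is not taken from this PR; it is a plain-Python illustration, and `cross_entropy`, `vocab_size`, and the logit values are hypothetical names and numbers chosen for the example.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log-likelihood of target_index under softmax(logits)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[target_index]

vocab_size = 100  # hypothetical vocabulary size, not a value from the PR

# All-equal logits give a uniform distribution, so the loss should equal ln(V).
uniform_logits = [0.0] * vocab_size
baseline_loss = cross_entropy(uniform_logits, target_index=5)

# A strongly peaked prediction on the correct token should give a loss near zero.
confident_logits = [0.0] * vocab_size
confident_logits[5] = 10.0
confident_loss = cross_entropy(confident_logits, target_index=5)

print(f"uniform baseline: {baseline_loss:.4f}, ln(V): {math.log(vocab_size):.4f}")
print(f"confident prediction: {confident_loss:.4f}")
```

If the first training step reports a loss far from ln(V), that usually points to a problem in the loss computation or the output layer rather than in the data.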
Dataset
A small story is used, tokenized into sequences:
Input:
"Once upon a time, in a land far away, there was a small village."
Target:
"The villagers were known for their kindness and generosity."
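The PR does not show how the text is tokenized, so the following is only a minimal sketch of one possible word-level scheme for the two sentences above. The helper names (`tokenize`, `build_vocab`, `encode`) and the special tokens `<pad>`, `<bos>`, and `<eos>` are assumptions made for illustration, not the tokenizer used in this repository.

```python
import re

# Sentences quoted in the Dataset section.
input_text = "Once upon a time, in a land far away, there was a small village."
target_text = "The villagers were known for their kindness and generosity."

def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(token_lists):
    """Map each distinct token to an integer id; ids 0-2 are assumed special tokens."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for tokens in token_lists:
        for token in tokens:
            # len(vocab) is the next free id; existing tokens keep their id.
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(tokens, vocab):
    """Convert tokens to ids, wrapped in begin- and end-of-sequence markers."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokens] + [vocab["<eos>"]]

input_tokens = tokenize(input_text)
target_tokens = tokenize(target_text)
vocab = build_vocab([input_tokens, target_tokens])
input_ids = encode(input_tokens, vocab)
target_ids = encode(target_tokens, vocab)

print(len(vocab), input_ids, target_ids)
```

A word-level vocabulary keeps the example readable; a subword tokenizer would be the more usual choice once the dataset grows.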
Training Process
Expected Outcome
Conclusion
This small-scale training helps verify the core functionality before scaling to larger datasets.