French to English translation task notebook #12
Hello! It appears that when the model is used to predict on external input, the tokenization may differ from the one applied to the original training data. As a result, the model cannot predict the correct output for input that is not from the dataset. Even when the external input is identical to a sentence in the dataset, the prediction is still wrong. Below is the code I used for external prediction; the data preprocessing is the same as for the input dataset.
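A common cause of this symptom (an assumption here, since the prediction code itself is not included) is fitting a fresh tokenizer on the external text instead of reusing the one fitted on the training data. The refit assigns different integer IDs to the same words, so even a sentence copied verbatim from the dataset reaches the model as a different sequence. The minimal pure-Python sketch below illustrates this. `fit_vocab` and `encode` are hypothetical stand-ins for Keras's `Tokenizer.fit_on_texts` and `texts_to_sequences` (Keras also orders words by frequency, but breaks ties differently), and the sentences are made up.

```python
from collections import Counter

def fit_vocab(sentences):
    # Count words, then give the most frequent words the smallest indices.
    # Ties are broken alphabetically so the result is deterministic.
    counts = Counter(w for s in sentences for w in s.lower().split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 1 for i, w in enumerate(ordered)}  # index 0 is reserved for padding

def encode(sentence, vocab):
    # Words missing from the vocabulary are dropped (no OOV token).
    return [vocab[w] for w in sentence.lower().split() if w in vocab]

train = ["paris est froid", "rome est chaud en juin"]
external = "rome est froid"

train_vocab = fit_vocab(train)
print(encode(external, train_vocab))            # [7, 1, 4]: IDs the model was trained on
print(encode(external, fit_vocab([external])))  # [3, 1, 2]: IDs from a refitted tokenizer
```

The two calls encode the same sentence, yet the second produces IDs the model never learned to associate with these words.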
Yes. For that, we can use a pretrained tokenizer from Hugging Face (one built for an LLM), which should give better results.
Hello, I hope your day is going great!
```python
from keras.preprocessing.text import Tokenizer

def tokenize(sentences):
    ...

def pad(sequences, length):
    ...
```

If this goes well, you can also try to reshape your input as

I hope it serves you well.
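The bodies of `tokenize` and `pad` are not shown in the comment. One possible sketch, assuming the goal is to have `tokenize` return the fitted vocabulary so it can be reused for external input, and to pad external sequences to the same length the model was trained on, is below. It is plain Python rather than Keras so it runs on its own; post-padding with 0 is an assumption and must match whatever padding was used during training, and `preprocess_external` is a hypothetical helper.

```python
from collections import Counter

def tokenize(sentences):
    """Fit a word->index vocabulary on `sentences` and return (sequences, vocab).

    Returning the vocab matters: the same mapping must be reused
    for any text the model sees later.
    """
    counts = Counter(w for s in sentences for w in s.lower().split())
    # Most frequent words get the smallest indices; ties broken alphabetically.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    vocab = {w: i + 1 for i, w in enumerate(ordered)}  # index 0 is reserved for padding
    sequences = [[vocab[w] for w in s.lower().split()] for s in sentences]
    return sequences, vocab

def pad(sequences, length=None):
    """Post-pad with 0 (and post-truncate) each sequence to `length`.

    If `length` is None, the longest sequence's length is used.
    """
    if length is None:
        length = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = seq[:length]
        padded.append(seq + [0] * (length - len(seq)))
    return padded

def preprocess_external(sentences, vocab, length):
    """Encode new text with the training vocab and pad to the training length.

    Unknown words are dropped, as Keras's Tokenizer does without an oov_token.
    """
    sequences = [[vocab[w] for w in s.lower().split() if w in vocab] for s in sentences]
    return pad(sequences, length)

train = ["paris est froid", "rome est chaud en juin"]
train_seqs, vocab = tokenize(train)
x_train = pad(train_seqs)
max_len = len(x_train[0])  # the length the model was trained on

print(preprocess_external(["rome est froid"], vocab, max_len))  # [[7, 1, 4, 0, 0]]
```

With a real Keras setup, the equivalent is to keep (or save) the `Tokenizer` object fitted during training and call its `texts_to_sequences` on the new text, rather than calling `fit_on_texts` again.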
https://colab.research.google.com/drive/14KegLD0ymq4vTRzCjUvP77w9l-IGCsnj?usp=sharing
@mlevans @tejasvicsr1