Similar sentence Prediction with more accurate results with your dataset on top of BERT pertained model.
Install the package
pip install similar-sentences
- FilePath: Reference to model.zip for prediction. Reference to sentences.txt for training.
- Type:
predict
ortrain
- Used for training the setences. Which required
(".txt", "train")
as parameter in SimilarSentences - PreTrainedModel (optional): Any of the below model can be passed for training,by default #1 will be applied
- bert-base-nli-mean-tokens: BERT-base model with mean-tokens pooling. Performance: STSbenchmark: 77.12
- bert-base-nli-max-tokens: BERT-base with max-tokens pooling. Performance: STSbenchmark: 77.21
- bert-base-nli-cls-token: BERT-base with cls token pooling. Performance: STSbenchmark: 76.30
- bert-large-nli-mean-tokens: BERT-large with mean-tokens pooling. Performance: STSbenchmark: 79.19
- bert-large-nli-max-tokens: BERT-large with max-tokens pooling. Performance: STSbenchmark: 78.41
- bert-large-nli-cls-token: BERT-large with CLS token pooling. Performance: STSbenchmark: 78.29
- roberta-base-nli-mean-tokens: RoBERTa-base with mean-tokens pooling. Performance: STSbenchmark: 77.49
- roberta-large-nli-mean-tokens: RoBERTa-base with mean-tokens pooling. Performance: STSbenchmark: 78.69
- distilbert-base-nli-mean-tokens: DistilBERT-base with mean-tokens pooling. Performance: STSbenchmark: 76.97
- Used for predicting the setences. Which required
(".zip", "predict")
as parameter in SimilarSentences - InputSentences: To find the similar sentence for.
- NumberOfPrediction: Number of results for the prediction
- DesiredJsonOutput: The output will be in JSON format.
simple
produces a plain output.detailed
produces detailed output with score
- Used for reloading (or) updating the model. Which required
(".zip", "predict")
as parameter in SimilarSentences
- This method will export the data with 3 columns in excel format. Columns ['Sentence','Suggestion','Score']
- BatchFile: Batch file with sentences to predict, has to be in .txt format.
- NumberOfPrediction: Number of results for the prediction
Prepare your dataset and save the content to sentences.txt
Hi, thanks for contacting.
Hello there!
Hi there, welcome!
Hi, how can I help?
In a few words, how can help?
Hi again, welcome back.
Hi! Welcome back.
Good morning!
Good afternoon!
Good evening!
Good morning! Welcome.
Good afternoon! Welcome.
Good evening! Welcome.
Hello, how can I help?
Welcome.
Welcome back.
Thanks for contacting.
Goodbye!
Thanks for contacting. Goodbye!
Thanks for contacting. Bye!
Happy to help!
Glad I could help!
Supply the sentences to build the model.
from SimilarSentences import SimilarSentences
# Make sure the extension is .txt
model = SimilarSentences('sentences.txt',"train")
model.train()
The code snipet will produce model.zip.
Load the model.zip from the training.
from SimilarSentences import SimilarSentences
model = SimilarSentences('model.zip',"predict")
text = 'Hi.How are you doing?'
simple = model.predict(text, 2, "simple")
detailed = model.predict(text, 2, "detailed")
print(simple)
print(detailed)
Output looks like,
#simple output
[
"Hello there! Did I get that right?",
"Right Hi, how can I help?"
]
#detailed output
[
[
{
"sentence": "Hello there!",
"score": 0.938870553799856
},
{
"sentence": "Did I get that right?",
"score": 0.7910412586610753
}
],
[
{
"sentence": "Right",
"score": 0.9161810654762793
},
{
"sentence": "Hi, how can I help?",
"score": 0.7824734658953297
}
]
]
👍 ✨ 🐫 🎉 🚀 🤘 HAPPY CODING 🤘 🚀 🎉 🐫 ✨ 👍