Package to calculate the similarity score between two sentences.
sentence-similarity works best with Python 3.6 or higher.
pip install sentence-similarity
from sentence_similarity import sentence_similarity
sentence_a = "paris is a beautiful city"
sentence_b = "paris is a gorgeous city"
You can access some of the official models through the sentence_similarity class. You can also pass a HuggingFace model name, such as bert-base-uncased or distilbert-base-uncased, directly when instantiating sentence_similarity.
See all the available models at huggingface.co/models.
model = sentence_similarity(model_name='distilbert-base-uncased', embedding_type='cls_token_embedding')
Because BERT is bidirectional, the final representation of the [CLS] token encodes information from all tokens in the sentence through the multi-layer encoding procedure. As a result, the [CLS] representation differs from sentence to sentence.
Set embedding_type to cls_token_embedding to compute the similarity score between two sentences based on the [CLS] token.
paper link (https://arxiv.org/pdf/1810.04805.pdf)
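As a rough illustration of what a [CLS]-based sentence vector is, the sketch below takes the first row of a per-token hidden-state matrix. The helper name cls_embedding and the toy numbers are assumptions made for illustration; they are not part of this package's API, and in a real BERT model the matrix would come from the final encoder layer.

```python
from typing import List


def cls_embedding(hidden_states: List[List[float]]) -> List[float]:
    """Return the vector at position 0, where BERT tokenizers place [CLS].

    Hypothetical helper for illustration only, not part of sentence_similarity.
    """
    if not hidden_states:
        raise ValueError("hidden_states must contain at least one token vector")
    return list(hidden_states[0])


# Toy final-layer output for 3 tokens with hidden size 4 (made-up values).
toy_hidden_states = [
    [0.1, 0.3, -0.2, 0.5],  # [CLS]
    [0.7, -0.1, 0.0, 0.2],  # "paris"
    [0.0, 0.4, 0.1, -0.3],  # [SEP]
]

sentence_vector = cls_embedding(toy_hidden_states)
print(sentence_vector)
```

Two sentences can then be compared by applying a distance or similarity metric to their respective [CLS] vectors.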
score = model.get_score(sentence_a, sentence_b, metric="cosine")
print(score)
Available metrics are euclidean, manhattan, minkowski, and cosine.
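For reference, the four metric names correspond to standard vector measures. The sketch below gives their textbook definitions in plain Python. It is not taken from this package's source: the function names, the default p=3 for Minkowski, and the choice to return cosine as a similarity rather than a distance are assumptions, and get_score may scale or transform its result differently.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 3.0) -> float:
    """Lp distance; p=1 gives manhattan, p=2 gives euclidean (p=3 is an assumed default)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 1.0 means same direction."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Toy embeddings (made-up values): vec_b points in the same direction as vec_a.
vec_a = [1.0, 2.0, 2.0]
vec_b = [2.0, 4.0, 4.0]

for name, fn in [("euclidean", euclidean), ("manhattan", manhattan),
                 ("minkowski", minkowski), ("cosine", cosine_similarity)]:
    print(name, fn(vec_a, vec_b))
```

Note that the first three are distances (smaller means more similar), while cosine similarity grows as the vectors align.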
from sentence_similarity import sentence_similarity
sentence_a = "paris is a beautiful city"
sentence_b = "paris is a gorgeous city"
You can access all the pretrained models of Sentence-Transformers.
See all the available models at sbert/models.
model = sentence_similarity(model_name='distilbert-base-uncased', embedding_type='sentence_embedding')
Sentence-BERT (SBERT) is a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity.
Set embedding_type to sentence_embedding (the default embedding_type) to compute the similarity score between two sentences based on SBERT.
paper link (https://arxiv.org/pdf/1908.10084.pdf)
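To make the idea of a pooled sentence embedding concrete, the sketch below averages token vectors while skipping padding positions, which is the mean pooling strategy the SBERT paper reports as its default. Whether this package's sentence_embedding option pools in exactly this way is an assumption; the helper name mean_pooling and the toy values are hypothetical.

```python
from typing import List, Sequence


def mean_pooling(token_embeddings: Sequence[Sequence[float]],
                 attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention mask entry is non-zero.

    Hypothetical helper for illustration only, not part of sentence_similarity.
    """
    if not token_embeddings:
        raise ValueError("token_embeddings must not be empty")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:  # skip padding tokens
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if count == 0:
        raise ValueError("attention_mask must select at least one token")
    return [total / count for total in totals]


# Toy token embeddings (made-up values); the last position is padding.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
toy_mask = [1, 1, 0]

pooled = mean_pooling(toy_tokens, toy_mask)
print(pooled)
```

Unlike the [CLS] approach, every non-padding token contributes equally to the resulting sentence vector.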
score = model.get_score(sentence_a, sentence_b, metric="cosine")
print(score)
Available metrics are euclidean, manhattan, minkowski, and cosine.