V-MTEB is developed based on MTEB.
Clone this repo and install it as an editable package:

```shell
git clone https://github.com/Iambestfeed/V-MTEB.git
cd V-MTEB
pip install -e .
```
To evaluate a cross-encoder (reranker), run:

```shell
python eval_cross_encoder.py --model_name_or_path BAAI/bge-reranker-base
```
- With scripts

Scripts will be updated soon.
- With sentence-transformers
You can use V-MTEB easily in the same way as MTEB.
```python
from mteb import MTEB
from V_MTEB import *
from sentence_transformers import SentenceTransformer

# Define the sentence-transformers model name
model_name = "fill-your-model-name"

model = SentenceTransformer(model_name)
evaluation = MTEB(task_langs=['vie'])
results = evaluation.run(model, output_folder=f"vi_results/{model_name}")
```
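Many embedding models (e.g. BGE-style models) are trained for cosine similarity, so it can help to L2-normalize embeddings before scoring. A minimal NumPy sketch (the helper name `l2_normalize` is illustrative, not part of V-MTEB):

```python
import numpy as np

def l2_normalize(embeddings):
    """L2-normalize each embedding row so cosine similarity reduces to a dot product."""
    arr = np.asarray(embeddings, dtype=np.float32)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    # Clip to avoid division by zero for all-zero rows
    return arr / np.clip(norms, 1e-12, None)

vecs = l2_normalize([[3.0, 4.0], [0.0, 2.0]])
print(vecs[0])  # → [0.6 0.8]
```

With sentence-transformers, passing `normalize_embeddings=True` to `model.encode(...)` achieves the same effect.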
- Using a custom model

To evaluate a new model, you can load it via sentence_transformers if it is supported by sentence_transformers. Otherwise, implement your model as below: provide an `encode` function that takes a list of sentences as input and returns a list of embeddings (embeddings can be `np.array`, `torch.tensor`, etc.):
```python
class MyModel:
    def encode(self, sentences, batch_size=32, **kwargs):
        """Returns a list of embeddings for the given sentences.

        Args:
            sentences (`List[str]`): List of sentences to encode
            batch_size (`int`): Batch size for the encoding

        Returns:
            `List[np.ndarray]` or `List[tensor]`: List of embeddings for the given sentences
        """
        pass


model = MyModel()
evaluation = MTEB(tasks=["Vietnamese_Student_Topic"])
evaluation.run(model)
```
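As a concrete sketch of the interface above, here is a hypothetical dummy model that returns fixed-size random embeddings. It is only useful for verifying that the evaluation pipeline is wired correctly, not for measuring quality; the class name `RandomEncoder` and the embedding dimension are assumptions for illustration:

```python
import numpy as np

class RandomEncoder:
    """Dummy model implementing the required `encode` interface with random vectors."""

    def __init__(self, dim=64, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)

    def encode(self, sentences, batch_size=32, **kwargs):
        # Return one np.ndarray embedding per input sentence
        return [self.rng.standard_normal(self.dim).astype(np.float32)
                for _ in sentences]

model = RandomEncoder()
embeddings = model.encode(["xin chào", "tạm biệt"])
print(len(embeddings), embeddings[0].shape)  # → 2 (64,)
```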
Will be updated soon.
An overview of tasks and datasets available in V-MTEB is provided in the following table:
| Name | Hub URL | Description | Type | Category | Test #Samples |
|---|---|---|---|---|---|
We thank the Massive Text Embedding Benchmark for the great tool and the Vietnamese NLP community for the open-source datasets.
If you find this repository useful, please consider citing it.