Add multilingual NanoBEIREvaluator and SparseNanoBEIREvaluator support #3555

marquesafonso · 2025-10-25T12:27:17Z

This pull request adds multilingual support to NanoBEIREvaluator and SparseNanoBEIREvaluator for the following languages: Arabic (ar), German (de), English (en), Spanish (es), French (fr), Italian (it), Norwegian (no), Portuguese (pt), and Swedish (sv). English remains the default. The PR also adds a validator for the new language argument and a test for invalid languages.

The implementation uses the lightonai/nanobeir-multilingual dataset, which is based on the original NanoBEIR collection.

The code style was kept similar to that of the existing NanoBEIREvaluator.py file:

- Added a LanguageType Literal for validation, plus a _validate_language method.
- Renamed dataset_name_to_id to dataset_name_to_subset_id (the multilingual dataset is a single dataset with multiple subsets).
- Added a new language argument with a descriptive annotation.
- Changed the _load_dataset method to load the corpus, queries and qrels for each dataset/subset.
- Added a test_nanobeir_evaluator_invalid_language test in tests/evaluation.
- Added the new language argument and its descriptive annotation to SparseNanoBEIREvaluator to mirror the changes made in NanoBEIREvaluator.
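
For readers unfamiliar with the Literal-based validation pattern mentioned above, here is a minimal standalone sketch. It is not the code from this PR: the free function `validate_language` and the error message wording are assumptions made for illustration, and only the language codes are taken from the description.

```python
from typing import Literal, get_args

# Language codes listed in this PR description.
LanguageType = Literal["ar", "de", "en", "es", "fr", "it", "no", "pt", "sv"]


def validate_language(language: str = "en") -> str:
    """Hypothetical helper: return the code unchanged, or raise if unsupported."""
    # get_args extracts the allowed values from the Literal at runtime,
    # so the type hint and the runtime check share one source of truth.
    supported = get_args(LanguageType)
    if language not in supported:
        raise ValueError(
            f"Language '{language}' is not supported. "
            f"Choose one of: {', '.join(supported)}."
        )
    return language
```

With this sketch, `validate_language("pt")` returns `"pt"`, calling it without arguments falls back to English, and an unknown code such as `"xx"` raises `ValueError`, which is the kind of behaviour an invalid-language test would check.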

All tests in the evaluation folder pass, with the exception of one skipped test.

I hope you find this PR helpful and can merge it to add multilingual support to NanoBEIREvaluator and SparseNanoBEIREvaluator. It would make it easy to test and benchmark retrievers quickly in other languages!

I am available for any comments or improvements.

Best regards,
Afonso

marquesafonso added 2 commits October 25, 2025 14:11

Add multilingual NanoBEIREvaluator support

d23ee8f

Add multilingual SparseNanoBEIREvaluator support

309a3e7

marquesafonso changed the title ~~Add multilingual NanoBEIREvaluator support~~ Add multilingual NanoBEIREvaluator and SparseNanoBEIREvaluator support Oct 27, 2025
