Conversation

@sw241395 sw241395 commented Aug 4, 2025

In some of the eval methods, you can't pass args to the `encode` method, so I made some edits to the eval code to enable it.

This will be useful for models like jinaai/jina-embeddings-v3, where you pass in the LoRA adapter you want to use.

Let me know if you want me to change anything.
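For context, here is a minimal sketch of the pattern this PR is after: an evaluator forwarding extra keyword arguments through to `model.encode`. The evaluator class and the `encode_kwargs` plumbing are illustrative assumptions, not the library's actual API; the `task` kwarg mirrors how jina-embeddings-v3 selects a LoRA adapter, and the stub model just records what it receives.

```python
class StubModel:
    """Stand-in for a SentenceTransformer-like model; records encode() kwargs."""

    def encode(self, sentences, **kwargs):
        self.last_kwargs = kwargs
        # A real model would return embeddings; return dummy vectors here.
        return [[0.0] * 4 for _ in sentences]


class SimpleEvaluator:
    """Illustrative evaluator that forwards extra kwargs straight to encode()."""

    def __call__(self, model, sentences, **encode_kwargs):
        return model.encode(sentences, **encode_kwargs)


model = StubModel()
SimpleEvaluator()(model, ["hello world"], task="retrieval.query")
print(model.last_kwargs)  # {'task': 'retrieval.query'}
```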

@sw241395 (Author)

It seems like the tests are failing due to:

 WARNING  huggingface_hub.utils._http:_http.py:315 HTTP Error 429 thrown while requesting HEAD https://huggingface.co/sentence-transformers/average_word_embeddings_levy_dependency/resolve/main/0_WordEmbeddings/model.safetensors

Too many requests, I suspect, from the GitHub runners' IPs. I have run the tests locally and they seem to be OK.

A temporary workaround could be to include the model in the tests folder structure, then copy it into the appropriate .cache dir during test setup. But I understand this is quite hacky and not great practice.

@tomaarsen (Member)

Hello!

Thank you for opening this. Indeed, the test failures are due to rate limits from the various GitHub actions runners. It's unrelated to this PR.
I'm still considering how to best approach your proposal. I think there are two other considerations:

  1. Should additional kwargs be passed to __call__, or perhaps via the evaluator initialization instead?
  2. For the InformationRetrievalEvaluator, you might want different parameters for the queries than for the documents. This might mean adding two parameters instead?

Or perhaps we recognize that if a model is custom, then we don't necessarily have 100% compatibility with the rest of the library, and users of those custom models are expected to make the required changes on their side?
It's a bit tricky, I think.

  • Tom Aarsen
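The second consideration above could be sketched roughly like this: an InformationRetrievalEvaluator-style class that takes separate `encode()` kwargs for queries and for documents at initialization. All names here are illustrative assumptions, not the library's actual API, and the stub model only records what it is called with.

```python
class StubModel:
    """Stand-in for a SentenceTransformer-like model; records encode() kwargs."""

    def __init__(self):
        self.calls = []

    def encode(self, sentences, **kwargs):
        self.calls.append(kwargs)
        return [[0.0] * 4 for _ in sentences]


class IRStyleEvaluator:
    """Illustrative evaluator with separate kwargs for queries and documents."""

    def __init__(self, query_encode_kwargs=None, corpus_encode_kwargs=None):
        # Two separate dicts, since queries and documents may need different
        # parameters (e.g. different LoRA adapters / tasks).
        self.query_encode_kwargs = dict(query_encode_kwargs or {})
        self.corpus_encode_kwargs = dict(corpus_encode_kwargs or {})

    def __call__(self, model, queries, corpus):
        query_embs = model.encode(queries, **self.query_encode_kwargs)
        corpus_embs = model.encode(corpus, **self.corpus_encode_kwargs)
        return query_embs, corpus_embs


model = StubModel()
evaluator = IRStyleEvaluator(
    query_encode_kwargs={"task": "retrieval.query"},
    corpus_encode_kwargs={"task": "retrieval.passage"},
)
evaluator(model, ["a query"], ["a document"])
print(model.calls)  # [{'task': 'retrieval.query'}, {'task': 'retrieval.passage'}]
```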


sw241395 commented Sep 7, 2025

> Hello!
>
> Thank you for opening this. Indeed, the test failures are due to rate limits from the various GitHub actions runners. It's unrelated to this PR. I'm still considering how to best approach your proposal. I think there's two other considerations:
>
> 1. should additional kwargs be passed to `__call__`, or perhaps via the evaluator initialization instead?
>
> 2. For the [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator), you might want different parameters for the queries than for the documents. This might mean adding two parameters instead?
>
> Or perhaps we recognize that if a model is custom, then we don't necessarily have 100% compatibility with the rest of the library, and users of those custom models are expected to make the required changes on their side? It's a bit tricky, I think.
>
> * Tom Aarsen

Hey Tom

Yeah, I agree that custom models won't have 100% compatibility with the library.

I am happy to move the args to the init if you feel this is more appropriate. I've seen that some of the inits already have args that are used in the call, so it kind of makes sense to be consistent in that manner.

E.g. https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/evaluation/MSEEvaluator.py#L77

Unless you prefer to leave it in the call method, in which case I can go back and fix the InformationRetrievalEvaluator.

Thanks
SW
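The init-based alternative could look roughly like this: a hedged sketch (illustrative names, not the library's actual API) in which `encode()` arguments are stored on the evaluator at construction time and reused in `__call__`, mirroring how existing evaluators such as MSEEvaluator already keep options like a batch size in `__init__`.

```python
class StubModel:
    """Stand-in for a SentenceTransformer-like model; records encode() kwargs."""

    def encode(self, sentences, **kwargs):
        self.last_kwargs = kwargs
        return [[0.0] * 4 for _ in sentences]


class InitKwargsEvaluator:
    """Illustrative evaluator: encode() kwargs configured once, at init."""

    def __init__(self, encode_kwargs=None):
        self.encode_kwargs = dict(encode_kwargs or {})

    def __call__(self, model, sentences):
        # __call__ stays argument-free beyond its inputs; the extra
        # parameters come from the values stored at construction time.
        return model.encode(sentences, **self.encode_kwargs)


model = StubModel()
InitKwargsEvaluator({"task": "retrieval.passage"})(model, ["a document"])
print(model.last_kwargs)  # {'task': 'retrieval.passage'}
```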
