-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vertexai: Add embeddings_task_type
parameter to embed_query
and embed_documents
#716
vertexai: Add embeddings_task_type
parameter to embed_query
and embed_documents
#716
Conversation
self, | ||
texts: List[str], | ||
batch_size: int = 0, | ||
embeddings_task_type: EmbeddingTaskTypes = "RETRIEVAL_DOCUMENT", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please, add a *,
so that embeddings_task_type
can be provided by name only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
"model_name, embeddings_dim", | ||
_EMBEDDING_MODELS, | ||
) | ||
def test_langchain_google_vertexai_embedding_query_with_task_type( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this integration test, or would a unit test be enough (we've tested in a test above that embeddings_task is passed to the Google API and it returns a valid output)
we've got too many integration tests now and the execution time gets longer and longer :), it might be a good idea to keep the total amount of them reasonable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally makes sense! However, I'm not sure how we would write a clean unit test for the new argument in embed_documents()
and embed_query()
, as these methods in turn call embed()
and return its response. It almost seems like we would be validating the embed()
method instead.
If we only want to check if the arguments are passed correctly, one approach could be to mock the embed()
method, capture the arguments, and then call the VertexAIEmbedding
's embed()
method. Happy to remove the tests if you think it's not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just by mocking the API:
def test_langchain_google_vertexai_no_dups_dynamic_batch_size() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the tests into unit tests.
@mitraan-deshaw could you fix the linter, please? |
PR Description
This PR introduces an optional
embeddings_task_type
parameter toembed_documents
andembed_query
inVertexAIEmbeddings
, mirroring thetask_type
argument in GoogleGenerativeAIEmbeddings. While working on this, I also addressed a couple of additional improvements:EmbeddingTaskTypes
to consolidate embedding task types, reducing redundancy. I've currently added it as a Literal type but I'm open to suggestions for a more suitable location.text-embedding-005
model, which supports embedding tasks, was incorrectly classified as an older model, preventing proper argument passing. I've added a newGoogleEmbeddingModelVersion
to address this.CODE_RETRIEVAL_QUERY
as an embedding task.Sample code:
Relevant issues
Type
🆕 New Feature
Changes(optional)
embeddings_task_type
parameter toembed_documents
andembed_query
inVertexAIEmbeddings
.EmbeddingTaskTypes
to reduce redundancy.text-embedding-005
embedding tasks.CODE_RETRIEVAL_QUERY
as an embedding task.Testing(optional)
Note(optional)