vertexai: Add `embeddings_task_type` parameter to `embed_query` and `embed_documents` #716

mitraan-deshaw · 2025-01-27T23:03:34Z

PR Description

This PR introduces an optional embeddings_task_type parameter to embed_documents and embed_query in VertexAIEmbeddings, mirroring the task_type argument in GoogleGenerativeAIEmbeddings. While working on this, I also addressed a couple of additional improvements:

Created EmbeddingTaskTypes to consolidate embedding task types, reducing redundancy. I've currently added it as a Literal type but I'm open to suggestions for a more suitable location.
Fixed an issue where the text-embedding-005 model, which supports embedding tasks, was incorrectly classified as an older model, preventing proper argument passing. I've added a new GoogleEmbeddingModelVersion to address this.
Added (new) CODE_RETRIEVAL_QUERY as an embedding task.
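
To illustrate the consolidation described above, here is a minimal sketch of a shared `Literal` alias. The member list is an assumption based on the task types Vertex AI documents for text embeddings, and the PR's actual definition may differ. `is_valid_task_type` is a hypothetical helper, not part of the library.

```python
from typing import Literal, get_args

# Assumed set of task types; the definition merged in the PR may differ.
EmbeddingTaskTypes = Literal[
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
    "CODE_RETRIEVAL_QUERY",
]


def is_valid_task_type(value: str) -> bool:
    """Hypothetical helper: check a string against the Literal's members at runtime."""
    return value in get_args(EmbeddingTaskTypes)
```

Defining the alias once lets both `embed_query` and `embed_documents` annotate the same parameter type, so a type checker can flag unknown task names.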

Sample code:

from langchain_google_vertexai import VertexAIEmbeddings

# Initialize a specific embeddings model version
embeddings = VertexAIEmbeddings(model_name="text-embedding-005")
text = "function that reverses a string"  # placeholder query text
single_vector = embeddings.embed_query(text, embeddings_task_type="CODE_RETRIEVAL_QUERY")

Relevant issues

Type

🆕 New Feature

Changes(optional)

Added optional embeddings_task_type parameter to embed_documents and embed_query in VertexAIEmbeddings.
Consolidated embeddings task types into EmbeddingTaskTypes to reduce redundancy.
Added support for text-embedding-005 embedding tasks.
Added CODE_RETRIEVAL_QUERY as an embedding task.

Testing(optional)

Note(optional)

lkuligin · 2025-01-28T04:24:05Z

libs/vertexai/langchain_google_vertexai/embeddings.py

+        self,
+        texts: List[str],
+        batch_size: int = 0,
+        embeddings_task_type: EmbeddingTaskTypes = "RETRIEVAL_DOCUMENT",


please, add a *, so that embeddings_task_type can be provided by name only
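
As a rough sketch of what this request means in Python: a bare `*` in the signature makes every parameter after it keyword-only. `SketchEmbeddings` is a hypothetical stand-in, not the real `VertexAIEmbeddings`, and it returns the task type instead of calling any API so the behavior is observable.

```python
from typing import List, Literal

# Shortened alias for illustration only (assumed members).
EmbeddingTaskTypes = Literal[
    "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "CODE_RETRIEVAL_QUERY"
]


class SketchEmbeddings:
    """Hypothetical stand-in, not the real VertexAIEmbeddings."""

    def embed_documents(
        self,
        texts: List[str],
        batch_size: int = 0,
        *,  # parameters after this marker must be passed by keyword
        embeddings_task_type: EmbeddingTaskTypes = "RETRIEVAL_DOCUMENT",
    ) -> List[str]:
        # Echo the task type once per text instead of calling a remote API.
        return [embeddings_task_type for _ in texts]
```

With this signature, `embed_documents(["a"], 0, "RETRIEVAL_QUERY")` raises a `TypeError`, while `embed_documents(["a"], embeddings_task_type="RETRIEVAL_QUERY")` works.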

lkuligin · 2025-01-28T04:27:11Z

libs/vertexai/tests/integration_tests/test_embeddings.py

+    "model_name, embeddings_dim",
+    _EMBEDDING_MODELS,
+)
+def test_langchain_google_vertexai_embedding_query_with_task_type(


Do we need this integration test, or would a unit test be enough? (We've already tested above that embeddings_task is passed to the Google API and that it returns a valid output.)

We've got too many integration tests now and the execution time keeps getting longer :), so it might be a good idea to keep their total number reasonable.

mitraan-deshaw · 2025-01-28

Totally makes sense! However, I'm not sure how we would write a clean unit test for the new argument in embed_documents() and embed_query(), as these methods in turn call embed() and return its response. It almost seems like we would be validating the embed() method instead.

If we only want to check that the arguments are passed correctly, one approach could be to mock the embed() method, capture the arguments, and then call the VertexAIEmbedding's embed() method. Happy to remove the tests if you think they're not required.

lkuligin · 2025-01-29

Just by mocking the API:

langchain-google/libs/vertexai/tests/unit_tests/test_embeddings.py

Line 20 in c3d4061

def test_langchain_google_vertexai_no_dups_dynamic_batch_size() -> None:

mitraan-deshaw · 2025-01-29

Moved the tests into unit tests.
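
The mocking approach discussed in this thread could look roughly like the sketch below. It is not the test that was merged: `SketchEmbeddings` and `check_task_type_is_forwarded` are hypothetical names, and the way `embed_query` forwards its arguments to `embed` is an assumption made for illustration.

```python
from typing import List
from unittest.mock import patch


class SketchEmbeddings:
    """Hypothetical stand-in for the real embeddings class."""

    def embed(
        self,
        texts: List[str],
        batch_size: int = 0,
        embeddings_task_type: str = "RETRIEVAL_DOCUMENT",
    ) -> List[List[float]]:
        # The real method would call the remote API; a unit test must not get here.
        raise RuntimeError("remote call not available in a unit test")

    def embed_query(
        self, text: str, *, embeddings_task_type: str = "RETRIEVAL_QUERY"
    ) -> List[float]:
        # Assumed forwarding: one text, batch size 1, task type passed through.
        return self.embed([text], 1, embeddings_task_type)[0]


def check_task_type_is_forwarded() -> None:
    emb = SketchEmbeddings()
    # Replace embed() with a mock so no network call is made.
    with patch.object(
        SketchEmbeddings, "embed", return_value=[[0.1, 0.2]]
    ) as mock_embed:
        result = emb.embed_query(
            "def f(): pass", embeddings_task_type="CODE_RETRIEVAL_QUERY"
        )
    assert result == [0.1, 0.2]
    # The mock records exactly what embed_query forwarded.
    mock_embed.assert_called_once_with(["def f(): pass"], 1, "CODE_RETRIEVAL_QUERY")
```

A test of this shape checks only the argument forwarding, which keeps it fast and independent of the Google API.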

lkuligin · 2025-01-31T14:38:58Z

@mitraan-deshaw could you fix the linter, please?

Anshuman Mitra added 5 commits January 27, 2025 16:52

feat: Added embeddings_task_type argument

f3c1809

feat: Added CODE_RETRIEVAL_QUERY embedding task

398aa03

chore: consolidated embedding task types

ddeb681

feat: Added support for text-embedding-005

30762b8

fix: Remove CODE_RETRIEVAL_QUERY from test

366c5f4

lkuligin requested changes Jan 28, 2025


Anshuman Mitra added 2 commits January 28, 2025 11:10

fix: Enforce as a keyword-only arg

e9fa377

fix: Moved tests into unit_tests

9caf9ab

lkuligin approved these changes Jan 30, 2025


chore: lint

3b4b7d4

mitraan-deshaw requested a review from lkuligin January 31, 2025 15:33

lkuligin approved these changes Jan 31, 2025


lkuligin merged commit d64bd7d into langchain-ai:main Feb 1, 2025
15 checks passed
