vertexai: Add embeddings_task_type parameter to embed_query and embed_documents #716

Merged

Contributor

@mitraan-deshaw mitraan-deshaw commented Jan 27, 2025

PR Description

This PR introduces an optional embeddings_task_type parameter to embed_documents and embed_query in VertexAIEmbeddings, mirroring the task_type argument in GoogleGenerativeAIEmbeddings. While working on this, I also addressed a couple of additional improvements:

  • Created EmbeddingTaskTypes to consolidate embedding task types, reducing redundancy. I've currently added it as a Literal type but I'm open to suggestions for a more suitable location.
  • Fixed an issue where the text-embedding-005 model, which supports embedding tasks, was incorrectly classified as an older model, preventing proper argument passing. I've added a new GoogleEmbeddingModelVersion to address this.
  • Added the new CODE_RETRIEVAL_QUERY embedding task type.
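
The second bullet can be illustrated with a rough sketch. The PR text does not show the original classification logic, so the fallback-to-legacy behavior, the class name ModelVersionSketch, and the helper build_request_kwargs below are all assumptions made for illustration; they are not the library's actual code.

```python
from enum import Enum, auto
from typing import Dict


class ModelVersionSketch(Enum):
    """Hypothetical stand-in for a model-version enum (not the library's)."""

    LEGACY = auto()              # assumed: older models that ignore task types
    TEXT_EMBEDDING_005 = auto()  # the version this PR describes adding

    @classmethod
    def from_name(cls, model_name: str) -> "ModelVersionSketch":
        # Assumption: any name that is not explicitly listed falls back to
        # LEGACY, which is how a newer model could be misclassified as old.
        if "text-embedding-005" in model_name.lower():
            return cls.TEXT_EMBEDDING_005
        return cls.LEGACY

    @property
    def supports_task_type(self) -> bool:
        return self is not ModelVersionSketch.LEGACY


def build_request_kwargs(model_name: str, task_type: str) -> Dict[str, str]:
    """Include the task type only when the detected version supports it."""
    version = ModelVersionSketch.from_name(model_name)
    return {"task_type": task_type} if version.supports_task_type else {}
```

Under these assumptions, a model name that is missing from the lookup silently loses its task type, which matches the symptom described in the bullet.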

Sample code:

from langchain_google_vertexai import VertexAIEmbeddings

# Initialize a specific embeddings model version
embeddings = VertexAIEmbeddings(model_name="text-embedding-005")

# Placeholder query text; any natural-language code search query works here
text = "Function to add two numbers"
single_vector = embeddings.embed_query(text, embeddings_task_type="CODE_RETRIEVAL_QUERY")

Relevant issues

Type

🆕 New Feature

Changes(optional)

  • Added optional embeddings_task_type parameter to embed_documents and embed_query in VertexAIEmbeddings.
  • Consolidated embeddings task types into EmbeddingTaskTypes to reduce redundancy.
  • Added support for text-embedding-005 embedding tasks.
  • Added CODE_RETRIEVAL_QUERY as an embedding task.

Testing(optional)

Note(optional)

self,
texts: List[str],
batch_size: int = 0,
embeddings_task_type: EmbeddingTaskTypes = "RETRIEVAL_DOCUMENT",
Collaborator

Please add a * before this parameter so that embeddings_task_type can only be passed by name.

Contributor Author

fixed
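
For readers unfamiliar with the requested change, here is a minimal sketch of a keyword-only parameter. SketchEmbeddings and its toy embed() are hypothetical stand-ins, not the real VertexAIEmbeddings, and the EmbeddingTaskTypes values are taken from Vertex AI's documented task types, so the alias in the PR may differ.

```python
from typing import List, Literal

# Assumed contents; the PR's actual alias may list different values.
EmbeddingTaskTypes = Literal[
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
    "CODE_RETRIEVAL_QUERY",
]


class SketchEmbeddings:
    """Hypothetical stand-in that never calls a remote API."""

    def embed(self, texts, batch_size=0, embeddings_task_type=None):
        # Toy embedding: one dimension holding the text length.
        return [[float(len(t))] for t in texts]

    def embed_documents(
        self,
        texts: List[str],
        batch_size: int = 0,
        *,  # everything after the bare * must be passed by keyword
        embeddings_task_type: EmbeddingTaskTypes = "RETRIEVAL_DOCUMENT",
    ) -> List[List[float]]:
        return self.embed(texts, batch_size, embeddings_task_type)
```

With the bare * in place, embed_documents(texts, 0, "CLUSTERING") raises a TypeError, while embed_documents(texts, embeddings_task_type="CLUSTERING") works. This keeps existing positional callers from accidentally passing a task type in the wrong slot.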

"model_name, embeddings_dim",
_EMBEDDING_MODELS,
)
def test_langchain_google_vertexai_embedding_query_with_task_type(
Collaborator

Do we need this integration test, or would a unit test be enough? A test above already checks that the embeddings task type is passed to the Google API and that it returns a valid output.

We've got too many integration tests now, and the execution time keeps getting longer :). It might be a good idea to keep their total number reasonable.

Contributor Author

Totally makes sense! However, I'm not sure how we would write a clean unit test for the new argument in embed_documents() and embed_query(), since these methods just call embed() and return its response. It almost seems like we would be validating the embed() method instead.

If we only want to check that the arguments are passed correctly, one approach could be to mock the embed() method, call embed_documents() and embed_query() on VertexAIEmbeddings, and check the arguments that embed() received. Happy to remove the tests if you think they're not required.

Collaborator

just by mocking the API:

def test_langchain_google_vertexai_no_dups_dynamic_batch_size() -> None:

Contributor Author

Moved the tests into unit tests.
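
The unit tests themselves are not shown in this thread. As a rough sketch of the mocking approach discussed above, the following replaces embed() with a mock and checks that the task type is forwarded. StandInEmbeddings is a hypothetical stand-in for VertexAIEmbeddings, and the way it calls embed() (positional arguments, batch size 1 for queries) is an assumption.

```python
from typing import List
from unittest.mock import patch


class StandInEmbeddings:
    """Hypothetical stand-in for VertexAIEmbeddings; no Google API involved."""

    def embed(self, texts, batch_size=0, embeddings_task_type=None):
        raise RuntimeError("the real method would call the remote API")

    def embed_query(self, text: str, *, embeddings_task_type="RETRIEVAL_QUERY"):
        return self.embed([text], 1, embeddings_task_type)[0]

    def embed_documents(
        self,
        texts: List[str],
        batch_size: int = 0,
        *,
        embeddings_task_type="RETRIEVAL_DOCUMENT",
    ):
        return self.embed(texts, batch_size, embeddings_task_type)


def test_task_type_is_forwarded() -> None:
    model = StandInEmbeddings()
    # Patch embed() so no network call happens and its arguments are recorded.
    with patch.object(
        StandInEmbeddings, "embed", return_value=[[0.1, 0.2]]
    ) as mock_embed:
        vector = model.embed_query(
            "find a sort function", embeddings_task_type="CODE_RETRIEVAL_QUERY"
        )
        mock_embed.assert_called_once_with(
            ["find a sort function"], 1, "CODE_RETRIEVAL_QUERY"
        )
        assert vector == [0.1, 0.2]

        # The documents path should fall back to its default task type.
        mock_embed.reset_mock()
        model.embed_documents(["doc"])
        mock_embed.assert_called_once_with(["doc"], 0, "RETRIEVAL_DOCUMENT")
```

A test of this shape runs without credentials or network access, which addresses the concern about integration test runtime raised earlier in the thread.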

@lkuligin
Collaborator

@mitraan-deshaw could you fix the linter, please?

@lkuligin lkuligin merged commit d64bd7d into langchain-ai:main Feb 1, 2025
15 checks passed