-
Notifications
You must be signed in to change notification settings - Fork 843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize documents embedding in the add method from the MilvusVectortore #1140
Optimize documents embedding in the add method from the MilvusVectortore #1140
Conversation
@@ -69,6 +69,8 @@ public class MilvusVectorStore implements VectorStore, InitializingBean { | |||
|
|||
public static final int INVALID_EMBEDDING_DIMENSION = -1; | |||
|
|||
private static final int MAX_EMBEDDING_ARRAY_DIMENSIONS = 2048; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for this limit? Milvus documentation suggests much higher limits for dimensionality - https://milvus.io/docs/limitations.md#Dimensions-of-a-vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I added this dimensions limit is that when using OpenAiEmbeddingModel implementation of the EmbeddingModel is an OpenAi API limit:
https://platform.openai.com/docs/api-reference/embeddings/create
I attached the link for the first version of the API, but there is the same situation for the second one!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed MAX_EMBEDDING_ARRAY_DIMENSIONS
to MAX_OPENAI_EMBEDDING_ARRAY_DIMENSIONS
for greater clarity. Also, since in this class there is an OPENAI_EMBEDDING_DIMENSION_SIZE
variable, this change helps maintain consistency in variable naming style.
@@ -69,6 +69,8 @@ public class MilvusVectorStore implements VectorStore, InitializingBean { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add your name as an author to the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
a5c9f07
to
9afd59f
Compare
f49e257
to
4a9be1f
Compare
4a9be1f
to
15359aa
Compare
@solenyk Based on the ideas in this PR, we decided to make some API changes that are overarching to how we embed documents and then use those embeddings in other vector stores. At the core of those changes is a new These changes were included in the recently released Thank you for the PR; it certainly geared us toward providing it as a general pattern that other vector stores can use. I am closing the PR now. More PR contributions are welcomed! |
Today I was using Spring AI and ran into a problem with org.springframework.ai.vectorstore.MilvusVectorStore. In method add(List documents), when going through each document separately, a request embeddingModel.embed(document); is sent for each of them. I have 4,000 documents in my case, so the method takes a very long time (50 minutes).
To speed up the process, I extended from MilvusVectorStore and already after executing the loop over the documents I used the List<List> embed(List texts) method from org.springframework.ai.embedding.EmbeddingModel with contentArray, which significantly accelerated execution (now performance lasts 1 minute).