MongoDB Atlas Vector Store Support #376

Closed


Kirbstomper
Contributor

@Kirbstomper Kirbstomper commented Mar 1, 2024

Holding off squashing/documentation further until I get some eyes on it.

This implements a VectorStore on MongoDB Atlas. This does NOT work with MongoDB hosted outside of Atlas as using vector search requires creating a search index through Atlas.
More on how that works under the hood here: https://www.mongodb.com/docs/atlas/atlas-search/atlas-search-overview/

As for configuration, I figured we could provide some defaults, but ideally users should be able to set the following themselves, since much of it depends on whatever they have set up in Atlas.

path: The field you are using for your index in the Atlas Vector Search Index
vector_index: The name of the vector search index
vector_collection_name: The name of the collection your index was created on
num_candidates: Number of nearest neighbors to use during the search. Value must be less than or equal to (<=) 10000. You can't specify a number less than the number of documents to return (limit).
metadataFieldsToFilter: The metadata fields you want to be able to filter with.
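
As a rough illustration of the options listed above, here is a minimal sketch of a settings holder that enforces the two `num_candidates` constraints. The class name `AtlasVectorSearchSettings`, the `checkLimit` helper, and the default values `embedding`, `vector_store`, and `200` are assumptions made for this example. They are not taken from the PR's code.

```java
import java.util.List;

/** Hypothetical settings holder; field names mirror the options above, not the PR's classes. */
final class AtlasVectorSearchSettings {

    /** Upper bound for num_candidates, as stated in the description. */
    static final int MAX_NUM_CANDIDATES = 10_000;

    final String path;
    final String vectorIndex;
    final String vectorCollectionName;
    final int numCandidates;
    final List<String> metadataFieldsToFilter;

    AtlasVectorSearchSettings(String path, String vectorIndex, String vectorCollectionName,
            int numCandidates, List<String> metadataFieldsToFilter) {
        if (numCandidates < 1 || numCandidates > MAX_NUM_CANDIDATES) {
            throw new IllegalArgumentException(
                    "numCandidates must be in [1, " + MAX_NUM_CANDIDATES + "]");
        }
        this.path = path;
        this.vectorIndex = vectorIndex;
        this.vectorCollectionName = vectorCollectionName;
        this.numCandidates = numCandidates;
        this.metadataFieldsToFilter = List.copyOf(metadataFieldsToFilter);
    }

    /** Assumed defaults, for illustration only. */
    static AtlasVectorSearchSettings defaults() {
        return new AtlasVectorSearchSettings(
                "embedding", "spring_ai_vector_search", "vector_store", 200, List.of());
    }

    /** Rejects a limit that is larger than numCandidates. */
    void checkLimit(int limit) {
        if (limit < 1 || limit > numCandidates) {
            throw new IllegalArgumentException(
                    "limit must be in [1, numCandidates=" + numCandidates + "]");
        }
    }
}
```

Checking the limit at query time rather than at construction keeps the settings reusable across searches that return different numbers of documents.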

We query the database using an aggregation search and perform a post-filter in the pipeline to filter out anything below the threshold value.
https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#mongodb-pipeline-pipe.-vectorSearch
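
To make the shape of that query concrete, here is a sketch of the pipeline as plain Java maps: a `$vectorSearch` stage, a stage that exposes the `vectorSearchScore` metadata as a field, and a `$match` stage that acts as the post-filter. The class name `VectorSearchPipelineSketch`, the `score` field name, and the choice of `$addFields` are assumptions for this example. The PR's actual code may build the aggregation differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that assembles the three pipeline stages as nested maps. */
final class VectorSearchPipelineSketch {

    /** Builds a single-key document such as {"$match": {...}}. */
    private static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    static List<Map<String, Object>> build(String index, String path, List<Double> queryVector,
            int numCandidates, int limit, double threshold) {
        // Stage 1: approximate nearest-neighbour search against the Atlas index.
        Map<String, Object> vectorSearch = new LinkedHashMap<>();
        vectorSearch.put("index", index);
        vectorSearch.put("path", path);
        vectorSearch.put("queryVector", queryVector);
        vectorSearch.put("numCandidates", numCandidates);
        vectorSearch.put("limit", limit);

        return List.of(
                doc("$vectorSearch", vectorSearch),
                // Stage 2: copy the similarity score into a regular field ("score" is assumed).
                doc("$addFields", doc("score", doc("$meta", "vectorSearchScore"))),
                // Stage 3: post-filter that drops results below the threshold.
                doc("$match", doc("score", doc("$gte", threshold))));
    }
}
```

Because the threshold is applied after `limit`, a search can return fewer documents than requested even when more documents above the threshold exist further down the ranking.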

One thing I have noticed (and the reason I have a terrible sleep(5000) in the test for now) is that the indexing does not happen instantly.
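
One possible alternative to a fixed sleep is to poll until the search starts returning results, with a timeout. The sketch below is generic Java and not part of the PR. `IndexReadinessSketch` and `awaitTrue` are hypothetical names, and the probe passed in would be whatever check the test can make, for example "the similarity search returns at least one document".

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper to replace a fixed sleep in an integration test. */
final class IndexReadinessSketch {

    /**
     * Calls the probe repeatedly until it returns true or the timeout elapses.
     * Returns true if the probe succeeded, false on timeout or interruption.
     */
    static boolean awaitTrue(BooleanSupplier probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test could then assert on `awaitTrue(...)` with a generous timeout, which finishes as soon as the index is ready and fails clearly if it never becomes ready.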

Kirbstomper and others added 19 commits November 4, 2023 11:47
This aggregation is used to actually perform the search on a given collection with embeddings
Integration test runs fine given...
- You have a mongo atlas cluster to connect to (local or remote)
- You have the search index "spring_ai_vector_search" setup correctly
- Need to explore getting around this
Integration test runs fine given...
- You have a mongo atlas cluster to connect to (local or remote)
- You have the search index "spring_ai_vector_search" setup correctly
- Need to explore getting around this
- Need to filter results using threshold
While a post-filter is not ideal, it gets the job done. The mongo team seems to be working on making it available as a pre-filter option, at which point this implementation can be updated to use it.
@tzolov
Contributor

tzolov commented Mar 1, 2024

This is great! Thanks for contributing, @Kirbstomper!

@tzolov tzolov added the vector store and enhancement labels Mar 1, 2024
@tzolov tzolov added this to the 1.0.0-M1 milestone Mar 2, 2024
@Kirbstomper
Contributor Author

Kirbstomper commented Mar 2, 2024

Added some more documentation and a builder for configuration

I'm thinking of renaming the module to vector-stores/mongodb-atlas to make it more obvious that this is not just for run-of-the-mill MongoDB. It would also differentiate it from a VectorStore for something like Azure Cosmos DB for MongoDB, if we ever create one, as that has some vector storage support too.

@tzolov
Contributor

tzolov commented Mar 2, 2024

To be consistent with the other store names you should use the spring-ai- prefix. E.g. vector-stores/spring-ai-mongodb-atlas.

I've noticed that MongoDB Atlas provides filtering support. Perhaps we can map our metadata filtering to it.
We have done it for most of the other stores. If you have time (and interest) you can investigate this, but it is not critical. We can leave it for later improvements.
I'm busy with other stuff until our 0.8.1 milestone release (coming week), but I will try to review your contribution for the next milestone (1.0.0-M1).

@Kirbstomper
Contributor Author

@tzolov
Implemented filtering, and creation of the search index if it doesn't exist. The search index will be created using the configured collection name, metadataFieldsToFilter, pathName, and vectorIndexName. If the search index does exist, then from what I can tell the operation just doesn't do anything.

Ideally I want to be able to check whether this search index already exists and then update it instead, but the updateSearchIndex operation doesn't seem to work very well with vector search index definitions.
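
For readers unfamiliar with the index definition, the sketch below assembles a `createSearchIndexes` command document from the same four settings, using plain Java maps. It is an illustration under assumptions, not the PR's code: the class name `SearchIndexCommandSketch`, the `numDimensions` parameter, the `cosine` similarity, and the `metadata.` prefix on filter paths are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder for a createSearchIndexes command with a vectorSearch index. */
final class SearchIndexCommandSketch {

    static Map<String, Object> createCommand(String collectionName, String vectorIndexName,
            String pathName, int numDimensions, List<String> metadataFieldsToFilter) {
        List<Map<String, Object>> fields = new ArrayList<>();

        // The vector field that holds the embeddings.
        Map<String, Object> vectorField = new LinkedHashMap<>();
        vectorField.put("type", "vector");
        vectorField.put("path", pathName);
        vectorField.put("numDimensions", numDimensions); // assumed parameter
        vectorField.put("similarity", "cosine");         // assumed similarity function
        fields.add(vectorField);

        // One filter field per metadata key; the "metadata." prefix is an assumption.
        for (String field : metadataFieldsToFilter) {
            Map<String, Object> filterField = new LinkedHashMap<>();
            filterField.put("type", "filter");
            filterField.put("path", "metadata." + field);
            fields.add(filterField);
        }

        Map<String, Object> definition = new LinkedHashMap<>();
        definition.put("fields", fields);

        Map<String, Object> index = new LinkedHashMap<>();
        index.put("name", vectorIndexName);
        index.put("type", "vectorSearch");
        index.put("definition", definition);

        Map<String, Object> command = new LinkedHashMap<>();
        command.put("createSearchIndexes", collectionName);
        command.put("indexes", List.of(index));
        return command;
    }
}
```

Only fields declared with `type: "filter"` can be referenced in the `filter` option of `$vectorSearch`, which is why the filterable metadata fields have to be known when the index is created.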

@tzolov
Contributor

tzolov commented Mar 16, 2024

Hey @Kirbstomper, thanks for your contribution. Good stuff.

I've reviewed it and made some small fixes before merging.
Here are the leftovers, if you are interested in continuing with them.

@tzolov
Contributor

tzolov commented Mar 16, 2024

Rebased, squashed and merged at 5f0123c

Follow up actions:
#455
#456

@tzolov tzolov closed this Mar 16, 2024
Labels
enhancement, vector store