Feat: Add Typesense vector store adapter #728

Closed
41 commits
bb7bee9
feat: typesense vectorstore setup
PabloSanchi Apr 6, 2024
6e993a9
feat: typesense autoconfigure setup
PabloSanchi Apr 6, 2024
bc79d68
feat: add post bean initialization and create method
PabloSanchi Apr 6, 2024
3561287
feat: add embedding field
PabloSanchi Apr 6, 2024
47bea8c
fix: change datastore name, typo in code reuse
PabloSanchi Apr 6, 2024
f32be95
WIP: implement search
PabloSanchi Apr 7, 2024
eed6bfe
feat: add transformers embedding dependency
PabloSanchi Apr 9, 2024
2604ff3
fix: create collection add nested field options
PabloSanchi Apr 9, 2024
d039612
feat: add typesense tests | WIP
PabloSanchi Apr 9, 2024
180a2d9
feat: add temporary directory as typesense need it
PabloSanchi Apr 9, 2024
f85d6b7
Merge branch 'spring-projects:main' into feat/typesense-vectorstore
PabloSanchi Apr 19, 2024
b9b5cb0
Merge branch 'spring-projects:main' into feat/typesense-vectorstore
PabloSanchi May 5, 2024
50cad2a
fix: use embedding variable instead of word vec
PabloSanchi May 5, 2024
9432dc5
feat: check in runtime the number of documents in the collection
PabloSanchi May 5, 2024
d79b893
feat: add typesense expression converter
PabloSanchi May 11, 2024
7ae6982
feat: add expression converter
PabloSanchi May 11, 2024
65d79c7
feat: add filter tests
PabloSanchi May 11, 2024
0dd7074
feat: add update document test and search with threshold test
PabloSanchi May 11, 2024
eeafe03
Merge branch 'spring-projects:main' into feat/typesense-vectorstore
PabloSanchi May 11, 2024
bbf3805
fix: apply linter
PabloSanchi May 12, 2024
f6a6733
fix: add distance assert
PabloSanchi May 12, 2024
a240c6e
fix: distance threshold and add distance key into metadata
PabloSanchi May 12, 2024
3e1b1bb
feat: add typesesne starter
PabloSanchi May 13, 2024
4d25b68
feat: add starter and autoconfigure imports
PabloSanchi May 13, 2024
8a32dd0
fix: add signature and fix typos
PabloSanchi May 13, 2024
43aed32
fix: use 'in' operator in tests
PabloSanchi May 13, 2024
322900b
feat: add typesense docs
PabloSanchi May 13, 2024
5e893eb
feat: add client properties in autoconfigure
PabloSanchi May 16, 2024
60a5d40
feat:add typesense dependency
PabloSanchi May 16, 2024
a523d83
feat: add embedding dimension method
PabloSanchi May 16, 2024
13348c9
fix: remove unused import
PabloSanchi May 16, 2024
203949d
fix: change default collection name
PabloSanchi May 16, 2024
0fc122b
feat: update docs
PabloSanchi May 16, 2024
ba41f40
fix: error with service client bean, missing configuration annotation
PabloSanchi May 16, 2024
9a43d92
Merge branch 'spring-projects:main' into feat/typesense-vectorstore
PabloSanchi May 16, 2024
05200f8
feat: add typesense vector store autoconfiguration tests
PabloSanchi May 21, 2024
d105247
fix: remove unused imports
PabloSanchi May 21, 2024
7181fa7
feat: renaming typesense to typesense-store
PabloSanchi May 21, 2024
6ddb058
Merge remote-tracking branch 'upstream/main' into backup-feat/typesen…
PabloSanchi May 22, 2024
c146ee4
fix: refactor client to model
PabloSanchi May 22, 2024
f13a572
Merge pull request #4 from PabloSanchi/backup-feat/typesense-store
PabloSanchi May 22, 2024
6 changes: 3 additions & 3 deletions pom.xml
Original file line number Diff line number Diff line change
@@ -38,7 +38,6 @@
<module>vector-stores/spring-ai-qdrant-store</module>
<module>vector-stores/spring-ai-redis-store</module>
<module>vector-stores/spring-ai-weaviate-store</module>

<module>spring-ai-spring-boot-starters/spring-ai-starter-azure-store</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-cassandra-store</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-chroma-store</module>
@@ -52,7 +51,6 @@
<module>spring-ai-spring-boot-starters/spring-ai-starter-qdrant-store</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-redis-store</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-weaviate-store</module>

<module>models/spring-ai-anthropic</module>
<module>models/spring-ai-azure-openai</module>
<module>models/spring-ai-bedrock</module>
@@ -68,7 +66,7 @@
<module>models/spring-ai-vertex-ai-palm2</module>
<module>models/spring-ai-watsonx-ai</module>
<module>models/spring-ai-zhipuai</module>

<module>vector-stores/spring-ai-typesense-store</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-anthropic</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-azure-openai</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-bedrock-ai</module>
@@ -83,6 +81,7 @@
<module>spring-ai-spring-boot-starters/spring-ai-starter-vertex-ai-palm2</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-watsonx-ai</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-zhipuai</module>
<module>spring-ai-spring-boot-starters/spring-ai-starter-typesense</module>
</modules>

<organization>
@@ -160,6 +159,7 @@
<azure-search.version>11.6.1</azure-search.version>
<weaviate-client.version>4.5.1</weaviate-client.version>
<qdrant.version>1.7.1</qdrant.version>
<typesense.version>0.5.0</typesense.version>

<!-- documentation dependencies -->
<io.spring.maven.antora-version>0.0.4</io.spring.maven.antora-version>
13 changes: 13 additions & 0 deletions spring-ai-bom/pom.xml
@@ -308,6 +308,12 @@
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-typesense-store</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-zhipuai</artifactId>
@@ -379,12 +385,19 @@
<artifactId>spring-ai-mongodb-atlas-store-spring-boot-starter</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-typesense-store-spring-boot-starter</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-spring-boot-testcontainers</artifactId>
@@ -0,0 +1,59 @@
package org.springframework.ai.vectorstore.filter.converter;

import org.springframework.ai.vectorstore.filter.Filter;

/**
* Converts {@link Filter.Expression} into Typesense metadata filter expression format.
* (https://typesense.org/docs/0.24.0/api/search.html#filter-parameters)
*
* @author Pablo Sanchidrian
*/
public class TypesenseFilterExpressionConverter extends AbstractFilterExpressionConverter {

    @Override
    protected void doExpression(Filter.Expression exp, StringBuilder context) {
        this.convertOperand(exp.left(), context);
        context.append(getOperationSymbol(exp));
        this.convertOperand(exp.right(), context);
    }

    private String getOperationSymbol(Filter.Expression exp) {
        switch (exp.type()) {
            case AND:
                return " && ";
            case OR:
                return " || ";
            case EQ:
                return " "; // in typesense "EQ" operator looks like -> country:USA
            case NE:
                return " != ";
            case LT:
                return " < ";
            case LTE:
                return " <= ";
            case GT:
                return " > ";
            case GTE:
                return " >= ";
            case IN:
                return " "; // in typesense "IN" operator looks like -> country: [USA, UK]
            case NIN:
                return " != "; // in typesense "NIN" operator looks like -> country:
                               // !=[USA, UK]
            default:
                throw new RuntimeException("Not supported expression type:" + exp.type());
        }
    }

    @Override
    protected void doGroup(Filter.Group group, StringBuilder context) {
        this.convertOperand(new Filter.Expression(Filter.ExpressionType.AND, group.content(), group.content()),
                context); // trick
    }

    @Override
    protected void doKey(Filter.Key key, StringBuilder context) {
        context.append("metadata." + key.key() + ":");
    }

}
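
To make the operator mapping above easier to check, here is a minimal standalone sketch of the string shape the converter appears to produce for simple comparisons. It does not use the Spring AI classes: `TypesenseFilterSketch`, `Op`, `render`, `renderValue`, and `and` are hypothetical names introduced only for illustration, and the value rendering (plain `toString`, lists joined as `[a, b]`) is a simplifying assumption, since the real formatting lives in `AbstractFilterExpressionConverter`, which is not part of this diff.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TypesenseFilterSketch {

    // Hypothetical stand-in for the comparison members of Filter.ExpressionType.
    public enum Op { EQ, NE, LT, LTE, GT, GTE, IN, NIN }

    // Mirrors the symbols returned by getOperationSymbol in the diff above.
    public static String symbol(Op op) {
        switch (op) {
            case EQ:
            case IN:
                return " ";
            case NE:
            case NIN:
                return " != ";
            case LT:
                return " < ";
            case LTE:
                return " <= ";
            case GT:
                return " > ";
            case GTE:
                return " >= ";
            default:
                throw new IllegalArgumentException("Not supported: " + op);
        }
    }

    // Assumption: values are rendered with toString; lists become "[a, b]".
    public static String renderValue(Object value) {
        if (value instanceof List<?>) {
            return ((List<?>) value).stream()
                .map(v -> String.valueOf(v))
                .collect(Collectors.joining(", ", "[", "]"));
        }
        return String.valueOf(value);
    }

    // Key prefix and trailing colon follow doKey: "metadata." + key + ":".
    public static String render(String key, Op op, Object value) {
        return "metadata." + key + ":" + symbol(op) + renderValue(value);
    }

    // AND joins two rendered operands with " && ", as in getOperationSymbol.
    public static String and(String left, String right) {
        return left + " && " + right;
    }

    public static void main(String[] args) {
        String filter = and(
            render("country", Op.IN, List.of("UK", "NL")),
            render("year", Op.GTE, 2020));
        System.out.println(filter);
    }
}
```

Under these assumptions, each key is emitted with a `metadata.` prefix and a trailing colon, and the operator symbol is placed between the key and the value, so `EQ` and `IN` contribute only a space.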
@@ -0,0 +1,242 @@
= Typesense

This section walks you through setting up `TypesenseVectorStore` to store document embeddings and perform similarity searches.

link:https://typesense.org[Typesense] is an open source, typo-tolerant search engine optimized for instant (sub-50 ms) searches, while providing an intuitive developer experience.

== Prerequisites

1. A Typesense instance
- link:https://typesense.org/docs/guide/install-typesense.html[Typesense Cloud] (recommended)
- link:https://hub.docker.com/r/typesense/typesense/[Docker] image _typesense/typesense:latest_

2. An `EmbeddingClient` instance to compute the document embeddings.
- If required, an API key for the xref:api/embeddings.adoc#available-implementations[EmbeddingClient] to generate the embeddings stored by the `TypesenseVectorStore`.

== Auto-configuration

Spring AI provides Spring Boot auto-configuration for the Typesense Vector Store.
To enable it, add the following dependency to your project's Maven `pom.xml` file:

[source, xml]
----
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-typesense-store-spring-boot-starter</artifactId>
</dependency>
----

or to your Gradle `build.gradle` build file.

[source,groovy]
----
dependencies {
    implementation 'org.springframework.ai:spring-ai-typesense-store-spring-boot-starter'
}
----

TIP: Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build file.

TIP: Refer to the xref:getting-started.adoc#repositories[Repositories] section to add Milestone and/or Snapshot Repositories to your build file.

Additionally, you will need a configured `EmbeddingClient` bean. Refer to the xref:api/embeddings.adoc#available-implementations[EmbeddingClient] section for more information.

Here is an example of the needed bean:

[source,java]
----
@Bean
public EmbeddingClient embeddingClient() {
// Can be any other EmbeddingClient implementation.
return new OpenAiEmbeddingClient(new OpenAiApi(System.getenv("SPRING_AI_OPENAI_API_KEY")));
}
----

To connect to Typesense you need to provide access details for your instance.
A simple configuration can be provided via Spring Boot's _application.yml_:

[source,yaml]
----
spring:
  ai:
    vectorstore:
      typesense:
        collectionName: "vector_store"
        embeddingDimension: 1536
        client:
          protocol: http
          host: localhost
          port: 8108
          apiKey: xyz
----

Please have a look at the list of xref:#_configuration_properties[configuration parameters] for the vector store to learn about the default values and configuration options.

Now you can auto-wire the Typesense Vector Store in your application and use it:

[source,java]
----
@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents to Typesense
vectorStore.add(documents);

// Retrieve documents similar to a query
List<Document> results = vectorStore.similaritySearch(SearchRequest.query("Spring").withTopK(5));
----

=== Configuration properties

You can use the following properties in your Spring Boot configuration to customize the Typesense vector store.

|===
|Property| Description | Default value

|`spring.ai.vectorstore.typesense.client.protocol`| HTTP Protocol | `http`
|`spring.ai.vectorstore.typesense.client.host`| Hostname | `localhost`
|`spring.ai.vectorstore.typesense.client.port`| Port | `8108`
|`spring.ai.vectorstore.typesense.client.apiKey`| ApiKey | `xyz`
|`spring.ai.vectorstore.typesense.collectionName`| Collection Name | `vector_store`
|`spring.ai.vectorstore.typesense.embeddingDimension`| Embedding Dimension | `1536`

|===

== Metadata filtering

You can leverage the generic, portable link:https://docs.spring.io/spring-ai/reference/api/vectordbs.html#_metadata_filters[metadata filters] with `TypesenseVectorStore` as well.

For example, you can use either the text expression language:

[source,java]
----
vectorStore.similaritySearch(
SearchRequest
.query("The World")
.withTopK(TOP_K)
.withSimilarityThreshold(SIMILARITY_THRESHOLD)
.withFilterExpression("country in ['UK', 'NL'] && year >= 2020"));
----

or programmatically using the expression DSL:

[source,java]
----
FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(
SearchRequest
.query("The World")
.withTopK(TOP_K)
.withSimilarityThreshold(SIMILARITY_THRESHOLD)
.withFilterExpression(b.and(
b.in("country", "UK", "NL"),
b.gte("year", 2020)).build()));
----

The portable filter expressions get automatically converted into link:https://typesense.org/docs/0.24.0/api/search.html#filter-parameters[Typesense Search Filters].
For example, the following portable filter expression:

[source,sql]
----
country in ['UK', 'NL'] && year >= 2020
----

is converted into Typesense filter:

[source]
----
country: ['UK', 'NL'] && year: >=2020
----

== Manual configuration

If you prefer not to use the auto-configuration, you can manually configure the Typesense Vector Store.
Add the Typesense Vector Store dependency:

[source,xml]
----
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-typesense-store</artifactId>
</dependency>
----

TIP: Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build file.

Then, create a `TypesenseVectorStore` bean in your Spring configuration:

[source,java]
----
@Bean
public VectorStore vectorStore(Client client, EmbeddingClient embeddingClient) {

    TypesenseVectorStoreConfig config = TypesenseVectorStoreConfig.builder()
        .withCollectionName("test_vector_store")
        .withEmbeddingDimension(embeddingClient.dimensions())
        .build();

    return new TypesenseVectorStore(client, embeddingClient, config);
}

@Bean
public Client typesenseClient() {
    List<Node> nodes = new ArrayList<>();
    // Replace the host and port with the access details of your Typesense instance.
    nodes.add(new Node("http", "localhost", "8108"));

    Configuration configuration = new Configuration(nodes, Duration.ofSeconds(5), "xyz");
    return new Client(configuration);
}
----

[NOTE]
====
Creating the `TypesenseVectorStore` as a Bean is more convenient and is the preferred approach.
If you decide to create it manually instead, you must call `TypesenseVectorStore#afterPropertiesSet()` after setting the properties and before using the client.
====


Then in your main code, create some documents:

[source,java]
----
List<Document> documents = List.of(
new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("country", "UK", "year", 2020)),
new Document("The World is Big and Salvation Lurks Around the Corner", Map.of()),
new Document("You walk forward facing the past and you turn back toward the future.", Map.of("country", "NL", "year", 2023)));
----

Now add the documents to your vector store:


[source,java]
----
vectorStore.add(documents);
----

And finally, retrieve documents similar to a query:

[source,java]
----
List<Document> results = vectorStore.similaritySearch(
SearchRequest
.query("Spring")
.withTopK(5));
----

If all goes well, you should retrieve the document containing the text "Spring AI rocks!!".

[NOTE]
====
If you are not retrieving the documents in the expected order or the search results are not as expected, check the embedding model you are using.

Embedding models can have a significant impact on the search results (e.g. if your data is in Spanish, make sure to use a Spanish or multilingual embedding model).
====

9 changes: 9 additions & 0 deletions spring-ai-spring-boot-autoconfigure/pom.xml
@@ -267,6 +267,15 @@
<optional>true</optional>
</dependency>


<!-- Typesense vector store -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-typesense-store</artifactId>
<version>${project.parent.version}</version>
<optional>true</optional>
</dependency>

<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-minimax</artifactId>