Pre-load ONNX embedding model inside the JAR #29
Open
Labels: 0.2.x (Issues for the 0.2 release), enhancement (New feature or request)
Description
We are using Spring AI's TransformersEmbeddingModel (or an equivalent local embedding generator) with a deep learning model such as all-MiniLM-L6-v2 to turn text into dense vectors (embeddings).
Currently, by default, Spring AI's ONNX integration downloads the Hugging Face model (model.onnx) and its tokenizer (tokenizer.json) at runtime, on first application startup or on the first embedding request. This causes several issues:
- The first request takes excessively long while the ~90 MB+ model is downloaded.
- The application cannot be deployed in offline or isolated environments without internet access to Hugging Face.
- Startup and execution become dependent on an external service.
Proposal
Pre-package the required ONNX model (all-MiniLM-L6-v2) and its configuration files directly inside the application's src/main/resources so that they are bundled in the compiled JAR. Then, configure the TransformersEmbeddingModel to load the resources from the classpath instead of downloading them.
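A minimal sketch of the proposed configuration, assuming the model files are copied under src/main/resources/onnx/all-MiniLM-L6-v2/ (the directory layout and exact Spring AI property/setter names below should be verified against the Spring AI version in use):

```java
import org.springframework.ai.transformers.TransformersEmbeddingModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EmbeddingConfig {

    @Bean
    public TransformersEmbeddingModel embeddingModel() throws Exception {
        TransformersEmbeddingModel model = new TransformersEmbeddingModel();
        // Load model and tokenizer from the classpath (bundled in the JAR)
        // instead of downloading them from Hugging Face at runtime.
        // Paths are illustrative; adjust to the actual resource layout.
        model.setModelResource("classpath:/onnx/all-MiniLM-L6-v2/model.onnx");
        model.setTokenizerResource("classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json");
        model.afterPropertiesSet();
        return model;
    }
}
```

Alternatively, if auto-configuration is used, the same effect should be achievable declaratively via application properties (again, property names to be confirmed for the Spring AI release in question):

```
spring.ai.embedding.transformer.onnx.model-uri=classpath:/onnx/all-MiniLM-L6-v2/model.onnx
spring.ai.embedding.transformer.tokenizer.uri=classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json
```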