Pre-load ONNX embedding model inside the JAR #29
Open
Labels: 0.2.x (Issues for the 0.2 release), enhancement (New feature or request)
Description
We are using Spring AI's TransformersEmbeddingModel (or an equivalent local embedding generator) with a deep learning model such as all-MiniLM-L6-v2 to turn text into dense vectors (embeddings).
Currently, by default, Spring AI's ONNX integration downloads the Hugging Face model (model.onnx) and its tokenizer (tokenizer.json) at runtime, on first application startup or on the first embedding request. This causes several issues:
- The first request takes excessively long while the ~90 MB+ model is downloaded.
- The application cannot be deployed in offline or isolated environments without internet access to Hugging Face.
- Startup and execution become dependent on an external service.
Proposal
Pre-package the required ONNX model (all-MiniLM-L6-v2) and its configuration files directly inside the application's src/main/resources so that they are bundled in the compiled JAR. Then, configure the TransformersEmbeddingModel to load the resources from the classpath instead of downloading them.
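A minimal sketch of the proposed configuration, assuming the model files are copied under src/main/resources/onnx/all-MiniLM-L6-v2/ (the directory layout and exact Spring AI property/setter names below should be verified against the Spring AI version in use):

```java
import org.springframework.ai.transformers.TransformersEmbeddingModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EmbeddingConfig {

    @Bean
    public TransformersEmbeddingModel embeddingModel() throws Exception {
        TransformersEmbeddingModel model = new TransformersEmbeddingModel();
        // Load model and tokenizer from the classpath (bundled in the JAR)
        // instead of downloading them from Hugging Face at runtime.
        // Paths are illustrative; adjust to the actual resource layout.
        model.setModelResource("classpath:/onnx/all-MiniLM-L6-v2/model.onnx");
        model.setTokenizerResource("classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json");
        model.afterPropertiesSet();
        return model;
    }
}
```

Alternatively, if auto-configuration is used, the same effect should be achievable declaratively via application properties (again, property names to be confirmed for the Spring AI release in question):

```
spring.ai.embedding.transformer.onnx.model-uri=classpath:/onnx/all-MiniLM-L6-v2/model.onnx
spring.ai.embedding.transformer.tokenizer.uri=classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json
```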