Pre-load ONNX embedding model inside the JAR #29

@kornelrabczak

Description

We are using Spring AI's TransformersEmbeddingModel (or an equivalent local embedding generator) with a transformer model such as all-MiniLM-L6-v2 to turn text into dense vectors (embeddings).

Currently, by default, the Spring AI ONNX integration downloads the Hugging Face model (model.onnx) and its tokenizer (tokenizer.json) at runtime, during the first application startup or the first embedding request. This causes several issues:

  • The initial request takes an excessively long time while it downloads the ~90MB+ model.
  • The application cannot be deployed in offline or isolated environments without internet access to Hugging Face.
  • Startup/execution becomes dependent on an external service.

Proposal

Pre-package the required ONNX model (all-MiniLM-L6-v2) and its configuration files directly inside the application's src/main/resources so that they are bundled in the compiled JAR. Then, configure the TransformersEmbeddingModel to load the resources from the classpath instead of downloading them.
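A minimal sketch of what this could look like, assuming the model files are copied into src/main/resources/onnx/all-MiniLM-L6-v2/ (a hypothetical layout) and assuming Spring AI's `spring.ai.embedding.transformer.*` configuration properties; the exact property names should be verified against the Spring AI version in use:

```properties
# Load the embedding model and tokenizer from the classpath instead of
# downloading them from Hugging Face at runtime.
# Assumes model.onnx and tokenizer.json were placed under
# src/main/resources/onnx/all-MiniLM-L6-v2/ (hypothetical layout).
spring.ai.embedding.transformer.onnx.model-uri=classpath:/onnx/all-MiniLM-L6-v2/model.onnx
spring.ai.embedding.transformer.tokenizer.uri=classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json
```

Equivalently, a TransformersEmbeddingModel bean could be configured programmatically by pointing its model and tokenizer resources at `classpath:` URIs before initialization; the relevant setter names should likewise be checked against the Spring AI API docs for the version in use.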

Metadata


Labels: 0.2.x (Issues for the 0.2 release), enhancement (New feature or request)
