Integration with embedding library #71

@donhardman

Proposal:

We should parse the parameters needed to connect to the new library for automatic embeddings.

Currently, I suggest starting with the following specification:

  • fields = "field1, field2": the fields to use as the source text for generating embeddings
  • model_name = "openai/..." for remote models, or model_name = "sentence-transformers/..." for local models
  • api_key = "...": for cases where we use a remote API for embeddings (it should be possible to change this value later)
  • use_gpu = 1|0: for local models, whether to use the GPU if one is available; the default is always CPU
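
To make the specification above concrete, here is a minimal Python sketch of how these options could be parsed and validated. It is not the actual implementation: the names `EmbeddingOptions` and `parse_embedding_options` are hypothetical, and two rules are assumptions that go beyond the proposal, namely that the "openai/" prefix marks a remote model and that a remote model requires an api_key.

```python
from dataclasses import dataclass
from typing import Optional

# Prefixes come from the proposal; mapping them to remote/local is an assumption.
REMOTE_PREFIXES = ("openai/",)
LOCAL_PREFIXES = ("sentence-transformers/",)


@dataclass
class EmbeddingOptions:
    """Hypothetical container for the parsed embedding parameters."""
    fields: list[str]
    model_name: str
    api_key: Optional[str] = None
    use_gpu: bool = False  # CPU is the default, as proposed


def parse_embedding_options(raw: dict[str, str]) -> EmbeddingOptions:
    """Validate raw key/value options and return a typed structure."""
    # fields = "field1, field2" -> ["field1", "field2"]
    fields = [f.strip() for f in raw.get("fields", "").split(",") if f.strip()]
    if not fields:
        raise ValueError("fields must name at least one source field")

    model_name = raw.get("model_name", "").strip()
    is_remote = model_name.startswith(REMOTE_PREFIXES)
    if not is_remote and not model_name.startswith(LOCAL_PREFIXES):
        raise ValueError(f"unsupported model_name: {model_name!r}")

    # Assumption: a remote model cannot be used without an API key.
    api_key = raw.get("api_key") or None
    if is_remote and api_key is None:
        raise ValueError("api_key is required for remote models")

    use_gpu_raw = str(raw.get("use_gpu", "0")).strip()
    if use_gpu_raw not in ("0", "1"):
        raise ValueError("use_gpu must be 0 or 1")

    return EmbeddingOptions(fields, model_name, api_key, use_gpu_raw == "1")
```

Keeping validation in one place like this would let the same checks run both when a table is created and when api_key is altered later.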

We should also consider adding a new config option:

  • embedding_cache_path = ...: path to the cache directory where we will store everything when a local model is used

The table definition may look like this:
CREATE TABLE test (
    title TEXT,
    image_vector FLOAT_VECTOR 
    KNN_TYPE='hnsw' 
    KNN_DIMS='4' 
    HNSW_SIMILARITY='l2'
    MODEL_NAME = "..."
);

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

Details
  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
  • OpenAPI YAML updated and issue created to rebuild clients
