Add Option to Database Initialization Script for Generating Embeddings on Data with No Predefined Vectors #28

@rifkybujana

Problem:

Our database initialization script assumes that input data comes with precomputed vectors for each item. However, some users may have data where vectors are not available, and we need to provide a solution to generate embeddings on the fly for such data.

Proposed Solution:

I propose adding a new command-line option to our database initialization script, allowing users to specify data items without predefined vectors. When this option is used, the script should generate embeddings for these items during initialization.

Detailed Requirements:

  • Option Name: --generate-embeddings or -ge
  • Usage Example:
$ ./initialize-database.py --generate-embeddings
  • Embedding Generation Method: The script should use a predefined embedding generation method (e.g., word embeddings, image feature extraction) for items without vectors.
  • Logging: The script should log the embedding generation process to provide transparency and debugging capabilities.
  • Performance Considerations: Ensure that the embedding generation process is efficient to avoid excessive computation time during initialization.
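
The requirements above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script's actual code: the item layout (a dict with "text" and "vector" keys), the function names, and the hash-based `placeholder_embed` stand-in are all hypothetical, and a real implementation would call whichever embedding model the team agrees on.

```python
import argparse
import hashlib
import logging

logger = logging.getLogger("initialize-database")


def build_parser():
    """Build the CLI parser with the proposed option."""
    parser = argparse.ArgumentParser(description="Initialize the database.")
    parser.add_argument(
        "-ge", "--generate-embeddings",
        action="store_true",
        help="Generate embeddings for items that have no predefined vector.",
    )
    return parser


def placeholder_embed(text, dim=8):
    """Hypothetical stand-in for a real embedding model.

    Maps text to `dim` floats in [0, 1] derived from a SHA-256 digest
    (so dim must be at most 32). It is deterministic but carries no
    semantic meaning; swap in the real model here.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def fill_missing_vectors(items, embed=placeholder_embed, batch_size=64):
    """Attach a vector to every item whose "vector" is missing or None.

    Items are processed in batches so progress can be logged and so a
    batch-capable model could be substituted later. Returns the number
    of items that received a generated vector.
    """
    missing = [item for item in items if item.get("vector") is None]
    logger.info("Generating embeddings for %d of %d items",
                len(missing), len(items))
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        for item in batch:
            item["vector"] = embed(item["text"])
        logger.info("Embedded %d/%d items",
                    start + len(batch), len(missing))
    return len(missing)


def main(argv=None):
    """Parse arguments and fill in vectors on a small sample data set."""
    logging.basicConfig(level=logging.INFO)
    args = build_parser().parse_args(argv)
    # Assumed sample data; the real script would load items from its input.
    items = [
        {"text": "first item", "vector": None},
        {"text": "second item", "vector": [0.1, 0.2]},
    ]
    if args.generate_embeddings:
        fill_missing_vectors(items)
    return items
```

Calling `main(["--generate-embeddings"])` fills in only the first sample item and leaves the precomputed vector untouched; without the flag, items lacking vectors are left as they are, which matches the current behavior described in the problem statement.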

Benefits:

  • Enhanced usability: Users can now initialize the database with data lacking precomputed vectors, expanding the data types our system can handle.
  • Improved flexibility: Our system becomes more versatile, accommodating users with data sources that don't provide vector representations.
  • Reduced manual work: Users won't need to manually precompute vectors for data items, simplifying the data integration process.

Related Issues:

  • None

Assignees:

  • Unassigned

Labels:

  • Enhancement

Milestone:

  • To be determined

Additional Information:

  • The database initialization script is located in the scripts directory.
  • It's important to document this new feature in the project's README or documentation to ensure users are aware of how to use it.
  • Consider discussing this feature with the team to gather input and reach a consensus on its implementation.
  • This enhancement will make our database initialization script more versatile and accommodating to a broader range of user data, aligning with our goal of providing a seamless experience for our users.
