Skip to content

RKirlew/SoraDB-A-Lightweight-Vector-Database

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

22 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

SoraDB - A Lightweight Vector Database

SoraDB Image

SoraDB is a custom-built vector storage engine designed to manage and query high-dimensional vector data. It provides core functionality for storing vectors, computing cosine similarity, and performing efficient searches based on similarity. SoraDB is ideal for use cases like recommendation systems, search engines, and AI-powered applications that require vector-based data retrieval.

Project Structure

πŸ—‚οΈ Vector Storage

  • Uses std::unordered_map to store vectors by their unique string IDs.

πŸ” Cosine Similarity

  • Computes similarity between vectors using cosine similarity for efficient nearest neighbor search.

πŸ† Top K Search

  • Will implement findTopK to return the top K most similar vectors based on a given query.

To-Do List

βœ… Completed:

  • Store vectors using unordered_map (id β†’ vector)
  • Implement cosineSimilarity function for comparing vectors
  • Set up basic VSE (Vector Storage Engine) class structure
  • Begin implementing findTopK for searching based on cosine similarity

⏳ In Progress / To Come:

  • Load embeddings file into vector
  • CLI
  • Cache embeddings onload
  • Figure out HNSW
  • Implement findTopK to return the top K most similar vectors for a query
  • Implement insert function for adding vectors with unique IDs
  • Add batch insert and search functionality for efficiency
  • Implement multi-threaded search for faster query results
  • Add support for metadata (e.g., tags, timestamps) alongside vectors
  • Persist vector storage to disk (JSON or binary file format)
  • Create a basic REST API for interfacing with the database (using a C++ framework)
  • Build an example project using SoraDB (e.g., AI-powered FAQ search or image similarity search)
  • Dockerize the vector database service for easy deployment
  • Optimize query performance using Approximate Nearest Neighbor (ANN) techniques

About

SoraDB is a custom-built vector storage engine designed to manage and query high-dimensional vector data.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages