SoraDB is a custom-built vector storage engine for managing and querying high-dimensional vector data. It provides core functionality for storing vectors, computing cosine similarity, and searching by similarity. It is suited to use cases such as recommendation systems, search engines, and AI-powered applications that require vector-based data retrieval.
- Uses `std::unordered_map` to store vectors by their unique string IDs.
- Computes similarity between vectors using cosine similarity for efficient nearest neighbor search.
- Will implement `findTopK` to return the top K most similar vectors based on a given query.
- Store vectors using `unordered_map` (id → vector)
- Implement `cosineSimilarity` function for comparing vectors
- Set up basic `VSE` (Vector Storage Engine) class structure
- Begin implementing `findTopK` for searching based on cosine similarity
- Load embeddings file into vector
- CLI
- Cache embeddings on load
- Figure out HNSW
- Implement `findTopK` to return the top K most similar vectors for a query
- Implement `insert` function for adding vectors with unique IDs
- Add batch insert and search functionality for efficiency
- Implement multi-threaded search for faster query results
- Add support for metadata (e.g., tags, timestamps) alongside vectors
- Persist vector storage to disk (JSON or binary file format)
- Create a basic REST API for interfacing with the database (using a C++ framework)
- Build an example project using SoraDB (e.g., AI-powered FAQ search or image similarity search)
- Dockerize the vector database service for easy deployment
- Optimize query performance using Approximate Nearest Neighbor (ANN) techniques
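
To make the pieces named above concrete, here is a minimal C++17 sketch of how a `VSE` class could combine an `std::unordered_map` store with `cosineSimilarity`, `insert`, and a brute-force `findTopK`. The member names, signatures, and return types are assumptions made for illustration and may not match SoraDB's actual code; it also treats the similarity of a zero-magnitude vector as 0, which is a design choice rather than something this README specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch of the VSE class; names and signatures are
// assumptions for illustration, not SoraDB's actual API.
class VSE {
public:
    using Vector = std::vector<float>;
    using Result = std::pair<std::string, float>;  // (id, similarity)

    // Stores a vector under a unique ID. Returns false if the ID is taken.
    bool insert(const std::string& id, Vector vec) {
        return store_.emplace(id, std::move(vec)).second;
    }

    // cos(a, b) = (a . b) / (|a| * |b|).
    // Assumption: returns 0 when either vector has zero magnitude.
    static float cosineSimilarity(const Vector& a, const Vector& b) {
        assert(a.size() == b.size() && "vectors must have the same dimension");
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            dot   += static_cast<double>(a[i]) * b[i];
            normA += static_cast<double>(a[i]) * a[i];
            normB += static_cast<double>(b[i]) * b[i];
        }
        if (normA == 0.0 || normB == 0.0) return 0.0f;
        return static_cast<float>(dot / (std::sqrt(normA) * std::sqrt(normB)));
    }

    // Brute-force search: scores every stored vector against the query,
    // then keeps the k highest-scoring entries in descending order.
    std::vector<Result> findTopK(const Vector& query, std::size_t k) const {
        std::vector<Result> scored;
        scored.reserve(store_.size());
        for (const auto& [id, vec] : store_) {
            scored.emplace_back(id, cosineSimilarity(query, vec));
        }
        k = std::min(k, scored.size());  // never ask for more than we have
        std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                          [](const Result& x, const Result& y) {
                              return x.second > y.second;
                          });
        scored.resize(k);
        return scored;
    }

    std::size_t size() const { return store_.size(); }

private:
    std::unordered_map<std::string, Vector> store_;  // id -> vector
};
```

With vectors `a = {1, 0}`, `b = {0, 1}`, and `c = {1, 1}` inserted, calling `findTopK({1.0f, 0.1f}, 2)` on this sketch returns `a` first and `c` second. The scan is linear in the number of stored vectors, which is the cost that the ANN and HNSW items above aim to reduce.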
