MiniRankBrain is a semantic search engine designed to find files or directories that closely resemble a user's query. It leverages machine learning models to convert the query into an embedding vector, which is then compared against stored vectors representing known files or directories. The search process involves retrieving and re-ranking results to present the most relevant matches to the user. MiniRankBrain exposes two main routes for search:
- `/search/files`: This route allows users to search for files that best match the provided query.
- `/search/directory`: This route enables users to search for directories that closely align with the search query. It calculates the query's embedding vector and fetches directories with comparable embeddings to deliver the top results.
These search functionalities are supported by embedding techniques and efficient indexing, ensuring accurate and fast retrieval of relevant items.
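To make the comparison step concrete, here is a minimal sketch of an exact inner-product search over stored vectors, using only the Python standard library. The file paths, the 3-dimensional vectors, and the helper names (`normalize`, `inner_product`, `top_k`) are hypothetical and chosen for illustration; they are not taken from MiniRankBrain's code, and a real embedding model would produce vectors with hundreds of dimensions.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, stored, k=3):
    """Return the k (path, score) pairs with the highest inner product."""
    q = normalize(query_vec)
    scored = ((path, inner_product(q, normalize(vec))) for path, vec in stored.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical embeddings standing in for vectors produced by a real model.
stored_vectors = {
    "docs/report.txt": [0.9, 0.1, 0.0],
    "src/main.py": [0.0, 1.0, 0.2],
    "notes/todo.md": [0.7, 0.3, 0.1],
}
query_vector = [1.0, 0.0, 0.0]  # stands in for the embedded user query

results = top_k(query_vector, stored_vectors, k=2)
for path, score in results:
    print(f"{path}: {score:.3f}")
```

Because every vector is normalized first, ranking by inner product is the same as ranking by cosine similarity. This brute-force scan compares the query against every stored vector, which is what an exact index does conceptually.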
The reasoning behind this design, and how it works, is explained in this article: https://medium.com/@bitr13x
- User Query
- → Embedding Vector
- → Compare with known query-doc pairs (semantic match)
- → Retrieve & Re-rank top results
- → Final result list for user
This is a semantic query expansion, matching, and ranking system. It is similar to dense retrieval (like DPR, ColBERT, or BGE + reranker setups), but trained end-to-end on query and click logs rather than Q&A data.
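The retrieve-then-re-rank step can be sketched as a second scoring pass over the first-stage candidates. The exact re-ranking method is not described here, so the linear blend below is an assumption made for illustration only: it combines the retrieval similarity with a hypothetical click-count prior. The function name `rerank`, the `alpha` weight, and the sample data are all invented for this sketch.

```python
def rerank(candidates, click_counts, alpha=0.8):
    """Blend first-stage similarity with a normalized click prior.

    candidates:   list of (path, similarity) pairs from the retrieval stage
    click_counts: dict mapping path -> past clicks (hypothetical signal)
    alpha:        weight on similarity; (1 - alpha) goes to the click prior
    """
    max_clicks = max((click_counts.get(path, 0) for path, _ in candidates), default=0)
    rescored = []
    for path, similarity in candidates:
        # Scale clicks to [0, 1]; items with no recorded clicks get a prior of 0.
        prior = click_counts.get(path, 0) / max_clicks if max_clicks else 0.0
        rescored.append((path, alpha * similarity + (1 - alpha) * prior))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical first-stage results and click statistics.
candidates = [("a.txt", 0.90), ("b.txt", 0.88), ("c.txt", 0.60)]
clicks = {"b.txt": 40, "c.txt": 10}

final = rerank(candidates, clicks)
for path, score in final:
    print(f"{path}: {score:.3f}")
```

In this toy data, `b.txt` overtakes `a.txt` because its click history outweighs a small gap in similarity. A learned re-ranker would replace the fixed `alpha` blend with a model score.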
| Feature | IndexFlatIP | IndexHNSWFlat |
|---|---|---|
| Search Type | Exact | Approximate (graph-based) |
| Similarity | Inner product | Inner product (configurable) |
| Memory | Low (O(N×d)) | Higher (O(N×d + N×M)) |
| Speed (large N) | Slow | Very fast |
| Accuracy | 100% | Tunable, typically 90–99% |
| Best Use Case | Small/medium datasets | Large-scale real-time search |
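The memory row can be turned into a rough back-of-the-envelope estimate. The sketch below assumes float32 vectors (4 bytes per component) and, for the HNSW graph, about 2×M 4-byte neighbor links per vector, a commonly cited rule of thumb for FAISS. It ignores the upper graph layers and other overhead, so treat the numbers as approximations. The values of `n`, `d`, and `m` are hypothetical.

```python
def flat_bytes(n, d):
    """Approximate IndexFlatIP storage: n float32 vectors of dimension d."""
    return n * d * 4


def hnsw_flat_bytes(n, d, m):
    """Approximate IndexHNSWFlat storage: vectors plus about 2*m int32 links each."""
    return n * (d * 4 + m * 2 * 4)


# Hypothetical workload: one million 384-dimensional vectors, M = 32.
n, d, m = 1_000_000, 384, 32
flat = flat_bytes(n, d)
hnsw = hnsw_flat_bytes(n, d, m)
print(f"flat: {flat / 1e9:.2f} GB, hnsw: {hnsw / 1e9:.2f} GB")
```

Under these assumptions the graph adds a cost per vector that depends on M but not on d, so the relative overhead shrinks as the embedding dimension grows.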