🚀 Semantic Search Engine with Vectorized Databases

📚 Introduction

This project designs and implements an indexing system for a semantic search database that retrieves rows by the similarity of their vector embeddings. The index is built over a single vector column and targets high recall and fast queries on large datasets (up to 20 million entries).

🌟 What is Semantic Search?

Semantic search is a technology that enables search engines to understand the meaning behind search queries and provide relevant results based on the context and intent of the user.

Unlike traditional keyword-based search methods, semantic search uses natural language processing (NLP) and machine learning to analyze relationships between words, phrases, and concepts.

For example:

Query: "What are the best ways to study effectively?"
Result: Returns tips on studying, time management strategies, and productivity techniques, even if the exact query words are not in the database.

📐 Project Scope

The project implements an indexing system that meets the following requirements:

Data Structure:
- The database contains only two columns:
  - ID: Unique identifier for each row.
  - Embedding: A 70-dimensional vector representing the data.
Indexing:
- Retrieves the top k rows most similar to the input query vector, ranked by cosine similarity.
Scalability:
- Handles datasets with up to 20 million vectors.
Performance:
- Responds within the time limits listed under Constraints for k up to 10.
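
To make the retrieval target concrete, here is a minimal brute-force sketch of top-k search by cosine similarity using NumPy. It is not the project's index: it scans every row, so it only illustrates the exact result an index is meant to approximate. The function name `top_k_cosine` and its parameters are hypothetical.

```python
import numpy as np


def top_k_cosine(db: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
    """Return the row indices of the k vectors in `db` most similar to `query`.

    Illustrative exhaustive scan (an assumption, not the repository's index).
    `db` has shape (n, d) and `query` has shape (d,).
    """
    # Cosine similarity = dot product divided by the product of the norms.
    db_norms = np.linalg.norm(db, axis=1)
    query_norm = np.linalg.norm(query)
    sims = (db @ query) / (db_norms * query_norm + 1e-12)  # epsilon avoids 0/0

    k = min(k, len(sims))
    # argpartition selects the k largest similarities without a full sort,
    # then only those k candidates are sorted in descending order.
    candidates = np.argpartition(-sims, k - 1)[:k]
    return candidates[np.argsort(-sims[candidates])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.random((1000, 70), dtype=np.float32)  # 70 dimensions, as in the scope
    print(top_k_cosine(db, db[42], k=5))
```

An exhaustive scan like this reads every vector for each query, which is why a dedicated index is needed at 20 million rows.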

⚡ Evaluation Criteria

Accuracy (Recall):
- The system must accurately retrieve the top k most similar vectors for a query.
Efficiency:
- Retrieval must stay within the memory and time limits listed under Constraints.
Scalability:
- Handles datasets of up to 20 million entries while staying within those limits.
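
As a rough illustration of the accuracy criterion, recall at k can be computed by comparing the IDs an index returns against the IDs from an exact search. This is a sketch under assumptions: the helper name `recall_at_k` is hypothetical, and the score used in the Constraints table may be computed differently.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k IDs that appear among the retrieved top-k IDs.

    Hypothetical helper for illustration only.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    true_ids = set(ground_truth[:k])
    if not true_ids:
        raise ValueError("ground_truth must not be empty")
    # Sets make the count insensitive to ordering and duplicate IDs.
    hits = len(set(retrieved[:k]) & true_ids)
    return hits / len(true_ids)


if __name__ == "__main__":
    # Three of the four true neighbours were found.
    print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))
```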

📈 Performance Highlights

Benchmarks

| Dataset Size | Time (s) | Peak RAM Usage (MB) |
|--------------|----------|---------------------|
| 1M           | 1.49     | 8.50                |
| 10M          | 4.20     | 22.25               |
| 15M          | 5.59     | 11.32               |
| 20M          | 6.65     | 3.04                |

Constraints

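| DB Size | Peak RAM Usage (MB) | Time Limit (s) | Min Accepted Score | Max Index Size (MB) |
|---------|---------------------|----------------|--------------------|---------------------|
| 1M      | 20                  | 3              | -5000              | 50                  |
| 10M     | 50                  | 6              | -5000              | 100                 |
| 15M     | 50                  | 8              | -5000              | 150                 |
| 20M     | 50                  | 10             | -5000              | 200                 |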
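
To put the index size limits in perspective, the short calculation below estimates the raw size of the embeddings alone. It assumes each of the 70 values is stored as a 4-byte float32 and that 1 MB is 10^6 bytes. Neither assumption is stated in this README. Under those assumptions, 1M vectors take about 280 MB and 20M vectors about 5600 MB, far above the index limits, so the index cannot simply hold a copy of every vector.

```python
DIMENSIONS = 70       # embedding size from the Project Scope
BYTES_PER_VALUE = 4   # assumption: float32 storage

# Max index size (MB) per database size, copied from the Constraints table.
INDEX_LIMITS_MB = {
    1_000_000: 50,
    10_000_000: 100,
    15_000_000: 150,
    20_000_000: 200,
}


def raw_size_mb(num_vectors: int) -> float:
    """Raw embedding storage in MB (10^6 bytes), excluding IDs and overhead."""
    return num_vectors * DIMENSIONS * BYTES_PER_VALUE / 1e6


if __name__ == "__main__":
    for num_vectors, limit_mb in INDEX_LIMITS_MB.items():
        raw = raw_size_mb(num_vectors)
        print(f"{num_vectors:>10,} vectors: raw {raw:,.0f} MB, "
              f"index limit {limit_mb} MB, ratio {raw / limit_mb:.1f}x")
```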

Contributors

- Sara Bisheer
- Rawan Mostafa
- Menna Mohammed
- Fatma Ebrahim
