Smart Search with GenAI is an AI-powered semantic search tool that delivers context-aware, summarized answers to natural language queries over structured data. It leverages Streamlit, FAISS, and Azure OpenAI for efficient vector search and LLM-based summarization.

ajeet214/smart_search

Smart Search with GenAI

An intelligent, LLM-powered semantic search tool for querying structured data using natural language.


🔍 Overview

Smart Search with GenAI is a semantic search application that combines FAISS vector similarity search with Azure OpenAI models to answer user queries over structured datasets. An embedding model retrieves the most relevant pre-indexed chunks, and a chat model summarizes them into a context-aware answer to the user's natural language question.

Built with Streamlit, the UI offers an interactive, user-friendly experience with prompt suggestions, onboarding guidance, and instant query answers.


✨ Features

  • 🔎 Natural language search over structured data (Excel)
  • 🧠 Azure OpenAI embedding + summarization (text-embedding-ada-002 + gpt-4o-mini)
  • ⚡ Fast retrieval using FAISS vector index
  • 🔁 Chunk reranking using cosine similarity for better accuracy
  • 💡 Suggested prompt buttons to guide user input
  • 👋 Onboarding walkthrough for new users
  • 📊 Built-in Streamlit interface (no frontend coding needed)
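
The retrieve-then-rerank idea from the feature list can be sketched in plain Python. This is only an illustration under stated assumptions: `retrieve_top_k` is a brute-force stand-in for a FAISS search (the README does not say which FAISS index type or distance metric the project uses; L2 distance is assumed here), the function names are hypothetical, and the two-dimensional vectors are toy data rather than real 1536-dimensional embeddings.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, chunk_vecs, k):
    """Brute-force stand-in for a vector index: ids of the k nearest chunks by L2."""
    ids = sorted(range(len(chunk_vecs)),
                 key=lambda i: l2_distance(query_vec, chunk_vecs[i]))
    return ids[:k]


def rerank(query_vec, chunk_vecs, candidate_ids, top_n):
    """Reorder the retrieved candidates by cosine similarity, best first."""
    scored = [(cosine_similarity(query_vec, chunk_vecs[i]), i) for i in candidate_ids]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_n]]


# Toy data (hypothetical): four "chunk embeddings" and one "query embedding".
chunk_vecs = [
    [1.0, 0.0],    # id 0: same direction as the query, but shorter
    [2.0, 0.5],    # id 1: closest in L2 terms, slightly off in direction
    [0.0, 2.0],    # id 2: orthogonal to the query
    [-2.0, 0.0],   # id 3: opposite direction
]
query = [2.0, 0.0]

candidates = retrieve_top_k(query, chunk_vecs, k=3)   # coarse pass
best = rerank(query, chunk_vecs, candidates, top_n=2)  # fine pass
print(candidates, best)
```

In this toy case the coarse pass ranks chunk 1 first because it is nearest in L2 distance, while the cosine rerank promotes chunk 0 because it points in exactly the same direction as the query.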

Solution Architecture:

[Image: Solution Architecture diagram]


🧱 Tech Stack

| Category | Tools Used |
|---|---|
| UI | Streamlit + Custom CSS |
| Embedding Model | Azure OpenAI text-embedding-ada-002 |
| Completion Model | Azure OpenAI gpt-4o-mini (configurable) |
| Vector DB | FAISS |
| Reranking | Cosine Similarity (Scikit-learn) |
| Data Format | Excel (.xlsx) + Pandas |
| Configuration | Python dotenv |

📦 Project Structure

smart_search/
├── README.md
├── app.py                         # Streamlit UI
├── pyproject.toml                 # Project metadata & dependencies
├── .python-version                # Python version spec
├── uv.lock                        # Dependency lock file (if committed)
├── .gitignore
│
├── .env                          # [Not committed] Environment variables
│
├── data/                         # Input data
│   ├── raw/
│   │   ├── people_data_100.xlsx
│   │   ├── people_data_500.xlsx
│   │   └── people_data_1000.xlsx
│   ├── logo/
│   │   ├── smart_search_logo.png
│   │   └── favicon.ico
│   └── processed/
│       └── people_data_1000_with_textchunk.xlsx
│
├── embeddings/                   # Vector storage & metadata
│   ├── chunk_embeddings.npy
│   ├── faiss_index_people_data.index
│   └── metadata.csv
│
├── prompts/
│   └── prompt_v1.txt             # LLM prompt template
│
├── scripts/                      # One-time/utility scripts
│   ├── __init__.py
│   ├── embed_and_index.py        # Embedding + indexing pipeline
│   ├── preprocess.py             # Excel data transformation
│   ├── search.py                 # Basic FAISS query CLI
│   └── search_with_llm.py        # RAG CLI interface
│
├── styles/
│   ├── base.css              # Global variables, resets
│   ├── theme.css             # Light/Dark theme styling
│   ├── buttons.css           # Button design + animation
│   ├── loader.css            # Custom loader animation
│   ├── animations.css        # Additional animations (e.g., onboarding)
│   └── style.css             # (Optional) Imports all above files if needed
│
└── utils/                        # Reusable backend logic
    ├── __init__.py
    ├── answer_generator.py       # Prompt & LLM answer generator
    ├── azure_openai_client.py    # Auth wrapper for Azure OpenAI
    ├── config.py                 # Env & path configs
    ├── embedder.py               # Query embedder
    ├── examples.py               # Suggested prompt examples
    ├── onboarding.py             # First-time user walkthrough
    ├── prompt_loader.py          # Load prompt from file
    ├── theme.py                  # Theme CSS injection
    ├── reranker.py               # Cosine-based reranker
    └── search_engine.py          # Semantic + reranked search logic

Data Flow Diagram:

               ┌─────────────────────────────┐
               │      🧾 Excel Dataset       │
               └──────────────┬──────────────┘
                              │
                              ▼
               ┌─────────────────────────────┐
               │ Data Preprocessing (Python) │
               └──────────────┬──────────────┘
                              ▼
     ┌─────────────────────────────────────────────────────┐
     │   TextChunks created + saved to processed .xlsx     │
     └────────────────────────┬────────────────────────────┘
                              ▼
     ┌─────────────────────────────────────────────────────┐
     │       Azure OpenAI (text-embedding-ada-002)         │
     │         Embed TextChunks → 1536-dim vector          │
     └────────────────────────┬────────────────────────────┘
                              ▼
     ┌─────────────────────────────────────────────────────┐
     │ FAISS Index (in-memory vector DB)                   │
     │ Stores & retrieves similar chunks                   │
     └────────────────────────┬────────────────────────────┘
                              ▼
     ┌─────────────────────────────────────────────────────┐
     │ Streamlit UI                                        │
     │  - Input query → Embed                              │
     │  - Top-K retrieval + reranking                      │
     │  - LLM prompt creation                              │
     │  - Azure OpenAI chat completion (gpt-4o-mini)       │
     │  - Final answer rendered                            │
     └─────────────────────────────────────────────────────┘
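
Two of the non-vector steps in the flow above, creating a TextChunk from a spreadsheet row and assembling the LLM prompt from retrieved chunks, might look roughly like the sketch below. Everything specific in it is an assumption for illustration: the column names, the second sample row, the `row_to_text_chunk` and `build_prompt` helpers, and the template wording are hypothetical, and the project's actual template lives in prompts/prompt_v1.txt, whose contents are not shown in this README.

```python
def row_to_text_chunk(row):
    """Flatten one spreadsheet row (a dict) into a single searchable string.

    Empty cells are skipped so they do not add noise to the embedding input.
    """
    parts = [f"{column}: {value}"
             for column, value in row.items()
             if value not in (None, "")]
    return " | ".join(parts)


# Hypothetical template; placeholders are filled with str.format.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
)


def build_prompt(question, chunks):
    """Insert the retrieved chunks and the user question into the template."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up rows standing in for the Excel data (column names are assumptions).
rows = [
    {"Name": "Alice Ly", "Team": "Data Science", "City": "Tokyo", "Notes": ""},
    {"Name": "Sample Person", "Team": "Marketing", "City": "Hanoi", "Notes": None},
]

chunks = [row_to_text_chunk(row) for row in rows]
prompt = build_prompt("Who is on the Data Science team?", chunks)
print(prompt)
```

Keeping chunk creation and prompt assembly as small pure functions makes both steps easy to test without calling Azure OpenAI.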

βš™οΈ Setup Instructions

  1. Clone the repository

    git clone https://github.com/ajeet214/smart_search.git
    cd smart_search
  2. Install dependencies and set up the environment using uv

    • Initialize a project (only needed when no pyproject.toml exists; a fresh clone already includes one, so this step can usually be skipped):

      uv init

    • Install dependencies (defined in pyproject.toml):

      uv sync

  3. Activate the virtual environment

    • With uv, no activation is needed; prefix each command with uv run, for example:

      uv run streamlit run app.py

    • Or activate .venv manually:

      source .venv/bin/activate      # Linux/macOS
      .venv\Scripts\activate         # Windows
  4. Configure environment

    Create a .env file at the project root and fill in:

    • AZURE_OPENAI_API_KEY
    • AZURE_OPENAI_ENDPOINT

    Leave all other values as they are. Set SHOW_ONBOARDING to false if you want to disable the onboarding screen.

    # Azure OpenAI
    AZURE_OPENAI_API_KEY=your_key_here
    AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com
    AZURE_OPENAI_API_VERSION=2023-05-15
    AZURE_OPENAI_DEPLOYMENT=text-embedding-ada-002
    AZURE_OPENAI_COMPLETION_DEPLOYMENT=gpt-4o-mini
    
    # File Paths
    INPUT_FILE=data/processed/people_data_1000_with_textchunk.xlsx
    OUTPUT_INDEX=embeddings/faiss_index_people_data.index
    OUTPUT_METADATA=embeddings/metadata.csv
    CHUNK_EMBEDDINGS_PATH=embeddings/chunk_embeddings.npy
    
    # UI Behavior
    SHOW_ONBOARDING=true
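
As a rough illustration of how these variables could be consumed, the sketch below reads them into a small settings object. It is not the project's utils/config.py: the `Settings` class, `load_settings`, and `parse_bool` are hypothetical names, and the fallback defaults simply mirror the sample values above. In the real app, python-dotenv's `load_dotenv()` would typically copy the .env file into the process environment before code like this runs; the sketch takes a plain mapping instead so that it has no third-party dependency.

```python
import os
from dataclasses import dataclass


def parse_bool(value, default=False):
    """Interpret common truthy strings; fall back to `default` when unset."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    api_key: str
    endpoint: str
    api_version: str
    embedding_deployment: str
    completion_deployment: str
    show_onboarding: bool


def load_settings(env=os.environ):
    """Build Settings from a mapping of environment variables.

    Fails early with a clear message if a required secret is missing.
    """
    required = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        api_key=env["AZURE_OPENAI_API_KEY"],
        endpoint=env["AZURE_OPENAI_ENDPOINT"].rstrip("/"),
        api_version=env.get("AZURE_OPENAI_API_VERSION", "2023-05-15"),
        embedding_deployment=env.get("AZURE_OPENAI_DEPLOYMENT", "text-embedding-ada-002"),
        completion_deployment=env.get("AZURE_OPENAI_COMPLETION_DEPLOYMENT", "gpt-4o-mini"),
        show_onboarding=parse_bool(env.get("SHOW_ONBOARDING"), default=True),
    )


# Example with an explicit mapping instead of the real process environment.
example_env = {
    "AZURE_OPENAI_API_KEY": "your_key_here",
    "AZURE_OPENAI_ENDPOINT": "https://your-endpoint.openai.azure.com/",
    "SHOW_ONBOARDING": "false",
}
settings = load_settings(example_env)
print(settings.endpoint, settings.show_onboarding)
```

Validating the two required values at startup surfaces a missing key immediately instead of as an authentication error on the first query.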

5. Prepare Data

python scripts/preprocess.py

6. Generate Embeddings and Build Index

python scripts/embed_and_index.py
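
The indexing step writes both a FAISS index and a metadata.csv. One plausible reason for the pair, sketched below, is that a plain vector index returns only integer positions, so a side table is needed to map each position back to its source row. This is an assumption about the design, not a description of embed_and_index.py: the `vector_id` column, the helper names, the sample rows, and the use of the standard csv module (rather than Pandas) are all illustrative.

```python
import csv
import io


def build_metadata_csv(rows, fieldnames):
    """Serialize rows in the same order their vectors were added to the index.

    The position of each row becomes its vector_id, which is what a
    position-based vector search would hand back.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["vector_id", *fieldnames],
                            lineterminator="\n")
    writer.writeheader()
    for vector_id, row in enumerate(rows):
        writer.writerow({"vector_id": vector_id, **row})
    return buffer.getvalue()


def lookup(metadata_csv, vector_ids):
    """Map search-result ids back to metadata rows, preserving result order."""
    reader = csv.DictReader(io.StringIO(metadata_csv))
    by_id = {int(record["vector_id"]): record for record in reader}
    return [by_id[i] for i in vector_ids]


# Made-up rows; in the project these would come from the processed .xlsx file.
rows = [
    {"Name": "Alice Ly", "TextChunk": "Name: Alice Ly | Team: Data Science"},
    {"Name": "Sample Person", "TextChunk": "Name: Sample Person | Team: Marketing"},
]

metadata = build_metadata_csv(rows, ["Name", "TextChunk"])
hits = lookup(metadata, [1, 0])  # pretend the index returned ids 1 then 0
names = [hit["Name"] for hit in hits]
print(names)
```

The key invariant is ordering: if rows are shuffled or filtered after embedding, the ids returned by the index no longer point at the right metadata, so both files should be regenerated together.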

Launch the App

To start the Streamlit web app, run:

uv run streamlit run app.py

or if your virtual environment is active:

streamlit run app.py

Then open your browser and go to:

http://localhost:8501

💡 Example Prompts

  • "Who attended AI-related events in Singapore?"
  • "List members of the Data Science team based in Tokyo."
  • "What events did Alice Ly take part in?"
  • "Show the Marketing team members from Vietnam."

🔒 Security Note

This project reads its Azure OpenAI credentials from environment variables. Never commit your .env file, and always list .env in .gitignore.



🚀 Roadmap / Future Enhancements

  • Support multi-file or live data ingestion
  • Add citation reference to answers
  • Export responses (PDF, Markdown)
  • Chat history tracking
  • Authentication layer for private access

📄 License

MIT License. Feel free to fork and extend.


🙌 Acknowledgements


Built with ❤️ by Ajeet using Azure OpenAI, Streamlit, and FAISS.
