An intelligent, LLM-powered semantic search tool for querying structured data using natural language.
Smart Search with GenAI is a semantic search application that combines vector similarity search (FAISS) with Azure OpenAI LLMs to answer user queries over structured datasets. Users enter natural language questions and receive context-aware, summarized responses based on pre-indexed data.
Built with Streamlit, the UI offers an interactive, user-friendly experience with prompt suggestions, onboarding guidance, and instant query answers.
- Natural language search over structured data (Excel)
- Azure OpenAI embedding + summarization (text-embedding-ada-002 + gpt-4o-mini)
- Fast retrieval using FAISS vector index
- Chunk reranking using cosine similarity for better accuracy
- Suggested prompt buttons to guide user input
- Onboarding walkthrough for new users
- Built-in Streamlit interface (no frontend coding needed)
| Category | Tools Used |
|---|---|
| UI | Streamlit + Custom CSS |
| Embedding Model | Azure OpenAI text-embedding-ada-002 |
| Completion Model | Azure OpenAI gpt-4o-mini (configurable) |
| Vector DB | FAISS |
| Reranking | Cosine Similarity (Scikit-learn) |
| Data Format | Excel (.xlsx) + Pandas |
| Configuration | Python dotenv |
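The reranking step listed in the table can be illustrated with a small, dependency-free sketch. The project itself uses scikit-learn for cosine similarity; the function names `cosine_similarity` and `rerank`, the toy vectors, and the chunk texts below are assumptions made for illustration, not code from this repository.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def rerank(query_vec, candidates, top_n=3):
    """Sort (chunk_text, embedding) pairs by similarity to the query.

    `candidates` stands in for the top-K hits returned by the vector index.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

# Toy 3-dimensional vectors; real ada-002 embeddings have 1536 dimensions.
query = [1.0, 0.0, 0.0]
hits = [
    ("chunk about Tokyo", [0.0, 1.0, 0.0]),
    ("chunk about Singapore", [0.9, 0.1, 0.0]),
    ("chunk about Vietnam", [0.5, 0.5, 0.0]),
]
for score, text in rerank(query, hits, top_n=2):
    print(f"{score:.3f}  {text}")
```

Reranking a small candidate set this way is cheap, so it can be applied to every query after the FAISS lookup.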
smart_search/
├── README.md
├── app.py  # Streamlit UI
├── pyproject.toml  # Project metadata & dependencies
├── .python-version  # Python version spec
├── uv.lock  # Dependency lock file (if committed)
├── .gitignore
│
├── .env  # [Not committed] Environment variables
│
├── data/  # Input data
│   ├── raw/
│   │   ├── people_data_100.xlsx
│   │   ├── people_data_500.xlsx
│   │   └── people_data_1000.xlsx
│   ├── logo/
│   │   ├── smart_search_logo.png
│   │   └── favicon.ico
│   └── processed/
│       └── people_data_1000_with_textchunk.xlsx
│
├── embeddings/  # Vector storage & metadata
│   ├── chunk_embeddings.npy
│   ├── faiss_index_people_data.index
│   └── metadata.csv
│
├── prompts/
│   └── prompt_v1.txt  # LLM prompt template
│
├── scripts/  # One-time/utility scripts
│   ├── __init__.py
│   ├── embed_and_index.py  # Embedding + indexing pipeline
│   ├── preprocess.py  # Excel data transformation
│   ├── search.py  # Basic FAISS query CLI
│   └── search_with_llm.py  # RAG CLI interface
│
├── styles/
│   ├── base.css  # Global variables, resets
│   ├── theme.css  # Light/Dark theme styling
│   ├── buttons.css  # Button design + animation
│   ├── loader.css  # Custom loader animation
│   ├── animations.css  # Additional animations (e.g., onboarding)
│   └── style.css  # (Optional) Imports all above files if needed
│
└── utils/  # Reusable backend logic
    ├── __init__.py
    ├── answer_generator.py  # Prompt & LLM answer generator
    ├── azure_openai_client.py  # Auth wrapper for Azure OpenAI
    ├── config.py  # Env & path configs
    ├── embedder.py  # Query embedder
    ├── examples.py  # Suggested prompt examples
    ├── onboarding.py  # First-time user walkthrough
    ├── prompt_loader.py  # Load prompt from file
    ├── theme.py  # Theme CSS injection
    ├── reranker.py  # Cosine-based reranker
    └── search_engine.py  # Semantic + reranked search logic
┌─────────────────────────────┐
│        Excel Dataset        │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Data Preprocessing (Python) │
└──────────────┬──────────────┘
               ▼
┌─────────────────────────────────────────────────────┐
│ TextChunks created + saved to processed .xlsx       │
└──────────────┬──────────────────────────────────────┘
               ▼
┌─────────────────────────────────────────────────────┐
│ Azure OpenAI (text-embedding-ada-002)               │
│ Embed TextChunks → 1536-dim vector                  │
└──────────────┬──────────────────────────────────────┘
               ▼
┌─────────────────────────────────────────────────────┐
│ FAISS Index (in-memory vector DB)                   │
│ Stores & retrieves similar chunks                   │
└──────────────┬──────────────────────────────────────┘
               ▼
┌─────────────────────────────────────────────────────┐
│ Streamlit UI                                        │
│ - Input query → Embed                               │
│ - Top-K retrieval + reranking                       │
│ - LLM prompt creation                               │
│ - Azure OpenAI chat completion (gpt-4o-mini)        │
│ - Final answer rendered                             │
└─────────────────────────────────────────────────────┘
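The "LLM prompt creation" step in the flow above can be sketched as follows. This is only an illustration: the template text, the `build_prompt` helper, and the placeholder names `{context}` and `{question}` are assumptions, and the actual template in prompts/prompt_v1.txt may look different.

```python
# Hypothetical template; the real one lives in prompts/prompt_v1.txt and may differ.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_prompt(question, chunks, template=PROMPT_TEMPLATE):
    """Join the reranked chunks into a numbered context block and fill the template."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return template.format(context=context, question=question)

# Placeholder chunks standing in for the reranked search results.
prompt = build_prompt(
    "Who attended AI-related events in Singapore?",
    ["Example chunk one.", "Example chunk two."],
)
print(prompt)
```

The resulting string would then be sent to the chat completion deployment as the user message.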
- Clone the repository:
    git clone https://github.com/ajeet214/smart_search.git
    cd smart_search
- Install dependencies and set up the environment using uv:
  - Initialize the environment (if not already done):
      uv init
  - Install dependencies (defined in pyproject.toml):
      uv sync
- Activate the virtual environment:
  - Using uv (if your uv version provides this subcommand):
      uv shell
  - Or activate .venv manually:
      source .venv/bin/activate    # Linux/macOS
      .venv\Scripts\activate       # Windows
- Configure the environment: create a .env file at the project root and set AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT; leave all other values as they are.
  - To disable the onboarding screen, set SHOW_ONBOARDING to false.
    # Azure OpenAI
    AZURE_OPENAI_API_KEY=your_key_here
    AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com
    AZURE_OPENAI_API_VERSION=2023-05-15
    AZURE_OPENAI_DEPLOYMENT=text-embedding-ada-002
    AZURE_OPENAI_COMPLETION_DEPLOYMENT=gpt-4o-mini

    # File Paths
    INPUT_FILE=data/processed/people_data_1000_with_textchunk.xlsx
    OUTPUT_INDEX=embeddings/faiss_index_people_data.index
    OUTPUT_METADATA=embeddings/metadata.csv
    CHUNK_EMBEDDINGS_PATH=embeddings/chunk_embeddings.npy

    # UI Behavior
    SHOW_ONBOARDING=true
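As a rough idea of how these variables might be consumed, here is a minimal sketch that reads them from a mapping and fails early when a required secret is missing. The `load_settings` function and the dictionary keys it returns are hypothetical; the project's own utils/config.py (which uses Python dotenv) may be organized differently.

```python
import os

def load_settings(env=os.environ):
    """Collect the settings named in .env into a dict, failing fast on missing secrets."""
    required = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {
        "api_key": env["AZURE_OPENAI_API_KEY"],
        "endpoint": env["AZURE_OPENAI_ENDPOINT"],
        "api_version": env.get("AZURE_OPENAI_API_VERSION", "2023-05-15"),
        "embedding_deployment": env.get("AZURE_OPENAI_DEPLOYMENT", "text-embedding-ada-002"),
        "completion_deployment": env.get("AZURE_OPENAI_COMPLETION_DEPLOYMENT", "gpt-4o-mini"),
        # Any value other than "true" (case-insensitive) disables onboarding.
        "show_onboarding": env.get("SHOW_ONBOARDING", "true").strip().lower() == "true",
    }

# Example with an explicit mapping instead of the real process environment.
settings = load_settings({
    "AZURE_OPENAI_API_KEY": "your_key_here",
    "AZURE_OPENAI_ENDPOINT": "https://your-endpoint.openai.azure.com",
    "SHOW_ONBOARDING": "false",
})
print(settings["completion_deployment"], settings["show_onboarding"])
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.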
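Then run the one-time data preparation scripts, in this order:
    python scripts/preprocess.py        # Excel data transformation
    python scripts/embed_and_index.py   # Embedding + indexing pipeline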
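To give a sense of what the preprocessing step produces, the sketch below turns one structured record into a text chunk suitable for embedding. It is an assumption about the general approach, not the contents of scripts/preprocess.py; the `row_to_text_chunk` helper and the column names are hypothetical.

```python
def row_to_text_chunk(row):
    """Flatten one record into 'Column: value' sentences, skipping empty cells."""
    parts = []
    for column, value in row.items():
        if value is None or str(value).strip() == "":
            continue  # blank cells add no information to the chunk
        parts.append(f"{column}: {str(value).strip()}")
    return ". ".join(parts) + "." if parts else ""

# Hypothetical columns; the real people_data_*.xlsx schema is not shown here.
record = {"Name": "Alice Ly", "Team": "Data Science", "City": "Tokyo", "Notes": ""}
print(row_to_text_chunk(record))
```

With pandas, a function like this could be applied row by row to build the TextChunk column before saving the processed file.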
To start the Streamlit web app, run:
uv run streamlit run app.py
or if your virtual environment is active:
streamlit run app.py
Then open your browser and go to:
http://localhost:8501
- "Who attended AI-related events in Singapore?"
- "List members of the Data Science team based in Tokyo."
- "What events did Alice Ly take part in?"
- "Show the Marketing team members from Vietnam."
This project uses Azure OpenAI via environment variables. Never commit your .env file. Always include .env in .gitignore.
- Support multi-file or live data ingestion
- Add citation reference to answers
- Export responses (PDF, Markdown)
- Chat history tracking
- Authentication layer for private access
MIT License. Feel free to fork and extend.
Built with ❤️ by Ajeet using Azure OpenAI, Streamlit, and FAISS.