DeepSearch AI is a fully local, full-stack conversational agent capable of real-time web search. It runs a quantized Large Language Model (LLM) directly on your machine using Docker, ensuring privacy and control without relying on external cloud APIs for the core logic.
This project is a production-ready Local AI Agent designed to bridge the gap between offline LLMs and real-time internet data. Built with FastAPI and `llama-cpp-python`, the agent detects when a user needs up-to-date information (e.g., "search iPhone 16 price") and triggers a web search via DuckDuckGo. The results are then synthesized by the local Qwen2.5 model into a comprehensive answer.
The project is fully containerized, allowing for a seamless "write once, run anywhere" experience using Docker, while keeping the heavy model files managed locally.
The primary goal of this project is to provide a private, low-latency, and cost-effective alternative to cloud-based AI assistants. It is designed for developers and privacy enthusiasts who want to run powerful AI agents on consumer hardware.
Key capabilities include:
- Smart Intent Detection: Automatically switches between "Chat Mode" and "Search Mode" based on user input.
- Real-Time Knowledge: Overcomes the knowledge cutoff of static LLMs by fetching live data from the web.
- Local Inference: Uses a 4-bit quantized GGUF model (`Qwen2.5-0.5B`) to run efficiently on CPU/RAM.
- Full-Stack Experience: Provides a clean, dark-mode chat interface built with Vanilla JS, connected to a robust Python backend.
The system follows a microservice-like architecture encapsulated within a Docker container:
- Frontend: Captures user input and handles UI state (Thinking/Searching animations).
- API Layer: FastAPI receives the request.
- Agent Logic: Analyzes the prompt to decide if a search tool is needed.
- Tool Execution: If needed, queries DuckDuckGo (`ddgs`) for live results.
- LLM Inference: The context + query is fed into `llama-cpp-python` running the Qwen model.
- Response: The final answer is streamed back to the user.
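The routing step in the pipeline above can be sketched roughly as follows. This is a hedged illustration, not the repository's actual code: the function names (`needs_search`, `answer`), the keyword list, and the prompt template are all assumptions.

```python
# Hypothetical sketch of the agent's routing logic.
# Names and prompt format are illustrative assumptions.
SEARCH_KEYWORDS = ("search", "ara", "bul")  # English and Turkish trigger words

def needs_search(prompt: str) -> bool:
    """Return True when the prompt starts with a search trigger word."""
    words = prompt.strip().lower().split()
    return bool(words) and words[0] in SEARCH_KEYWORDS

def answer(prompt: str, llm, search_tool) -> str:
    """Route the prompt: inject live search results as context when needed."""
    if needs_search(prompt):
        parts = prompt.strip().split(maxsplit=1)
        query = parts[1] if len(parts) > 1 else parts[0]
        context = search_tool(query)  # live results from DuckDuckGo
        prompt = f"Using these search results:\n{context}\n\nAnswer: {query}"
    return llm(prompt)
```

In Chat Mode the prompt goes straight to the LLM; in Search Mode the search results are prepended as context so the model can summarize live data it was never trained on.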
- LLM Engine: `llama-cpp-python` (Python binding for llama.cpp)
- Model: `Qwen2.5-0.5B-Instruct` (GGUF format, quantized)
- Web Search Tool: `duckduckgo-search`
- Model Management: Hugging Face Hub (for downloading the GGUF file)
- Framework: `FastAPI`
- Server: `Uvicorn`
- Data Validation: `Pydantic`
- Core: `HTML5`, `CSS3`, Vanilla JavaScript
- Styling: Custom CSS with dark mode & responsive design
- Containerization: `Docker`
- Volume Mapping: Docker volumes (for mounting local models into the container)
- `ai/main.py` - Script to download the GGUF model.
- `ai/models/` - Directory where the model file will be stored.
- `agent.py` - Core logic for the AI agent (switching between search and chat).
- `main.py` - FastAPI application entry point.
- `tools.py` - Implementation of the internet search tool.
- `index.html` - The frontend chat interface.
- `Dockerfile` - Configuration for building the application image.
- `requirements.txt` - Python dependencies.
Since the AI model file (.gguf) is large, it is NOT included in the GitHub repository. You must download it manually using the provided script before running the Docker container.
```bash
git clone https://github.com/YOUR_USERNAME/DeepSearch-AI.git
cd DeepSearch-AI
```

You need to download the `qwen2.5-0.5b-instruct-q4_k_m.gguf` model. A script is provided to do this automatically.
First, install the necessary library:

```bash
pip install huggingface_hub
```

Then, run the download script:

```bash
python ai/main.py
```

This will download the model (~400 MB) and place it into the `./ai/models/` directory.
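For reference, `ai/main.py` likely amounts to little more than a call to `hf_hub_download`. This is a hedged sketch: the exact `repo_id` is an assumption (Qwen publishes official GGUF builds on the Hub, but check it against the real script):

```python
from pathlib import Path

# Assumed Hub coordinates; verify against the actual ai/main.py.
REPO_ID = "Qwen/Qwen2.5-0.5B-Instruct-GGUF"
FILENAME = "qwen2.5-0.5b-instruct-q4_k_m.gguf"
MODELS_DIR = Path("ai/models")

def model_path() -> Path:
    """Where the quantized model will live after download."""
    return MODELS_DIR / FILENAME

def download() -> Path:
    # Imported lazily so the path helpers work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    MODELS_DIR.mkdir(parents=True, exist_ok=True)
    return Path(hf_hub_download(repo_id=REPO_ID, filename=FILENAME,
                                local_dir=str(MODELS_DIR)))

if __name__ == "__main__":
    print(f"Model saved to {download()}")
```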
```bash
docker build -t ai-agent .
```

We use Docker volumes to map the downloaded model into the container. This keeps the image light and lets you swap models easily.

```bash
docker run -d -p 8000:8000 --name ai-agent \
  -v $(pwd)/ai/models:/app/ai/models \
  ai-agent
```

Open your browser and go to: http://localhost:8000
- Chat Mode: If you ask general questions (e.g., "Write a poem"), the local LLM answers directly.
- Search Mode: If you start your sentence with a keyword like `search`, `ara`, or `bul` (Turkish for "search" and "find"), e.g., "search Tesla stock price", the agent:
  1. Parses your query.
  2. Searches DuckDuckGo for live results.
  3. Reads the content.
  4. Summarizes the answer using the LLM.
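The search step can be sketched as below. This is an assumption-laden illustration of what `tools.py` might look like, not its actual contents; the `duckduckgo-search` package exposes a `DDGS` client whose `text()` method returns hits with `title`, `body`, and `href` keys:

```python
def format_results(results: list[dict]) -> str:
    """Flatten raw search hits into a text block the LLM can read as context."""
    lines = [
        f"- {r.get('title', '')}: {r.get('body', '')} ({r.get('href', '')})"
        for r in results
    ]
    return "\n".join(lines)

def web_search(query: str, max_results: int = 5) -> str:
    """Fetch live DuckDuckGo results and format them for the prompt."""
    # Imported lazily so the formatter above stays testable offline.
    from duckduckgo_search import DDGS
    with DDGS() as ddgs:
        hits = list(ddgs.text(query, max_results=max_results))
    return format_results(hits)
```

Keeping the formatting separate from the network call makes the tool easy to unit-test with canned results.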
DeepSearch AI demonstrates how to build a functional, privacy-focused AI Agent without relying on paid cloud APIs. By combining FastAPI for the backend, Docker for deployment, and GGUF quantization for performance, it brings the power of modern LLMs to your local machine.