Skip to content

A self-contained AI project that runs a quantized Large Language Model (Qwen2.5-0.5B) entirely on your local machine. Built with FastAPI and llama-cpp-python, this agent intelligently switches between standard chat and "Search Mode" to fetch real-time data from the internet. The project features a responsive HTML/CSS/JS frontend and is fully Docker

Notifications You must be signed in to change notification settings

AliDmrcIo/LLM_Agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

DeepSearch AI (Local LLM Agent with Web Search)

DeepSearch AI is a fully local, full-stack conversational agent capable of real-time web search. It runs a quantized Large Language Model (LLM) directly on your machine using Docker, ensuring privacy and control without relying on external cloud APIs for the core logic.


Video Preview

0109.mp4

Project Description

This project is a production-ready Local AI Agent designed to bridge the gap between offline LLMs and real-time internet data. Built with FastAPI and cpp-python, the agent intelligently detects when a user needs up-to-date information (e.g., "search iPhone 16 price") and triggers a web search using DuckDuckGo. The results are then synthesized by the local Qwen2.5 model to provide a comprehensive answer.

The project is fully containerized, allowing for a seamless "write once, run anywhere" experience using Docker, while keeping the heavy model files managed locally.


Project Goal

The primary goal of this project is to provide a private, low-latency, and cost-effective alternative to cloud-based AI assistants. It is designed for developers and privacy enthusiasts who want to run powerful AI agents on consumer hardware.

Key capabilities include:

  • Smart Intent Detection: Automatically switches between "Chat Mode" and "Search Mode" based on user input.
  • Real-Time Knowledge: Overcomes the knowledge cutoff of static LLMs by fetching live data from the web.
  • Local Inference: Uses 4-bit quantized GGUF models (Qwen2.5-0.5B) to run efficiently on CPU/RAM.
  • Full-Stack Experience: Provides a clean, dark-mode chat interface built with Vanilla JS, connected to a robust Python backend.

Architecture

The system follows a microservice-like architecture encapsulated within a Docker container:

  1. Frontend: Captures user input and handles UI state (Thinking/Searching animations).
  2. API Layer: FastAPI receives the request.
  3. Agent Logic: Analyzes the prompt to decide if a search tool is needed.
  4. Tool Execution: If needed, queries DuckDuckGo (ddgs) for live results.
  5. LLM Inference: The Context + Query is fed into llama-cpp-python running the Qwen model.
  6. Response: The final answer is streamed back to the user.

πŸ› οΈ Technologies Used

AI & Core Logic

  • LLM Engine: llama-cpp-python (Binding for llama.cpp)
  • Model: Qwen2.5-0.5B-Instruct (GGUF Format - Quantized)
  • Web Search Tool: duckduckgo-search
  • Model Management: Hugging Face Hub (for downloading the GGUF)

Backend

  • Framework: FastAPI
  • Server: Uvicorn
  • Data Validation: Pydantic

Frontend

  • Core: HTML5, CSS3, Vanilla JavaScript
  • Styling: Custom CSS with Dark Mode & Responsive Design

DevOps & Deployment

  • Containerization: Docker
  • Virtualization: Docker Volumes (For mapping local models)

Libraries Used

Python FastAPI Docker HTML5 JavaScript Hugging Face Llama Cpp


πŸ“‚ Project Structure

  • ai/
    • main.py β†’ Script to download the GGUF model.
    • models/ β†’ Directory where the model file will be stored.
  • agent.py β†’ Core logic for the AI Agent (Switching between Search/Chat).
  • main.py β†’ FastAPI application entry point.
  • tools.py β†’ Implementation of the Internet Search tool.
  • index.html β†’ The frontend chat interface.
  • Dockerfile β†’ Configuration for building the application image.
  • requirements.txt β†’ Python dependencies.

How to Run Locally

Since the AI model file (.gguf) is large, it is NOT included in the GitHub repository. You must download it manually using the provided script before running the Docker container.

1. Clone the Repository

git clone https://github.com/YOUR_USERNAME/DeepSearch-AI.git
cd DeepSearch-AI

2. Download the Model (Critical Step)

You need to download the qwen2.5-0.5b-instruct-q4_k_m.gguf model. I have prepared a script to do this automatically.

First, install the necessary library:

pip install huggingface_hub

Then, run the download script:

python ai/main.py

This will download the model (~400MB) and place it into the ./ai/models/ directory.

3. Build the Docker Image

docker build -t ai-agent .

4. Run the Container

We use Docker Volumes to map the downloaded model into the container. This keeps the image light and allows you to swap models easily.

docker run -d -p 8000:8000 --name ai-agent \
-v $(pwd)/ai/models:/app/ai/models \
ai-agent

5. Access the Application

Open your browser and go to: http://localhost:8000


How It Works

  1. Chat Mode: If you ask general questions (e.g., "Write a poem"), the Local LLM answers directly.
  2. Search Mode: If you start your sentence with keywords like search, ara, or bul (e.g., "search Tesla stock price"), the Agent:
    • Parses your query.
    • Searches DuckDuckGo for live results.
    • Reads the content.
    • Summarizes the answer using the LLM.

Summary

DeepSearch AI demonstrates how to build a functional, privacy-focused AI Agent without relying on paid cloud APIs. By combining FastAPI for the backend, Docker for deployment, and GGUF quantization for performance, it brings the power of modern LLMs to your local machine.

About

A self-contained AI project that runs a quantized Large Language Model (Qwen2.5-0.5B) entirely on your local machine. Built with FastAPI and llama-cpp-python, this agent intelligently switches between standard chat and "Search Mode" to fetch real-time data from the internet. The project features a responsive HTML/CSS/JS frontend and is fully Docker

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors