Skip to content

Offline Text-to-SQL Agent – Convert natural language questions into SQL queries with explanations. Fully open-source, runs locally, and features a Streamlit UI, multi-table support, and configurable SQLite databases. Built with LangChain, LangGraph, Ollama (CodeGemma 7B).

License

Notifications You must be signed in to change notification settings

Ginga1402/langgraph-text-to-sql-agent

Repository files navigation

📚 Text-to-SQL Agent: Natural Language Database Query System

Text-to-SQL LangGraph Agent is an open-source AI workflow that transforms natural language questions into SQL queries and insightful answers—powered locally by Ollama and Open Source LLM's, without relying on function calling.

🎯 Project Description

This project demonstrates an intelligent Text-to-SQL Agent that transforms natural language questions into executable SQL queries and returns human-readable results. Built to bridge the gap between business users and database systems, it enables anyone to query databases using plain English without SQL knowledge.

The system processes questions like "How many customers are from the USA?" and automatically generates the corresponding SQL query, executes it against the database, and provides both technical results and conversational explanations.

It is implemented in two ways:

  1. Jupyter Notebook (text_to_SQL_Langchain_agent.ipynb) — utilizes LangChain’s built-in SQL agent.

  2. Custom LangGraph Workflow — uses sequential nodes and custom tools for deterministic SQL generation and execution.

🏗️ Technical Architecture

The system follows a modular, stateful architecture combining LangGraph, Ollama, and SQLite for controlled Text-to-SQL generation.

LangGraph Agent Pipeline

  • Manages conversation flow using MessagesState
  • Executes 5 sequential nodes with deterministic routing (START → END)
  • Integrates custom @tool decorators for database operations

LLM Integration

  • Model: CodeGemma 7B Instruct (Q4_K_S) - Configurable
  • Inference Engine: Ollama (http://localhost:11434)
  • Context Handling: Structured prompts with schema injection and response formatting

Database Layer

  • Engine: SQLite using LangChain’s SQLDatabase abstraction
  • Schema: Chinook music database with 11 tables - Configurable
  • Safety: Read-only operations enforced with query validation

✨ Key Features

  • 🧠 Intelligent Query Processing: Converts natural language questions into accurate SQL queries, understands table relationships, and handles complex multi-table joins.
  • 🔄 Multiple Interfaces: Accessible via Streamlit web app (interactive dashboard), FastAPI REST endpoints (for integrations), and Jupyter Notebook (for development and experimentation).
  • 🛡️ Production-Ready Safety: Ensures read-only database operations, robust error handling, query validation, and automatic result limiting to prevent misuse or overload.
  • 📊 Rich Output Formats: Returns structured responses including generated SQL queries, raw database results, and natural-language explanations for better interpretability.

📁 Project Structure

📦 langgraph-text-to-sql-agent
│
├── configuration.py        # Environment setup and configurations
├── FastAPI.py              # API layer for backend integration
├── langgraphagent.py       # Core LangGraph workflow and agent logic
├── streamlitapp.py         # Streamlit frontend for user interaction
├── text_to_SQL_Langchain_agent.ipynb   # Jupyter notebook (LangChain agent version)
└── README.md
└── License

💡 Use Cases

  • Data Exploration: Ask natural language questions to explore relational data without writing SQL.
  • Business Intelligence: Generate quick insights from structured databases for decision-making.
  • Developer Tools: Integrate the FastAPI endpoint into analytics dashboards or internal data systems.
  • Education & Training: Teach SQL concepts interactively by showing natural language to SQL conversions.
  • AI Agents Integration: Use as a backend for chatbots or virtual assistants requiring database access.

🧭 Demo Sample Images

Streamlit Interface

Image

FastAPI Response

Image

🛠️ Installation Instructions

Prerequisites

  • Python 3.10+
  • CUDA-compatible GPU (optional, for faster processing)
  • 8GB+ RAM recommended

Step 1: Clone Repository

git clone https://github.com/Ginga1402/langgraph-text-to-sql-agent.git
cd langgraph-text-to-sql-agent

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Set Up Ollama (LLM Backend)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the required model
ollama pull codegemma:7b-instruct-v1.1-q4_K_S

Step 4: Configure Paths

Update the paths in configuration.py to match your system:

model_name = "codegemma:7b-instruct-v1.1-q4_K_S"
database_name = "sqlite:///Chinook.db"

📖 Usage

Starting the Application

  1. Start the FastAPI Server:
python FastAPI.py

The API will be available at http://localhost:8000

  1. Launch the Streamlit Interface:
streamlit run streamlit_app.py

The web interface will open at http://localhost:8501

⚙️ Basic Workflow

  1. User Input → The user asks a natural language question (e.g., “How many customers are from the USA?”).
  2. List Tables → The system retrieves available table names from the SQLite database.
  3. Get Schema → The schema for relevant tables is extracted to provide context to the LLM.
  4. Generate SQL → The LLM (CodeGemma 7B via Ollama) creates a syntactically correct SQL query.
  5. Run Query → The SQL query is safely executed with validation and read-only enforcement.
  6. Generate Response → The system returns a structured output including:
    • The generated SQL query
    • Raw database results
    • A natural language explanation of the findings

Workflow Graph:

START → list_tables → get_schema → generate_sql → run_query → generate_response → END


🧱 Technologies Used

Technology Description Link
LangChain Framework for building LLM-driven applications LangChain
LangGraph Node-based agent orchestration for LLM workflows LangGraph
Ollama Local LLM inference engine Ollama
CodeGemma 7B (Q4_K_S) Quantized instruction-tuned model for code understanding Gemma Models
SQLite Lightweight relational database for structured data SQLite
FastAPI High-performance API framework for Python FastAPI
Streamlit Web framework for building interactive data apps Streamlit
Pydantic Data validation using Python type annotations pydantic.dev

🤝 Contributing

Contributions to this project are welcome! If you have ideas for improvements, bug fixes, or new features, feel free to open an issue or submit a pull request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🌟 Star History

If you find Text-to-SQL-Agent useful, please consider giving it a star ⭐ on GitHub!

About

Offline Text-to-SQL Agent – Convert natural language questions into SQL queries with explanations. Fully open-source, runs locally, and features a Streamlit UI, multi-table support, and configurable SQLite databases. Built with LangChain, LangGraph, Ollama (CodeGemma 7B).

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published