NLQ Data Platform

A production-grade Natural Language Query (NLQ) platform that combines a FastAPI backend for RAG-powered queries with a PySpark ETL pipeline for incremental data ingestion from MySQL to S3.

Project Structure

nlq-data-platform/
├── backend/          # FastAPI server (NLQ + RAG pipeline)
├── etl/              # PySpark ETL pipeline (MySQL → S3)
├── docker-compose.yml
└── pyproject.toml    # Unified dependencies

Quick Start

Prerequisites

  • Python 3.14+
  • uv for dependency management

Installation

uv sync
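
If uv is not already installed, the standalone installer described in the uv documentation is one common way to get it:

curl -LsSf https://astral.sh/uv/install.sh | sh

uv sync then resolves and installs the dependencies declared in pyproject.toml into a local virtual environment.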

Running Services

Backend API Server:

python backend/run.py
# API docs: http://localhost:8000/docs
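
Once the server is running, queries can be sent over plain HTTP. The route and payload below are illustrative only; the actual endpoint and request schema are documented in backend/README.md and on the /docs page:

curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{"question": "How many orders were placed last month?"}'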

ETL Pipeline:

python -m etl.main
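
Both run commands assume the environment created by uv sync is active; alternatively, either one can be launched through uv itself, for example:

uv run python -m etl.main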

Docker (all services):

docker-compose up -d
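
Standard Docker Compose commands apply for inspecting and stopping the stack (the service names themselves are defined in docker-compose.yml):

docker-compose logs -f    # tail logs from all services
docker-compose down       # stop and remove the containers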

Services

  • Backend: FastAPI server with LangChain RAG pipeline for natural language queries (docs: backend/README.md)
  • ETL: PySpark pipeline for incremental data ingestion from MySQL to S3 (docs: etl/README.md)

Environment Variables

Each service has its own .env file (see the sketch after this list for example keys):

  • backend/.env — API keys, database URLs, secrets
  • etl/.env — MySQL credentials, S3 bucket config
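
A minimal sketch of the kind of values each file holds; the key names below are illustrative, not the exact variables the services read:

# backend/.env (hypothetical keys)
OPENAI_API_KEY=...
DATABASE_URL=mysql://user:password@host:3306/dbname

# etl/.env (hypothetical keys)
MYSQL_HOST=...
MYSQL_USER=...
MYSQL_PASSWORD=...
S3_BUCKET=...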
