A production-grade Natural Language Query platform combining a FastAPI backend for RAG-powered queries and a PySpark ETL pipeline for data ingestion.
```
nlq-data-platform/
├── backend/            # FastAPI server (NLQ + RAG pipeline)
├── etl/                # PySpark ETL pipeline (MySQL → S3)
├── docker-compose.yml
└── pyproject.toml      # Unified dependencies
```
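The `docker-compose.yml` at the repo root wires the two services together. A minimal sketch of what it might contain; the service names, build contexts, and port mapping here are assumptions, not the actual file:

```yaml
# Hypothetical sketch only; the real docker-compose.yml in the repo
# defines the actual services, images, and ports.
services:
  backend:
    build: ./backend        # assumes a Dockerfile in backend/
    ports:
      - "8000:8000"         # FastAPI served on port 8000
    env_file: backend/.env
  etl:
    build: ./etl            # assumes a Dockerfile in etl/
    env_file: etl/.env
```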
- Python 3.14+
- uv for dependency management
```bash
uv sync
```

**Backend API Server:**
```bash
python backend/run.py
# API docs: http://localhost:8000/docs
```

**ETL Pipeline:**
```bash
python -m etl.main
```

**Docker (all services):**
```bash
docker-compose up -d
```

| Service | Description | Docs |
|---|---|---|
| Backend | FastAPI server with LangChain RAG pipeline for natural language queries | backend/README.md |
| ETL | PySpark pipeline for incremental data ingestion from MySQL to S3 | etl/README.md |
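Once the backend is running, natural language queries can be posted to it over HTTP. A minimal sketch using only the Python standard library; the `/query` endpoint path and the `question` payload field are assumptions, not the documented API (check http://localhost:8000/docs for the real schema):

```python
import json
import urllib.request

# Hypothetical endpoint path; confirm against the live /docs page.
API_URL = "http://localhost:8000/query"

def build_query(question: str) -> urllib.request.Request:
    """Build a POST request carrying the natural-language question as JSON."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query("Which region had the highest sales last quarter?")
# urllib.request.urlopen(req) would send it once the server is up.
```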
Each service has its own `.env` file:

- `backend/.env`: API keys, database URLs, secrets
- `etl/.env`: MySQL credentials, S3 bucket config
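As an illustration, `backend/.env` might contain entries along these lines; the variable names below are hypothetical placeholders, not the actual keys the code reads:

```
# Illustrative variable names only; see the service code for the real keys.
OPENAI_API_KEY=sk-...
DATABASE_URL=mysql://user:password@localhost:3306/nlq
SECRET_KEY=change-me
```

Keep `.env` files out of version control; commit a `.env.example` with blank values instead.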