An AI-powered assistant built using Retrieval-Augmented Generation (RAG) that provides context-aware answers about Tamil Nadu traffic rules, penalties, and driver rights. It combines semantic search with LLMs for accurate, explainable, and trustworthy responses.
```
traffic_rules_assistant/
├── .venv/
├── data/
│   ├── processed/
│   │   ├── faiss_index.idx
│   │   ├── TN_traffic_rules.json
│   │   └── TN_traffic_rules.txt
│   └── TN Traffic rules.pdf
├── src/
│   ├── chunking.py
│   ├── embedding.py
│   ├── generator.py
│   ├── retriever.py
│   ├── main.py
│   └── text_extraction.py
├── api/
│   └── app.py
├── frontend/
│   └── index.html
├── .gitignore
├── .python-version
├── README.md
├── requirements.txt
└── pyproject.toml
```
This project extracts legal content from a government-issued traffic PDF and makes it queryable via natural language questions using a custom-built RAG pipeline. It uses:
- `sentence-transformers` for generating document embeddings
- `FAISS` for fast semantic retrieval
- `LangChain` with a `Groq` LLM backend for fast and accurate response generation
- `FastAPI` to expose the system as an API
The assistant is designed to provide concise, legally grounded, and verifiable answers.
- Citizens of Tamil Nadu
- Driving school instructors and trainees
- Traffic law educators
- AI/ML enthusiasts learning about RAG pipelines
- Python 3.11+
- Basic terminal/CLI knowledge
- Internet access for LLM API (Groq)
- Groq API key (free tier available)
```bash
# Clone the repository
git clone https://github.com/madhans476/traffic_rules_assistant.git
cd traffic_rules_assistant

# Create environment
uv venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows

# Install dependencies
uv pip install -r requirements.txt
```

Set your Groq API key as an environment variable:

```bash
GROQ_API_KEY=your-groq-api-key-here
```
Place the source PDF at `data/TN Traffic rules.pdf`.
Run the pipeline stages in order, then start the API:

```bash
python src/text_extraction.py
python src/chunking.py
python src/embedding.py
python src/main.py
uvicorn api.app:app --reload

# Visit http://localhost:8000/docs
```
- Input: a traffic rulebook in `.pdf` format
- Output:
  - a `.txt` file with the full extracted text
  - a `.json` file with clean overlapping chunks
  - an `.idx` FAISS index file
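A word-level sliding window is one common way to produce the overlapping chunks described above. The function below is an illustrative sketch, not the project's actual `src/chunking.py`; the word-based splitting and function name are assumptions:

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into word windows of `chunk_size`, each overlapping the previous by `overlap`."""
    words = text.split()
    step = chunk_size - overlap  # advance 250 words per window with the defaults
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already reached the end of the text
    return chunks

# 700 synthetic "words" -> windows covering 0-299, 250-549, 500-699
sample = " ".join(f"w{i}" for i in range(700))
chunks = chunk_text(sample)
print(len(chunks))           # 3
print(chunks[1].split()[0])  # "w250": second window starts 250 words in
```

The 50-word overlap means a rule that straddles a window boundary still appears whole in at least one chunk.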
Manual test cases:
- Ask questions from the CLI or Swagger UI
- Evaluate LLM responses vs. original PDF
(Automated tests can be added using pytest.)
- Chunking: `chunk_size = 300`, `overlap = 50`
- `IndexFlatIP` used for vector similarity search
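`IndexFlatIP` ranks stored vectors by raw inner product, which equals cosine similarity once embeddings are L2-normalized. A numpy-only sketch of the same computation (synthetic vectors; the 384 dimension is an assumption matching typical `sentence-transformers` MiniLM models):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_chunks = 384, 10
chunk_vecs = rng.normal(size=(n_chunks, dim)).astype("float32")
# L2-normalize so inner product == cosine similarity (FAISS offers faiss.normalize_L2)
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

# Fake query: a slightly perturbed copy of chunk 3
query = chunk_vecs[3] + (0.01 * rng.normal(size=dim)).astype("float32")
query /= np.linalg.norm(query)

scores = chunk_vecs @ query        # what IndexFlatIP.search computes per stored vector
top_k = np.argsort(-scores)[:3]    # indices of the 3 most similar chunks
print(top_k[0])                    # chunk 3 ranks first
```

With real FAISS the same result comes from `index.search(query[None, :], k)` after adding `chunk_vecs` to an `IndexFlatIP(dim)`.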
- Model: `mixtral-8x7b-32768` (default)
- Temperature: `0.3`
1. Parse the legal traffic PDF to text
2. Split the text into semantic chunks
3. Embed the chunks using `sentence-transformers`
4. Store the embeddings in `FAISS` for fast vector search
5. On query:
   - Embed the question
   - Retrieve the top-k relevant chunks
   - Pass them to the Groq LLM using LangChain
6. Return a clean, structured answer
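The query-time steps can be sketched as a guardrailed prompt builder that stitches the retrieved chunks into the LLM request. The exact wording and function name are illustrative assumptions, not the project's actual prompt:

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Join retrieved chunks into a numbered context block with an anti-hallucination guardrail."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the fine for riding without a helmet?",
    ["Section 129: helmets are mandatory for two-wheeler riders.",
     "Penalty: Rs. 1000 for riding without a helmet."],
)
print(prompt.splitlines()[0])
```

Numbering the chunks lets the model cite `[1]`, `[2]` in its answer, which supports the verifiability goal stated above.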
- Fast local retrieval (~50–100 ms)
- Groq LLM response: ~100 tokens/s
- Accurate, context-grounded answers
- Hallucinations minimized via guardrails in the prompt
MIT License. See LICENSE file.
Pull requests welcome!
- Fork the repo
- Create a new branch
- Submit a pull request with a meaningful message
- Initial public release
- Added full pipeline: Extraction, Chunking, Embedding, Retrieval, Generation
- FastAPI API + CLI support
- Groq + LangChain integration
If you use this in academic work:
```bibtex
@misc{trafficassistant2025,
  title={Tamil Nadu Traffic Rules Assistant using RAG},
  author={Madhan S},
  year={2025},
  howpublished={\url{https://github.com/madhans476/traffic_rules_assistant}}
}
```
Maintainer: Madhan S
- Email: mail
- GitHub: @madhans476
- LinkedIn: @madhans17

