LEXAI (Legal AI Assistant) is an AI-powered LegalTech chatbot designed to make legal information accessible to all Nigerians. Built on the Constitution of the Federal Republic of Nigeria 1999 (updated with the First, Second, Third (2010), and Fourth (2017) Alterations), LEXAI empowers the 70% of Nigerians without legal access by offering a free, intuitive, and scalable solution. Using a hybrid retrieval system (FAISS + BM25) and the OpenRouter API for natural language generation, LEXAI delivers reasoned legal insights with sub-second latency.
- Legal Query Answering: Combines dense (FAISS) and sparse (BM25) retrieval for precise, context-aware responses from the Nigerian Constitution.
- Reasoned Responses: Every reply includes explicit reasoning, citing specific Constitution sections (e.g., "Found in Chapter IV, Section 33").
- Session Persistence: Stores chat history per session using SQLite for continuity.
- Scalable Design: Modular architecture with FastAPI, ready for future enhancements like AWS migration or multimodal support.
- Interactive Testing: Local interactive loop for rapid development and debugging.
- Free Deployment: Hosted on Northflank’s free tier for zero-cost operation.
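Session persistence of the kind described above can be sketched with Python's standard-library `sqlite3` module. This is an illustrative sketch only; the table and function names here are assumptions, not necessarily those LEXAI's actual storage layer uses:

```python
import sqlite3

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the chat-history table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chat_history (
               session_id TEXT,
               role       TEXT,   -- 'user' or 'assistant'
               content    TEXT,
               ts         DATETIME DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn

def save_message(conn: sqlite3.Connection, session_id: str,
                 role: str, content: str) -> None:
    """Append one message to a session's history."""
    conn.execute(
        "INSERT INTO chat_history (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()

def load_history(conn: sqlite3.Connection, session_id: str) -> list:
    """Return (role, content) pairs for one session, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM chat_history WHERE session_id = ? ORDER BY rowid",
        (session_id,),
    )
    return rows.fetchall()
```

Keying every row on `session_id` is what lets a single SQLite file serve many concurrent chat sessions.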
- Backend: Python 3.9+, FastAPI (for API)
- Language Model: OpenRouter API (DeepSeek model, with its variants as fallback); previously the Groq API (Llama3-8b-8192 model)
- Vector Store: FAISS (dense embeddings) + BM25 (sparse retrieval)
- Database: SQLite for chat history (known not to scale; migration to a more robust database is a work in progress)
- Deployment: Docker on Northflank
- Preprocessing: PyPDF for PDF text extraction
- Logging: Built-in Python logging for monitoring
LEXAI’s architecture balances performance, accessibility, and scalability. Here’s why we chose each component:
- Why FAISS over TF-IDF? FAISS (Facebook AI Similarity Search) provides faster, more efficient similarity search for dense embeddings, crucial for large legal documents. TF-IDF, while simpler, lacks semantic depth—FAISS captures meaning better.
- Why Hybrid? Combining FAISS (dense) with BM25 (sparse) ensures both semantic relevance and keyword precision. Legal queries often need exact matches (e.g., section numbers), which BM25 excels at, while FAISS handles broader context.
- Why OpenRouter? OpenRouter’s API offers fast, reliable access to powerful models like DeepSeek, ideal for generating natural, context-aware legal responses. It’s cost-effective, aligning with LEXAI’s free-to-use model.
- Why Not SwarmaURI’s GroqModel? Direct API calls provide more control and stability across environments, avoiding compatibility issues.
- Why Custom? A tailored RAG agent manages conversation history and context injection, ensuring responses are grounded in the Constitution while maintaining an approachable tone.
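The hybrid retrieval idea above can be illustrated with a minimal score-fusion sketch. The min-max normalization and the `alpha` weight are illustrative choices for this sketch, not LEXAI's exact fusion logic:

```python
def min_max(scores):
    """Scale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(dense_scores, sparse_scores, alpha=0.5):
    """Blend normalized dense (FAISS-style) and sparse (BM25-style) scores.

    alpha weights semantic similarity; (1 - alpha) weights keyword match.
    Returns document indices sorted best-first.
    """
    d, s = min_max(dense_scores), min_max(sparse_scores)
    fused = [alpha * di + (1 - alpha) * si for di, si in zip(d, s)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)

# Doc 0 wins on semantics, doc 2 wins on keywords; fusion balances both.
ranking = hybrid_rank([0.9, 0.2, 0.5], [1.0, 3.0, 8.0])  # → [2, 0, 1]
```

Setting `alpha=1.0` recovers a pure dense ranking and `alpha=0.0` a pure BM25 ranking, which is why the blend covers both exact section-number matches and broader semantic queries.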
- Python 3.9+
- Docker (for deployment)
- Git (for cloning the repository)
- Clone the Repository:
  git clone https://github.com/Ksschkw/lexai.git
  cd lexai
- Install Dependencies:
  pip install -r requirements.txt
- Configure Environment:
  - Create a `.env` file in the root directory:
    OPENROUTER_API_KEY=your_openrouter_api_key_here
  - Ensure no extra spaces or quotes around the API key.
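A minimal sketch of how the key might be read at startup, using only the standard library (LEXAI's actual settings loading may differ, e.g. it may use a package such as python-dotenv):

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Populate os.environ from simple KEY=VALUE lines, skipping comments."""
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # Variables already set in the environment take precedence.
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # Fall back to variables already exported in the shell.

load_env_file()
API_KEY = os.environ.get("OPENROUTER_API_KEY")  # None if the key is missing
```

Stripping whitespace around the key and value is what makes the "no extra spaces or quotes" advice above forgiving in practice.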
- Prepare Data:
  - Place the Constitution of the Federal Republic of Nigeria 1999 PDF (274 pages, 3.88 MB) in the root as `Constitution-of-the-Federal-Republic-of-Nigeria.pdf`.
  - Update `config/settings.py` with the correct `CONSTITUTION_PATH` if renamed.
- Run the interactive loop:
python main.py
- Type legal queries (e.g., "What are my rights?") and exit with "exit".
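The interactive loop is roughly of this shape (a sketch only; the real loop in `main.py` wires in the actual RAG agent, and the function name here is illustrative):

```python
def interactive_loop(answer_fn, input_fn=input, output_fn=print):
    """Read queries until the user types 'exit', printing each answer.

    answer_fn: callable mapping a query string to an answer string
               (in LEXAI this would be backed by the RAG agent).
    """
    while True:
        query = input_fn("LEXAI> ").strip()
        if query.lower() == "exit":
            output_fn("Goodbye!")
            break
        output_fn(answer_fn(query))
```

Injecting `input_fn`/`output_fn` keeps the loop testable without a terminal, which suits the "rapid development and debugging" goal above.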
- Start the server:
python main.py --server
- Send a POST request:
curl -X POST "http://localhost:8000/query" -H "Content-Type: application/json" -d '{"query": "What are my rights?", "session_id": "test"}'
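The same request can be sent from Python with only the standard library; the endpoint URL and payload shape are taken from the curl example above:

```python
import json
import urllib.request

def build_payload(query: str, session_id: str) -> bytes:
    """Encode the JSON body the /query endpoint expects."""
    return json.dumps({"query": query, "session_id": session_id}).encode()

def query_lexai(query: str, session_id: str,
                url: str = "http://localhost:8000/query") -> dict:
    """POST a legal question to a running LEXAI server and return its JSON reply."""
    req = urllib.request.Request(
        url,
        data=build_payload(query, session_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Requires the server started with `python main.py --server`:
# answer = query_lexai("What are my rights?", "test")
```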
- Build the Docker image:
  docker build -t lexai .
- Push to a container registry or use Northflank’s CLI.
- Create a Northflank service:
  - Set `PORT=8000` in environment variables.
  - Deploy the image and obtain the URL (e.g., http://lexai.northflank.app).
- Model-View-Controller (MVC) Pattern:
  - Model: Handles data preprocessing (`document.py`), the hybrid vector store (`tfidf_store.py`), the DeepSeek LLM (`groq_llm.py`; the filename intentionally keeps the old Groq naming even though Groq is no longer used), and the RAG agent (`rag_agent.py`).
  - View: `endpoints.py` defines FastAPI routes.
  - Controller: `query_handler.py` manages sessions and queries.
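The Controller's role can be sketched in plain Python. The class and method names below are illustrative, not the actual API of `query_handler.py`:

```python
class QueryHandler:
    """Routes each query through retrieval and the LLM, keeping per-session history."""

    def __init__(self, retriever, llm):
        self.retriever = retriever   # hybrid FAISS + BM25 store (Model)
        self.llm = llm               # OpenRouter-backed model wrapper (Model)
        self.sessions: dict = {}     # session_id -> list of message dicts

    def handle(self, session_id: str, query: str) -> str:
        history = self.sessions.setdefault(session_id, [])
        # Retrieve relevant Constitution passages, then generate a grounded answer.
        context = self.retriever(query)
        answer = self.llm(query=query, context=context, history=history)
        history.append({"role": "user", "content": query})
        history.append({"role": "assistant", "content": answer})
        return answer
```

Keeping retrieval and generation behind injected callables is one way to honor the SOLID separation the architecture aims for: the Controller never needs to know which vector store or LLM backs it.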
- Hybrid Retrieval: Combines FAISS for dense embeddings and BM25 for sparse keyword search.
- Modularity: Each component adheres to SOLID principles for scalability.
- Market: Targets the $35B global LegalTech market, focusing on Nigeria’s underserved population.
- Innovation: First free AI legal assistant for Nigeria, with hybrid retrieval and reasoned responses.
- Accessibility: Serves the 70% of Nigerians without legal access with a free, scalable solution.
- Multimodal: Support PDF uploads and voice queries (e.g., using speechrecognition). Plans changed from Tesseract to PaddleOCR, since Tesseract took too much effort for mediocre results (possibly a skill issue).
- Multiagent: Add translation (e.g., English to Igbo or Yoruba) and logging agents with LangGraph.
- Scalability: Migrate to AWS ECS for increased load.
Okafor Kosisochukwu Johnpaul (Kosi/Ksschkw)
MIT License.
- Inspired by MYRAGAGENT.
Note: considering a switch to bm25s alone, since embedding models take up space and may be overkill for this use case; for now, both retrievers remain.
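For reference, the BM25 scoring that libraries like bm25s implement efficiently looks roughly like this. This is a didactic sketch of classic BM25 (Okapi variant), not the bm25s API:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score tokenized docs against a tokenized query with classic BM25.

    query: list of query terms; docs: list of token lists.
    k1 controls term-frequency saturation, b controls length normalization.
    """
    N = len(docs)
    avg_len = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avg_len))
        scores.append(s)
    return scores
```

Because scoring is just counting and arithmetic, sparse retrieval needs no embedding model on disk, which is the space argument behind the bm25s consideration above.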
