Internal project: the code is private for security reasons, so only architecture and design docs are shared here.
A RAG-based AI system for automated review of internal procurement policies and legal compliance. Built to replace the manual process of looking up regulations every time a contract needs review.
Every contract review required manually checking the Subcontracting Act, internal procurement guidelines, and global compliance standards. The goal was a system that could answer a question like "does this contract violate the subcontracting law?" and immediately return the relevant clause as evidence.
User question
↓
Query preprocessing (keyword extraction + intent classification)
↓
Vector search (internal policy docs + legal data)
↓
Context assembly (relevant document chunks)
↓
LLM response generation (with source citations)
↓
Answer + referenced documents returned
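The flow above can be read as a chain of stages. A minimal sketch in plain Python, with each stage passed in as a callable. All names here (`Answer`, `run_pipeline`, the stage parameters, the `text` and `source` keys) are hypothetical and not taken from the private code base:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Answer:
    text: str
    sources: List[str]  # identifiers of the chunks the answer is based on


def run_pipeline(
    question: str,
    preprocess: Callable[[str], str],
    search: Callable[[str], List[Dict]],
    generate: Callable[[str, str], str],
) -> Answer:
    # Query preprocessing (keyword extraction + intent classification)
    query = preprocess(question)
    # Vector search over internal policy docs and legal data
    chunks = search(query)
    # Context assembly from the retrieved chunks
    context = "\n\n".join(c["text"] for c in chunks)
    # LLM response generation
    text = generate(question, context)
    # Answer plus the referenced documents
    return Answer(text=text, sources=[c["source"] for c in chunks])
```

Passing the stages in as arguments keeps each one replaceable, which matches the swappable-module design the project relies on.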
LangChain: Chosen as the most flexible framework for RAG pipelines. Document loaders, chunking, vector store, and LLM connections are all swappable modules.
Ollama + DeepSeek (local LLM): Internal compliance data could not be sent to external APIs, so a local LLM was a hard requirement. DeepSeek offered the best balance of Korean-language performance and speed.
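To illustrate what "local" means in practice, here is a minimal sketch that talks to Ollama's REST endpoint on localhost using only the standard library, so no data leaves the machine. The project itself goes through LangChain; the direct HTTP call is a swapped-in simplification. The model tag `deepseek-r1` is an assumption (the write-up only says "DeepSeek"), and `build_payload` and `ask_local_llm` are hypothetical names:

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "deepseek-r1") -> dict:
    # The model tag is an assumption; use whichever DeepSeek model is pulled locally.
    # stream=False asks Ollama for one JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}


def ask_local_llm(prompt: str, model: str = "deepseek-r1") -> str:
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    # Requires a running Ollama server with the model already pulled.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```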
Chunking strategy: Legal documents are chunked at article level, internal policies at section level. Fixed-size chunking risked splitting a clause in the middle, so structure-based chunking was applied instead.
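A minimal sketch of article-level chunking, assuming each article starts on its own line with a heading like "Article 12". The regex and the name `split_by_article` are illustrative, not the project's actual splitter. For Korean statutes the heading pattern would look more like `^제\d+조` (also an assumption):

```python
import re
from typing import List

# Matches an article heading at the start of a line, e.g. "Article 12".
ARTICLE_HEADING = re.compile(r"^Article \d+", re.MULTILINE)


def split_by_article(text: str) -> List[str]:
    """Split a legal document into one chunk per article."""
    starts = [m.start() for m in ARTICLE_HEADING.finditer(text)]
    if not starts:
        # No recognizable structure: return the document as a single chunk.
        return [text.strip()] if text.strip() else []

    chunks = []
    # Keep any text before the first article (title, preamble) as its own chunk.
    preamble = text[: starts[0]].strip()
    if preamble:
        chunks.append(preamble)
    # Each chunk runs from one heading to the next, so no clause is cut in half.
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        chunks.append(text[begin:end].strip())
    return chunks
```

One known limitation of this sketch: a body line that happens to begin with "Article 3 of this Act ..." would be mistaken for a heading, so a real splitter would need a stricter pattern.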
Document quality: Internal docs were a mix of PDF, Word, and Excel, so the preprocessing pipeline needs a different parser per format.
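Per-format parsing can be organized as a dispatch on file extension. The sketch below only shows the routing; the parsers themselves are supplied by the caller, and `make_dispatcher` is a hypothetical helper rather than part of the project or of LangChain:

```python
from pathlib import Path
from typing import Callable, Dict

Parser = Callable[[Path], str]


def make_dispatcher(parsers: Dict[str, Parser]) -> Parser:
    """Build a parse function that routes each file to the parser for its extension."""

    def parse(path: Path) -> str:
        suffix = path.suffix.lower()  # treat ".PDF" and ".pdf" the same
        if suffix not in parsers:
            # Fail loudly instead of silently skipping an unsupported document.
            raise ValueError(f"no parser registered for {suffix!r}")
        return parsers[suffix](path)

    return parse
```

In use, the mapping would hold one real extractor per format, for example `{".pdf": ..., ".docx": ..., ".xlsx": ...}`.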
Korean legal text structure: The text is full of complex cross-references such as "Article 3, Paragraph 2, proviso". → Solved with a hybrid approach: keyword filter first, vector search second.
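A minimal sketch of the two-stage idea, using toy vectors and plain cosine similarity in place of a real vector store. The reference pattern, the fallback rule, and the names `hybrid_search`, `mentions`, `text`, and `vector` are all assumptions made for illustration:

```python
import math
import re
from typing import Dict, List, Sequence

# Finds explicit article references in a query, e.g. "Article 3".
ARTICLE_REF = re.compile(r"Article \d+")


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mentions(text: str, ref: str) -> bool:
    # The lookahead stops "Article 3" from matching inside "Article 30".
    return re.search(re.escape(ref) + r"(?!\d)", text) is not None


def hybrid_search(
    query: str, query_vec: Sequence[float], chunks: List[Dict], top_k: int = 3
) -> List[Dict]:
    refs = ARTICLE_REF.findall(query)
    # Stage 1: keyword filter. Keep chunks that mention a referenced article.
    # If the query names no article, or nothing matches, fall back to all chunks.
    candidates = [c for c in chunks if any(mentions(c["text"], r) for r in refs)] or chunks
    # Stage 2: rank the remaining candidates by vector similarity.
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:top_k]
```

Filtering first means an exact clause reference in the question cannot be outranked by a chunk that is merely similar in wording.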
Hallucination control: A wrong answer in legal review is a real risk. → Every response is forced to include the source document and clause number. → The prompt is designed to output "no basis found" when no relevant documents are retrieved.
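A minimal sketch of both controls. The prompt wording is illustrative, not the project's actual prompt, and `build_prompt`, `answer`, and the `source`, `clause`, and `text` keys are hypothetical. The early return when retrieval comes back empty is an extra guard assumed here; the write-up only states that the prompt itself handles that case:

```python
from typing import Callable, Dict, List

NO_BASIS = "no basis found"


def build_prompt(question: str, chunks: List[Dict]) -> str:
    # Label every chunk with its source and clause so the model can cite them.
    context = "\n".join(
        f"[{c['source']}, {c['clause']}] {c['text']}" for c in chunks
    )
    return (
        "Answer using only the context below. "
        "Cite the source document and clause number for every claim. "
        f'If the context does not cover the question, reply exactly "{NO_BASIS}".\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question: str, chunks: List[Dict], llm: Callable[[str], str]) -> str:
    if not chunks:
        # Nothing retrieved: skip the model so it cannot improvise an answer.
        return NO_BASIS
    return llm(build_prompt(question, chunks))
```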
Tech stack: Python, LangChain, Ollama, DeepSeek, Streamlit, Vector DB
Currently running as an internal pilot. Refinements to the chunking strategy and prompts are planned once user feedback has been collected.