jjinyy/procurement-rag-system

Procurement RAG System

Internal project — code is private for security reasons. Only architecture and design docs are shared here.

A RAG-based AI system for automated review of internal procurement policies and legal compliance. Built to replace the manual process of looking up regulations every time a contract needs review.


Why I built this

Every contract review meant manually checking the Subcontracting Act, internal procurement guidelines, and global compliance standards. I needed a system that could answer "does this contract violate the subcontracting law?" and instantly return the relevant clause as evidence.


System architecture

User question
    ↓
Query preprocessing (keyword extraction + intent classification)
    ↓
Vector search (internal policy docs + legal data)
    ↓
Context assembly (relevant document chunks)
    ↓
LLM response generation (with source citations)
    ↓
Answer + referenced documents returned
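
The flow above can be sketched as a small function that wires the stages together. This is only an illustration, since the real code is private: `Chunk`, `Answer`, `preprocess`, `run_pipeline`, the rule-based intent check, and the `search` / `generate` callables are all hypothetical names and stand-ins, not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Chunk:
    source: str  # document name, e.g. "Subcontracting Act"
    clause: str  # clause identifier, e.g. "Article 3"
    text: str


@dataclass
class Answer:
    text: str
    references: List[Chunk]


def preprocess(question: str) -> Dict[str, object]:
    """Keyword extraction + intent classification (toy rules, assumed)."""
    words = [w.strip("?.,!\"'").lower() for w in question.split()]
    # Keep longer words as keywords; a real system would use a proper extractor.
    keywords = [w for w in words if len(w) > 3]
    # Hypothetical intent labels: "violation_check" or "lookup".
    is_check = any(w.startswith("violat") for w in words)
    return {"keywords": keywords, "intent": "violation_check" if is_check else "lookup"}


def run_pipeline(
    question: str,
    search: Callable[[str, List[str]], List[Chunk]],
    generate: Callable[[str, str], str],
) -> Answer:
    query = preprocess(question)
    # Vector search over policy docs and legal data (supplied by the caller).
    chunks = search(question, query["keywords"])
    # Context assembly: label each chunk so the model can cite it.
    context = "\n\n".join(f"[{c.source} {c.clause}] {c.text}" for c in chunks)
    # LLM response generation (supplied by the caller).
    text = generate(question, context)
    return Answer(text=text, references=chunks)
```

Passing `search` and `generate` in as callables mirrors the "swappable by module" idea: the vector store or the LLM can change without touching the pipeline itself.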

Why these tools

LangChain: The most flexible framework I found for RAG pipelines. Document loaders, chunking, vector store, and LLM connections are all swappable by module.

Ollama + DeepSeek (local LLM): Internal compliance data couldn't be sent to external APIs, so a local LLM was a hard requirement. DeepSeek had the best balance of Korean language performance and speed.

Chunking strategy: Legal documents are chunked at article level and internal policies at section level. Fixed-size chunking risked splitting a clause in the middle, so structure-based chunking was applied instead.
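
A minimal sketch of article-level chunking, assuming each article starts on its own line with an English-style heading such as "Article 3". The function name `chunk_by_article` and the regex are illustrative assumptions; for Korean source text the pattern would need to match the original heading form (for example 제3조).

```python
import re
from typing import Dict, List

# Matches headings such as "Article 3" or "Article 12-2" at the start of a line.
# Assumption: body text never begins a line with "Article N"; if it does,
# the heading pattern needs to be stricter.
ARTICLE_HEADING = re.compile(r"^Article\s+\d+(?:-\d+)?", re.MULTILINE)


def chunk_by_article(text: str) -> List[Dict[str, str]]:
    """Split a legal text into one chunk per article.

    Text before the first heading (title, preamble) is not returned.
    """
    matches = list(ARTICLE_HEADING.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # Each chunk runs from its heading to the next heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append(
            {"article": match.group(0), "text": text[match.start():end].strip()}
        )
    return chunks
```

Because the split points are the headings themselves, a clause is never cut in the middle, which is the failure mode of fixed-size chunking described above.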


What was hard

Document quality: Internal docs were a mix of PDF, Word, and Excel, so the preprocessing pipeline needed a different parser per format.
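
One way to organize per-format parsing is a dispatch table keyed by file extension. This is a sketch under assumptions: `parse_pdf`, `parse_word`, `parse_excel`, and `select_parser` are hypothetical names, and the parser bodies are placeholders where a real extraction library would be plugged in.

```python
from pathlib import Path
from typing import Callable, Dict, Union


def parse_pdf(path: Path) -> str:
    raise NotImplementedError("plug in a PDF text extractor here")


def parse_word(path: Path) -> str:
    raise NotImplementedError("plug in a Word document reader here")


def parse_excel(path: Path) -> str:
    raise NotImplementedError("plug in a spreadsheet reader here")


# Extension -> parser. The extensions listed are assumptions for illustration.
PARSERS: Dict[str, Callable[[Path], str]] = {
    ".pdf": parse_pdf,
    ".docx": parse_word,
    ".xlsx": parse_excel,
}


def select_parser(path: Union[str, Path]) -> Callable[[Path], str]:
    """Pick the parser for a file based on its (case-insensitive) extension."""
    suffix = Path(path).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {suffix or '(none)'}") from None
```

Unknown formats fail loudly instead of being silently skipped, so a missing parser shows up during ingestion rather than as a gap in search results.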

Korean legal text structure: Complex cross-reference patterns like "Article 3, Paragraph 2, Proviso clause" made retrieval difficult. → Solved with a hybrid approach: keyword filter first, vector search second.
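
The "keyword filter first, vector search second" idea can be sketched as below. The bag-of-words `embed` function is a toy stand-in for a real embedding model and vector DB, and `hybrid_search` is a hypothetical name; only the two-stage order comes from the description above.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real system would call an embedding model.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_search(
    query: str, required_terms: List[str], chunks: List[str], top_k: int = 3
) -> List[str]:
    """Stage 1: keep chunks containing every required term.
    Stage 2: rank the survivors by similarity to the query."""
    required = [t.lower() for t in required_terms]
    # Plain substring match: "article 3" would also match "article 31",
    # so a stricter matcher is needed for production use.
    candidates = [c for c in chunks if all(t in c.lower() for t in required)]
    q = embed(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

Filtering on exact clause references first keeps semantically similar but legally different clauses (say, Paragraph 1 versus Paragraph 2) out of the ranked results.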

Hallucination control: Wrong answers in legal review = real risk.
→ Forced every response to include source document + clause number.
→ Prompt designed to output "no basis found" when no relevant docs are retrieved.
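
A minimal sketch of these two guards. The prompt wording, `build_prompt`, `answer`, and the `llm` callable are assumptions for illustration, not the project's actual prompt. Returning "no basis found" before calling the model when retrieval is empty is an extra safeguard added in this sketch, on top of the prompt instruction.

```python
from typing import Callable, Dict, List

NO_BASIS = "no basis found"


def build_prompt(question: str, chunks: List[Dict[str, str]]) -> str:
    # Label every chunk with source and clause so the model can cite them.
    context = "\n".join(
        f"[{c['source']} / {c['clause']}] {c['text']}" for c in chunks
    )
    return (
        "Answer using only the context below.\n"
        "Cite the source document and clause number for every claim.\n"
        f"If the context does not support an answer, reply exactly: {NO_BASIS}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(
    question: str, chunks: List[Dict[str, str]], llm: Callable[[str], str]
) -> str:
    # Nothing retrieved: skip the model entirely rather than let it guess.
    if not chunks:
        return NO_BASIS
    return llm(build_prompt(question, chunks))
```

Here `llm` is any function that takes a prompt string and returns text, so the same guard works regardless of which local model sits behind it.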


Stack

Python, LangChain, Ollama, DeepSeek, Streamlit, Vector DB


Status

Running as an internal pilot. Refinements to the chunking strategy and prompts are planned once user feedback has been collected.

About

Internal compliance & policy review system using RAG — LangChain + local LLM (Ollama/DeepSeek)
