Procurement RAG System

Internal project. The code is private for security reasons; only the architecture and design docs are shared here.

A RAG-based AI system for automated review of internal procurement policies and legal compliance. Built to replace the manual process of looking up regulations every time a contract needs review.

Why I built this

Every contract review required manually checking the Subcontracting Act, internal procurement guidelines, and global compliance standards. I needed a system that could answer a question like "does this contract violate the subcontracting law?" and immediately return the relevant clause as evidence.

System architecture

User question
    ↓
Query preprocessing (keyword extraction + intent classification)
    ↓
Vector search (internal policy docs + legal data)
    ↓
Context assembly (relevant document chunks)
    ↓
LLM response generation (with source citations)
    ↓
Answer + referenced documents returned
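
The flow above can be sketched end to end in plain Python. This is a minimal illustration, not the private implementation: the function names, the `Chunk` type, and the sample corpus are all hypothetical, keyword overlap stands in for the real vector search, and `generate` is a placeholder for the LLM call.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # document name
    clause: str   # article or section identifier
    text: str


def preprocess(question: str) -> dict:
    """Toy preprocessing: crude keyword extraction plus an intent label."""
    words = [w.strip("?.,\"'").lower() for w in question.split()]
    keywords = [w for w in words if len(w) > 3]
    intent = "violation_check" if "violate" in keywords else "lookup"
    return {"keywords": keywords, "intent": intent}


def search(query: dict, corpus: list[Chunk], top_k: int = 3) -> list[Chunk]:
    """Stand-in for vector search: rank chunks by keyword overlap."""
    def score(chunk: Chunk) -> int:
        text = chunk.text.lower()
        return sum(1 for kw in query["keywords"] if kw in text)

    ranked = sorted(corpus, key=score, reverse=True)
    return [c for c in ranked[:top_k] if score(c) > 0]


def assemble_context(chunks: list[Chunk]) -> str:
    """Prefix every chunk with its source so the answer can cite it."""
    return "\n".join(f"[{c.source}, {c.clause}] {c.text}" for c in chunks)


def generate(question: str, context: str) -> str:
    """Placeholder for the LLM call: echoes the retrieved context."""
    if not context:
        return "no basis found"
    return f"Based on the retrieved clauses:\n{context}"


def answer(question: str, corpus: list[Chunk]) -> dict:
    query = preprocess(question)
    chunks = search(query, corpus)
    context = assemble_context(chunks)
    return {"answer": generate(question, context), "sources": chunks}


# Made-up sample data, not real legal or policy text.
CORPUS = [
    Chunk("Sample Act", "Article 1",
          "Subcontracting payment must be made within the agreed period."),
    Chunk("Sample Policy", "Section 2",
          "Office supplies are ordered quarterly."),
]

result = answer("Does this contract violate the subcontracting payment rule?", CORPUS)
print(result["answer"])
```

Keeping each stage as its own function mirrors the diagram and makes it easy to swap one stage (for example the search step) without touching the others.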

Why these tools

LangChain: The most flexible framework I found for RAG pipelines. Document loaders, chunking, vector stores, and LLM connections are separate modules, so each can be swapped independently.

Ollama + DeepSeek (local LLM): Internal compliance data could not be sent to external APIs, so a local LLM was a hard requirement. DeepSeek offered the best balance of Korean-language performance and speed.
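
To illustrate what "local" means in practice, here is a minimal sketch that talks to Ollama's REST API on the same machine using only the standard library. It is an assumption, not the project's code: the actual system presumably goes through LangChain, the model tag `deepseek-r1` is a placeholder (the source does not say which DeepSeek model is used), and the URL is Ollama's default local address.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "deepseek-r1"  # placeholder tag; the actual model is not stated


def build_request(prompt: str, model: str = MODEL, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate(...)` requires a running Ollama server with the model already pulled. Because the request only ever targets `localhost`, the prompt and the retrieved compliance text stay on the machine.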

Chunking strategy: Legal documents are chunked at the article level and internal policies at the section level. Fixed-size chunking risked splitting a clause in the middle, so structure-based chunking was used instead.

What was hard

Document quality: Internal docs were a mix of PDF, Word, and Excel files, so the preprocessing pipeline needed a different parser for each format.
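
One way to organize per-format parsing is a small registry that picks a parser by file extension. This is a sketch under assumptions: the source does not name the parsing libraries, so `pypdf`, `python-docx`, and `openpyxl` are illustrative choices, and they are imported lazily so the dispatch logic runs even if a library is not installed.

```python
from pathlib import Path


def parse_pdf(path: str) -> str:
    from pypdf import PdfReader  # third-party; assumed choice
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_docx(path: str) -> str:
    import docx  # python-docx, third-party; assumed choice
    return "\n".join(p.text for p in docx.Document(path).paragraphs)


def parse_xlsx(path: str) -> str:
    from openpyxl import load_workbook  # third-party; assumed choice
    workbook = load_workbook(path, read_only=True, data_only=True)
    lines = []
    for sheet in workbook.worksheets:
        for row in sheet.iter_rows(values_only=True):
            cells = [str(value) for value in row if value is not None]
            if cells:
                lines.append("\t".join(cells))
    return "\n".join(lines)


# Extension -> parser. New formats are added by registering one more entry.
PARSERS = {".pdf": parse_pdf, ".docx": parse_docx, ".xlsx": parse_xlsx}


def pick_parser(path: str):
    """Return the parser registered for the file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"No parser registered for {suffix!r} files") from None


def extract_text(path: str) -> str:
    """Parse any supported file into plain text for chunking."""
    return pick_parser(path)(path)
```

Every parser returns plain text, so the chunking step downstream does not need to know which format a document came from.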

Korean legal text structure: Legal text uses complex cross-reference patterns such as "Article 3, Paragraph 2, Proviso clause."
→ Solved with a hybrid approach: keyword filter first, vector search second.
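
The hybrid idea can be sketched as two stages: narrow the candidates to chunks whose article number is mentioned in the question, then rank what remains by similarity. Everything here is illustrative. Treating the keyword filter as an article-number match is my reading of the approach, the chunk format is hypothetical, and a bag-of-words cosine stands in for real embeddings and a vector DB.

```python
import math
import re
from collections import Counter

ARTICLE_REF = re.compile(r"Article\s+(\d+)")


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_search(query: str, chunks: list[dict], top_k: int = 3) -> list[dict]:
    # Stage 1: keyword filter on explicit article references in the query.
    refs = set(ARTICLE_REF.findall(query))
    candidates = [c for c in chunks if c["article"] in refs] if refs else chunks
    if not candidates:
        candidates = chunks  # the reference matched nothing; search everything

    # Stage 2: similarity ranking over the surviving candidates.
    q = embed(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:top_k]


# Made-up chunks, not real legal text.
CHUNKS = [
    {"article": "3", "text": "Article 3 Paragraph 2: payment is due within the "
                             "agreed period, provided that the parties may extend it."},
    {"article": "5", "text": "Article 5: payment deadlines and late fees"},
    {"article": "3", "text": "Article 3 Paragraph 1: scope of application"},
]

hits = hybrid_search("What does the proviso in Article 3 say about payment deadlines?", CHUNKS)
print([h["article"] for h in hits])
```

The filter matters because similarity alone can pull in a different article that merely uses similar wording, such as the Article 5 chunk above.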

Hallucination control: Wrong answers in legal review are a real risk.
→ Every response is forced to include the source document and clause number.
→ The prompt is designed to output "no basis found" when no relevant documents are retrieved.
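
Both controls can be sketched as a prompt builder plus a check on the model's output. The prompt wording, the `[document, clause]` citation format, and the `enforce_citation` post-check are assumptions for illustration. The source only states that citations are required and that the prompt asks for "no basis found"; the post-check is an extra safeguard added in this sketch.

```python
NO_BASIS = "no basis found"


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Build a prompt that demands citations and allows an explicit refusal."""
    if chunks:
        context = "\n".join(
            f"[{c['source']}, {c['clause']}] {c['text']}" for c in chunks
        )
    else:
        context = "(no documents retrieved)"
    return (
        "Answer using only the context below.\n"
        "Cite the source document and clause number for every claim, "
        "in the form [document, clause].\n"
        f'If the context does not support an answer, reply exactly: "{NO_BASIS}".\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


def enforce_citation(response: str, chunks: list[dict]) -> str:
    """Keep the response only if it cites at least one retrieved clause."""
    cited = any(f"[{c['source']}, {c['clause']}]" in response for c in chunks)
    return response if cited else NO_BASIS


# Made-up retrieved chunk, not real legal text.
RETRIEVED = [{"source": "Sample Act", "clause": "Article 1",
              "text": "Payment must be made within the agreed period."}]

print(build_prompt("Is late payment allowed?", RETRIEVED))
```

With an empty retrieval result, `enforce_citation` always returns "no basis found", so an uncited answer never reaches the reviewer even if the model ignores the prompt.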

Stack

Python, LangChain, Ollama, DeepSeek, Streamlit, Vector DB

Status

Running as an internal pilot. Refinements to the chunking strategy and prompts are planned once user feedback has been collected.
