Secure offline research assistant leveraging a 20B LLM and retrieval-augmented generation (RAG) for local document search and question answering.
- Model: 20B LLM running in LM Studio
- Role: Handles text generation, summarization, and Q&A
- Access: Exposes a local API endpoint for frontend communication
- Storage: PDFs, Word documents, and plain-text files are stored locally
- Preprocessing:
  - Split documents into chunks (500–1000 tokens each)
  - Remove irrelevant formatting for cleaner embeddings
- Embeddings: Open-source embedding model (e.g., sentence-transformers) to vectorize document chunks
- Vector DB Options:
  - FAISS (lightweight, local)
  - Milvus (advanced, heavier)
- Role: Fast retrieval of relevant chunks to feed the model contextually (see the sketch after this list)
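A minimal indexing-and-retrieval sketch, assuming sentence-transformers with all-MiniLM-L6-v2 and a flat FAISS index; the chunk size, helper names, and whitespace-token approximation are illustrative, not part of the project code:

```python
# Sketch only: chunking, embedding, and FAISS retrieval under the assumptions above.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into ~chunk_size-token pieces (whitespace tokens as a rough proxy)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks: list[str]) -> faiss.IndexFlatIP:
    # Normalized embeddings + inner product == cosine similarity.
    vecs = embedder.encode(chunks, convert_to_numpy=True, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs.astype(np.float32))
    return index

def retrieve_top_k(index: faiss.IndexFlatIP, chunks: list[str], question: str, k: int = 5) -> list[str]:
    q = embedder.encode([question], convert_to_numpy=True, normalize_embeddings=True)
    _, ids = index.search(q.astype(np.float32), k)
    return [chunks[i] for i in ids[0]]
```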
Workflow:
- User asks a question
- Query vector DB → retrieve top-K relevant chunks
- Construct a prompt with context + user question
- Send to LM Studio for answer generation
Benefit: Allows a 20B model to answer long-document questions without exceeding the context window
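A sketch of the prompt-assembly and generation step, assuming LM Studio's OpenAI-compatible chat-completions endpoint on its default port; the system prompt, temperature, and function name are illustrative:

```python
# Sketch only: send retrieved context + the user question to LM Studio's local server.
import requests

LMSTUDIO_API_URL = "http://127.0.0.1:1234/v1/chat/completions"  # matches the .env example further down

def answer_question(question: str, context_chunks: list[str]) -> str:
    context = "\n\n".join(context_chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    payload = {
        "model": "openai/gpt-oss-20b",
        "messages": [
            {"role": "system", "content": "Answer using only the provided context. Say so if the context is insufficient."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
    }
    resp = requests.post(LMSTUDIO_API_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```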
- Framework: Streamlit (offline, Python-based)
- Features (see the UI sketch after this list):
  - Upload documents (PDF, DOCX, TXT)
  - Search & query interface
  - Display answers with reference snippets
  - Optional toggle to fall back to the LLM's own knowledge when the retrieved context is insufficient
  - Show total response time for each query
  - Notebook/history for past queries
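A minimal Streamlit sketch of this interface, assuming the backend runs at http://127.0.0.1:8000 and returns JSON with answer and sources fields (the port and response shape are assumptions, not the project's actual API):

```python
# Sketch only: Streamlit front end under the assumptions above.
import time
import requests
import streamlit as st

BACKEND = "http://127.0.0.1:8000"  # assumed backend address

st.title("Secure Research Assistant")

uploads = st.file_uploader("Upload documents", type=["pdf", "docx", "txt"], accept_multiple_files=True)
for f in uploads or []:
    requests.post(f"{BACKEND}/docs", files={"file": (f.name, f.getvalue())}, timeout=60)

question = st.text_input("Ask a question about your documents")
if st.button("Ask") and question:
    start = time.time()
    data = requests.post(f"{BACKEND}/ask", json={"question": question}, timeout=300).json()
    st.write(data.get("answer", ""))
    with st.expander("Reference snippets"):
        for snippet in data.get("sources", []):
            st.markdown(f"> {snippet}")
    st.caption(f"Response time: {time.time() - start:.1f} s")
```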
┌───────────────┐
│  User (You)   │
└───────┬───────┘
        │
        ▼
┌─────────────────┐
│  Streamlit UI   │
│ - Upload docs   │
│ - Ask questions │
│ - View answers  │
│ - Response time │
└────────┬────────┘
         │
         ▼
┌──────────────────────┐
│  Local Backend API   │
│ - /docs              │
│ - /ask               │
│ - Manage FAISS index │
└──────────┬───────────┘
           │
           ▼
┌─────────────────────────┐
│  FAISS Vector DB        │
│ - Embeddings via        │
│   all-MiniLM-L6-v2      │
│ - Retrieve top-K chunks │
└───────────┬─────────────┘
            │
            ▼
┌────────────────────────┐
│  LM Studio 20B LLM     │
│ - Receives prompt:     │
│   "Question + Context" │
│ - Generates answer     │
│ - Optionally fall back │
│   to own knowledge     │
└───────────┬────────────┘
            │
            ▼
┌───────────────────┐
│ Answer + Sources  │
│ + Response Time   │
└─────────┬─────────┘
          │
          ▼
  Streamlit UI displays
- Run your 20B model locally and expose a local API:
  - Download LM Studio and load your 20B model
  - Enable the API server (LM Studio provides a local, OpenAI-compatible REST endpoint)
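A quick way to confirm the server is reachable before wiring up the backend (assumes LM Studio's default port 1234 and its OpenAI-compatible /v1/models route):

```python
# Sketch only: verify LM Studio's local server is up and the model is loaded.
import requests

resp = requests.get("http://127.0.0.1:1234/v1/models", timeout=5)
print(resp.json())  # should list the loaded 20B model
```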
git clone <your-repo-url>
cd secure-research-assistant

secure-research-assistant/
│
├── backend/
│ ├── api.py
│ ├── ingest.py
│ ├── embeddings.py
│ ├── retrieval.py
│ ├── config.py
│ └── utils.py
│
├── frontend/
│ └── chat.py # Streamlit UI
│
├── models/
│ └── 20B_model/ # LM Studio model directory
│
├── data/
│ ├── documents/ # Uploaded documents
│ └── embeddings/ # FAISS index files
│
├── scripts/
│ ├── start_backend.sh
│ └── preprocess_docs.sh
│
├── README.md
└── requirements.txt
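A hypothetical sketch of what backend/api.py could look like with the /docs and /ask routes from the diagram, assuming Flask (the Gunicorn command below targets the WSGI app backend.api:app); the retrieval and generation steps are placeholders for the modules listed above:

```python
# Sketch only: minimal Flask backend under the assumptions above.
import os
import time
from flask import Flask, jsonify, request

app = Flask(__name__)
DATA_DIR = os.environ.get("DATA_DIR", "data/documents")

@app.post("/docs")
def upload_doc():
    f = request.files["file"]
    os.makedirs(DATA_DIR, exist_ok=True)
    f.save(os.path.join(DATA_DIR, f.filename))
    # Chunking, embedding, and FAISS index updates would be triggered here (ingest.py / embeddings.py).
    return jsonify({"status": "stored", "filename": f.filename})

@app.post("/ask")
def ask():
    question = request.get_json(force=True)["question"]
    start = time.time()
    chunks = []   # placeholder: retrieval.py would return the top-K chunks
    answer = ""   # placeholder: prompt the LM Studio model with question + chunks
    return jsonify({
        "answer": answer,
        "sources": chunks,
        "response_time_s": round(time.time() - start, 2),
    })

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8000)
```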
uv venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
cp config.example.env .env

Update .env with your local settings:
LMSTUDIO_API_URL=http://127.0.0.1:1234/v1/chat/completions
LLM_MODEL=openai/gpt-oss-20b
CHUNK_SIZE=500
TOP_K=5
EMBEDDING_MODEL=all-MiniLM-L6-v2
DATA_DIR=data/documents
EMBEDDING_DIR=data/embeddings

Start the backend:

uv run backend/api.py

Optional: run with Gunicorn for production:

gunicorn -w 4 backend.api:app

Launch the Streamlit frontend:

streamlit run frontend/chat.py
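For reference, a hypothetical backend/config.py that loads these settings, assuming python-dotenv is available; the variable names mirror the .env example above:

```python
# Sketch only: load .env settings for the backend, assuming python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

LMSTUDIO_API_URL = os.getenv("LMSTUDIO_API_URL", "http://127.0.0.1:1234/v1/chat/completions")
LLM_MODEL = os.getenv("LLM_MODEL", "openai/gpt-oss-20b")
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "500"))
TOP_K = int(os.getenv("TOP_K", "5"))
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "all-MiniLM-L6-v2")
DATA_DIR = os.getenv("DATA_DIR", "data/documents")
EMBEDDING_DIR = os.getenv("EMBEDDING_DIR", "data/embeddings")
```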