A TypeScript project for building a Retrieval Augmented Generation (RAG) system that parses PDF files, creates embeddings, stores them persistently in PostgreSQL with pgvector, and answers questions using an AI model powered by Llama.cpp.
- PDF Parsing: Extract text content from PDF files using LangChain's PDFLoader
- Embeddings Creation: Generate embeddings from document text using embedding models (e.g., BGE-small)
- Persistent Vector Storage: Store embeddings in PostgreSQL with pgvector extension for scalable similarity search
- Graceful Fallback: If PostgreSQL is unavailable, embeddings automatically fall back to in-memory storage
- Similarity Search: Find relevant document chunks based on vector similarity using cosine distance
- RAG (Retrieval Augmented Generation): Answer questions using a language model with retrieved context
- Flexible Model Support: Use different Llama.cpp models for embeddings and generation
- High Performance: IVFFLAT vector indexing for fast nearest-neighbor searches
```mermaid
graph TD
    A[PDF Document Processing] --> B[Embedding Generation]
    B --> C[Embedding Storage Decision]
    C --> D[PostgreSQL + pgvector]
    C --> E[In-Memory Storage]
    D --> F[IVFFLAT, Persistent, Scalable]
    E --> G[Fast, Non-persistent]
```
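The fallback decision in the diagram above can be sketched as a small wrapper. This is a minimal illustration only, not the project's actual API; the interface, class, and function names here are assumptions:

```typescript
// Minimal sketch of the storage-fallback pattern shown above.
// Interface and names are illustrative, not the project's real API.
interface EmbeddingStore {
  kind: "postgres" | "memory";
  store(id: string, embedding: number[]): Promise<void>;
}

class MemoryStore implements EmbeddingStore {
  kind = "memory" as const;
  private data = new Map<string, number[]>();
  async store(id: string, embedding: number[]): Promise<void> {
    this.data.set(id, embedding);
  }
}

// tryConnectPostgres is a stand-in for a real pg connectivity check.
async function createStoreWithFallback(
  tryConnectPostgres: () => Promise<EmbeddingStore>
): Promise<EmbeddingStore> {
  try {
    return await tryConnectPostgres();
  } catch {
    // PostgreSQL unavailable: degrade gracefully to in-memory storage.
    return new MemoryStore();
  }
}
```

The key design point is that callers only ever see the `EmbeddingStore` interface, so the rest of the pipeline does not care which backend was chosen.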
- Node.js 18+ and npm
- PostgreSQL 13+ with pgvector extension
- Two Llama.cpp compatible models:
  - Embedding Model (required): `models/bge-small-en-v1.5-Q8_0.gguf`
  - Language Model (optional): `models/neural-chat-7b-v3-3-Q4_K_M.gguf` or similar
macOS:

```shell
brew install postgresql@15
brew install pgvector
brew services start postgresql@15
```

Linux (Ubuntu/Debian):

```shell
sudo apt-get install postgresql postgresql-contrib
sudo apt-get install build-essential postgresql-server-dev-15
git clone https://github.com/pgvector/pgvector.git
cd pgvector && make && sudo make install
sudo systemctl start postgresql
```

See PGVECTOR_SETUP.md for detailed PostgreSQL setup.
```shell
psql -U postgres
```

```sql
CREATE DATABASE embeddings;
\c embeddings
CREATE EXTENSION vector;
```

Then install dependencies:

```shell
npm install
```

This creates node_modules/ with all required packages, including the pg client for PostgreSQL.
Create a .env file in your project root:
```env
PG_HOST=localhost
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=postgres
PG_DATABASE=embeddings
```

Embedding Model (go to Hugging Face and download):

```shell
# BGE-small embedding model
# Place at: models/bge-small-en-v1.5-Q8_0.gguf
```

Run the application with a PDF file and query:

```shell
npm start /path/to/your/file.pdf "Your question here"
```

You can also query existing embeddings without providing a PDF file:

```shell
npm start
```

This will search through existing embeddings in PostgreSQL/pgvector and return relevant chunks.
```shell
# Answer a question about a PDF
npm start documents/paper.pdf "What are the main findings?"

# Default query if not provided
npm start documents/guide.pdf

# With relative path
npm start ./my-document.pdf "Summarize the introduction"

# Query existing embeddings (no PDF needed)
npm start
```

Watch mode with auto-reload:

```shell
npm run dev
```

- Parse PDF and extract text content
- Split text into manageable chunks (~1000 characters)
- Generate embeddings for each chunk using the embedding model
- Store embeddings persistently in PostgreSQL with pgvector
- If PostgreSQL unavailable, automatic fallback to in-memory storage
- Uses IVFFLAT vector indexing for fast similarity searches
- Embeddings stored with metadata in the `embeddings` table
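The chunking step above can be sketched as a naive fixed-size splitter. This is an illustration only; the project's actual splitter may respect sentence or page boundaries:

```typescript
// Naive sketch of the "split into ~1000-character chunks" step.
// The real splitter may break on sentence boundaries instead.
const MAX_CONTEXT_CHARS = 1000;

function chunkText(text: string, maxChars: number = MAX_CONTEXT_CHARS): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```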
- Generate embedding for the user's question
- Search PostgreSQL pgvector for semantically similar chunks using vector similarity (cosine distance)
- Retrieve top-K most relevant chunks as context
- If no embeddings found in PostgreSQL, fallback to in-memory storage
- Send the user's question + relevant context to the language model
- Model generates an answer based on the provided context
- Return formatted response to user
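The retrieval step above reduces to cosine similarity over stored vectors. A self-contained sketch of that math (not the project's actual search code; the in-memory `Map` stands in for either backend):

```typescript
// Cosine similarity and top-K retrieval, sketching the search step above.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks whose embeddings are most similar to the query.
function topK(
  query: number[],
  docs: Map<string, number[]>,
  k: number
): string[] {
  return [...docs.entries()]
    .map(([text, emb]) => ({ text, score: cosineSimilarity(query, emb) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.text);
}
```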
```
┌─────────────────────────────────────────┐
│       PDF Documents → Embeddings        │
└────────────────┬────────────────────────┘
                 │
        ┌────────▼─────────┐
        │  Try PostgreSQL  │
        │  with pgvector   │
        └────────┬─────────┘
          ✓      │      ✗
       ┌─────────┴─────────┐
       │                   │
 ┌─────▼──────┐      ┌────▼─────┐
 │ PostgreSQL │      │  Memory  │
 │  pgvector  │      │  Store   │
 └────────────┘      └──────────┘
```
```
js-embeddings/
├── src/
│   ├── main.ts               # Entry point, CLI handling
│   ├── pdf-embeddings.ts     # PDF parsing and embedding functions
│   ├── query-engine.ts       # RAG and question answering
│   ├── embedding-store.ts    # PostgreSQL pgvector + Memory storage management
│   └── embedding-search.ts   # Search and retrieval utilities
├── models/
│   ├── bge-small-en-v1.5-Q8_0.gguf      # Embedding model
│   └── neural-chat-7b-v3-3-Q4_K_M.gguf  # Language model (optional)
├── package.json
├── tsconfig.json
├── PGVECTOR_SETUP.md         # PostgreSQL/pgvector setup guide
└── README.md
```
createEmbeddingStore(): Promise<EmbeddingStore>
- Creates embedding store (PostgreSQL with pgvector if available, in-memory fallback)
- Automatically checks PostgreSQL connectivity
- Configurable via environment variables (PG_HOST, PG_PORT, PG_USER, PG_PASSWORD, PG_DATABASE)
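Reading that configuration with local-development defaults might look like the following sketch (the defaults mirror the .env example above; the function name is an assumption):

```typescript
// Sketch of reading the PG_* environment variables listed above.
// Defaults match the example .env; the function name is illustrative.
interface PgConfig {
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
}

function pgConfigFromEnv(
  env: Record<string, string | undefined> = process.env
): PgConfig {
  return {
    host: env.PG_HOST ?? "localhost",
    port: Number(env.PG_PORT ?? 5432),
    user: env.PG_USER ?? "postgres",
    password: env.PG_PASSWORD ?? "postgres",
    database: env.PG_DATABASE ?? "embeddings",
  };
}
```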
storeEmbeddingsWithFallback(store, chunks, embeddings)
- Stores embeddings in PostgreSQL or memory with automatic fallback
PgVectorStore.queryByEmbedding(embedding, limit) (Advanced)
- Query PostgreSQL directly using vector similarity
- Returns results with cosine similarity scores
- Gracefully handles query failures by returning empty array for fallback
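pgvector exposes cosine distance through the `<=>` operator, so a query of the kind queryByEmbedding runs could look like the one built below. This is a sketch; the table and column names ("embeddings", "content", "embedding") are assumptions:

```typescript
// Builds a pgvector cosine-distance query of the kind queryByEmbedding might run.
// Table and column names are assumptions; $1 is the query embedding parameter.
function buildSimilarityQuery(limit: number): string {
  return `
    SELECT content,
           1 - (embedding <=> $1::vector) AS cosine_similarity
    FROM embeddings
    ORDER BY embedding <=> $1::vector
    LIMIT ${limit}
  `;
}
```

Note that `<=>` returns a distance (smaller is closer), which is why the similarity score is computed as `1 - distance`.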
searchEmbeddings(store, query, limit): Promise<string[]>
- Searches stored embeddings for similar documents
- Returns top-N results
getStorageInfo(store): Object
- Returns storage type and location information
clearAllEmbeddings(store)
- Clears all stored embeddings
parsePDF(pdfPath: string): Promise<string[]>
- Parses a PDF file and splits content into chunks
- Returns array of text chunks
embedDocuments(context: LlamaEmbeddingContext, documents: readonly string[]): Promise<Map<string, LlamaEmbedding>>
- Creates embeddings for document chunks
- Handles errors gracefully
findSimilarDocuments(embedding: LlamaEmbedding, documentEmbeddings: Map): string[]
- Finds chunks similar to a query embedding
- Returns sorted by similarity score
createQueryEngine(llama: Llama, modelPath: string): Promise<LlamaContext | null>
- Loads a language model for question answering
- Returns null if model loading fails
queryWithContext(context: LlamaContext, query: string, documents: string[], maxResults: number): Promise<QueryResult>
- Generates answers using retrieved context
- Falls back to keyword matching if generation fails
formatQueryResult(result: QueryResult): string
- Formats the RAG result for display
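Internally, queryWithContext has to fold the retrieved chunks into a prompt for the language model. A hedged sketch of that assembly step (the template below is an assumption, not the project's actual prompt):

```typescript
// Sketch of assembling a RAG prompt from retrieved chunks.
// The exact template queryWithContext uses is an assumption.
function buildRagPrompt(question: string, chunks: string[]): string {
  const context = chunks
    .map((chunk, i) => `[Chunk ${i + 1}]\n${chunk}`)
    .join("\n\n");
  return (
    `Answer the question using only the context below.\n\n` +
    `Context:\n${context}\n\n` +
    `Question: ${question}\nAnswer:`
  );
}
```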
- node-llama-cpp: Llama.cpp bindings for Node.js
- @langchain/community: Document loaders
- typescript: TypeScript compiler
- ts-node: TypeScript runtime
Modify in src/pdf-embeddings.ts:
```typescript
const MAX_CONTEXT_CHARS = 1000; // Adjust based on model limits
```

Change models in src/main.ts:

```typescript
const embeddingModelPath = "path/to/embedding/model.gguf";
const llmModelPath = "path/to/language/model.gguf";
```

- Ensure model files are in the models/ directory
- Check file paths in main.ts
- Reduce MAX_CONTEXT_CHARS in pdf-embeddings.ts
- Use a smaller model or a model with a larger context window
- The system will fall back to showing relevant chunks
- Download a language model for full RAG functionality
- Ensure model is in GGUF format
- If querying existing embeddings and getting no results, verify that embeddings exist in the database
- The system will automatically fall back to in-memory storage if PostgreSQL is unavailable
- Use GPU Acceleration: Metal on macOS, CUDA on NVIDIA, ROCm on AMD, or Vulkan as a vendor-agnostic option; check compatibility with node-llama-cpp
- Adjust Context Size: Larger chunks = better context but slower processing
- Limit Retrieved Results: Use fewer chunks as context for faster generation
- Model Selection: Smaller models (7B) for speed, larger (13B+) for quality
```
📄 Parsing PDF from: /path/to/document.pdf
✓ PDF parsed into 45 chunks
✓ Embeddings created successfully (45 chunks embedded)

❓ Query: "What is the main topic?"
✓ Found 3 relevant chunks

📚 Most relevant document chunks:
────────────────────────────────────────────────
[Chunk 1]
The main topic of this document is...

[Chunk 2]
Building on this, we can see...
────────────────────────────────────────────────

╔════════════════════════════════════════════════╗
║                  QUERY RESULT                  ║
╚════════════════════════════════════════════════╝

📝 Question: "What is the main topic?"

💡 Answer:
The main topic is... [generated by language model]
```
ISC