A Retrieval-Augmented Generation (RAG) system that lets you query PDF documents with Large Language Models. The application uses the DeepSeek Chat model for natural language processing and the OpenRouter API for model access.
- PDF document processing and text extraction
- Efficient document embedding generation
- Vector database for similarity search
- RAG-powered question answering
- Integration with DeepSeek Chat model
- Bun runtime environment
- OpenRouter API key (or other OpenAI-compatible provider)
- Install dependencies:

  ```sh
  bun install
  ```

- Set up your OpenAI/OpenRouter API key:
  - Copy the `.env.example` file to `.env`:

    ```sh
    cp .env.example .env
    ```

  - Edit `.env` and replace `your-api-key-here` with your actual OpenAI/OpenRouter API key
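Bun loads `.env` automatically at startup, so no extra dotenv setup is needed. As a rough sketch of how the key might be picked up in code (the exact variable name depends on what `.env.example` defines, so the names below are assumptions):

```typescript
// Sketch only: use whichever variable name .env.example actually defines.
const apiKey = process.env.OPENROUTER_API_KEY ?? process.env.OPENAI_API_KEY;
if (!apiKey) {
  throw new Error("API key missing: copy .env.example to .env and set your key.");
}
```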
- `main.ts`: Entry point for the application
- `pdf.ts`: PDF processing and text extraction
- `embeddings.ts`: Document embedding generation
- `vectorDB.ts`: Vector database operations
- `rag.ts`: RAG implementation and query handling
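As a rough idea of how these pieces fit together, `main.ts` can be a thin CLI wrapper that routes to the ingest and query flows. The imported function names below are illustrative assumptions, not necessarily the project's actual exports:

```typescript
// main.ts (sketch): dispatch CLI arguments to the ingest or query flow.
// ingestPdf / answerQuestion are hypothetical names used for illustration.
import { ingestPdf, answerQuestion } from "./rag";

const [command, arg] = Bun.argv.slice(2); // skip the bun binary and script path

switch (command) {
  case "ingest":
    await ingestPdf(arg);                   // bun main.ts ingest <pdf_file>
    break;
  case "query":
    console.log(await answerQuestion(arg)); // bun main.ts query "<your question>"
    break;
  default:
    console.error('Usage: bun main.ts ingest <pdf_file> | query "<question>"');
    process.exit(1);
}
```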
The application supports two main commands:
To embed a PDF document into the vector database:
```sh
bun main.ts ingest <pdf_file>
```

Example:

```sh
bun main.ts ingest AyaMohsenResume.pdf
```

This will (sketched in code below):
- Extract text from the PDF
- Split the text into manageable chunks
- Generate embeddings for each chunk
- Store the chunks and embeddings in the vector database
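A simplified sketch of that flow, combining `pdf-parse` for extraction and `@xenova/transformers` for embeddings. The chunk size, the embedding model (`Xenova/all-MiniLM-L6-v2`), and the JSON file standing in for the vector database are assumptions for illustration, not necessarily what the project actually uses:

```typescript
// Sketch of the ingest flow: extract → chunk → embed → store.
import pdf from "pdf-parse";
import { pipeline } from "@xenova/transformers";

async function ingestSketch(pdfPath: string) {
  // 1. Extract raw text from the PDF.
  const buffer = Buffer.from(await Bun.file(pdfPath).arrayBuffer());
  const { text } = await pdf(buffer);

  // 2. Split the text into fixed-size chunks (the real splitting may be smarter).
  const chunkSize = 1000;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }

  // 3. Generate an embedding vector for each chunk.
  const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
  const records: { text: string; vector: number[] }[] = [];
  for (const chunk of chunks) {
    const out = await embed(chunk, { pooling: "mean", normalize: true });
    records.push({ text: chunk, vector: Array.from(out.data as Float32Array) });
  }

  // 4. Persist chunks + vectors (a plain JSON file stands in for the vector DB here).
  await Bun.write("vector-db.json", JSON.stringify(records));
}
```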
To ask questions about the ingested document:
```sh
bun main.ts query "<your question>"
```

Example:

```sh
bun main.ts query "Explain who this person is and can they be good for game development?"
```

The system will (sketched in code below):
- Generate an embedding for your question
- Find relevant context from the PDF using similarity search
- Use the DeepSeek Chat model to generate an accurate answer based on the retrieved context
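The answering path mirrors the ingest sketch: embed the question, rank stored chunks by cosine similarity, and pass the top matches to the chat model through OpenRouter's OpenAI-compatible endpoint. In this sketch the model ID `deepseek/deepseek-chat`, the top-k value, and the JSON store format are assumptions carried over from the ingest example:

```typescript
// Sketch of the query flow: embed question → similarity search → grounded answer.
import OpenAI from "openai";
import { pipeline } from "@xenova/transformers";

// Vectors are normalized at embedding time, so the dot product equals cosine similarity.
const cosine = (a: number[], b: number[]) =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

async function querySketch(question: string) {
  // 1. Embed the question with the same model used at ingest time.
  const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
  const q = await embed(question, { pooling: "mean", normalize: true });
  const qVec = Array.from(q.data as Float32Array);

  // 2. Rank stored chunks by similarity and keep the top 3 as context.
  const records: { text: string; vector: number[] }[] =
    await Bun.file("vector-db.json").json();
  const context = records
    .map((r) => ({ ...r, score: cosine(qVec, r.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map((r) => r.text)
    .join("\n---\n");

  // 3. Ask DeepSeek Chat via OpenRouter, grounding the answer in the retrieved context.
  const client = new OpenAI({
    baseURL: "https://openrouter.ai/api/v1",
    apiKey: process.env.OPENROUTER_API_KEY,
  });
  const completion = await client.chat.completions.create({
    model: "deepseek/deepseek-chat",
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return completion.choices[0].message.content;
}
```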
- `@xenova/transformers`: For generating embeddings
- `openai`: For interacting with the OpenAI/OpenRouter API
- `pdf-parse`: For extracting text from PDF documents