The Research Agent API helps you find and understand information from research documents. It's designed for researchers and professionals, combining general knowledge with document-specific details to give clear and helpful answers.
- Smart Question Routing: Automatically directs queries to the most suitable processing mechanism: conversational memory for context-aware responses, vector-based retrieval for document-specific answers, or general LLM-based handling for broader questions.
- Hybrid Retrieval System: Combines keyword search (BM25) with semantic search using FAISS and OpenAI Embeddings for comprehensive and accurate document discovery (see the sketch after this list).
- Enhanced Query Rewriting: Improves search precision by rephrasing user queries before retrieval.
- Layered Answer Generation: Produces high-quality responses through a multi-stage process that includes self-assessment and quality checks.
- Flexible Retrieval Pipeline: Supports real-time indexing of newly uploaded documents, seamlessly updating the retrieval process.
- REST API Accessibility: Provides easy integration through endpoints for document uploads, query handling, and conversational memory management.
- Advanced Monitoring: Incorporates Langfuse for in-depth logging, debugging, and performance monitoring, giving detailed insight into system operations.
- Failsafe Mechanism: Keeps the service running by automatically switching to a backup API when the primary API is down.
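To make the hybrid retrieval feature concrete, here is a minimal sketch using LangChain's `EnsembleRetriever` to blend BM25 keyword results with FAISS semantic results. The class names are real LangChain APIs, but the sample documents and the 50/50 weights are illustrative assumptions, not the project's actual configuration.

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

docs = [
    Document(page_content="FAISS enables fast similarity search over embeddings."),
    Document(page_content="BM25 ranks documents by keyword overlap with the query."),
]

# Keyword retriever (BM25) and semantic retriever (FAISS + OpenAI embeddings)
bm25 = BM25Retriever.from_documents(docs)
faiss_store = FAISS.from_documents(docs, OpenAIEmbeddings())
semantic = faiss_store.as_retriever(search_kwargs={"k": 2})

# Blend both ranked lists; the weights control each retriever's contribution
hybrid = EnsembleRetriever(retrievers=[bm25, semantic], weights=[0.5, 0.5])
results = hybrid.invoke("How does keyword ranking work?")
```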
- Language Models:
  - Primary: Llama 3 70B Instruct via the TogetherAI API.
  - Backup: GPT-4o (via the OpenAI API) for guaranteed response fallback in case of primary LLM issues (see the fallback sketch after this stack listing).
- Vector Database: FAISS (Facebook AI Similarity Search) for indexing and searching semantic embeddings.
- Retrieval Techniques:
  - Keyword Search: BM25 (Best Matching 25) for keyword-based document retrieval.
  - Semantic Search: OpenAI Embeddings for vector-based semantic search.
  - Ensemble Search: Combines BM25 and semantic search results for enhanced retrieval performance.
- Orchestration: FastAPI for API management and efficient handling of web requests.
- Tracing: Langfuse for observability of application flow and debugging insights.
- Document Handling: PyPDF for processing PDF documents.
- Runtime: Python 3.12
- Containerization: Docker (via `docker-compose`) for reproducible deployments.
- Key Libraries:
  - `langchain`: Framework for developing applications with language models.
  - `langgraph`: Library for building robust conversational flows using graph architectures.
  - `faiss-cpu`: Library for creating and searching vector indices efficiently.
  - `fastapi`: Modern, high-performance web framework for building APIs.
  - `python-multipart`: For handling multipart/form-data file uploads.
  - `langfuse`: Library for tracing LLM applications.
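As a rough illustration of the primary/backup arrangement, the sketch below uses LangChain's `with_fallbacks` chaining. `ChatTogether` and `ChatOpenAI` are real LangChain integrations, but the exact model identifier and parameters here are assumptions, not the project's verified configuration.

```python
from langchain_openai import ChatOpenAI
from langchain_together import ChatTogether

# Primary model served by TogetherAI (the model id here is an assumption)
primary = ChatTogether(model="meta-llama/Llama-3-70b-chat-hf", temperature=0)
# Backup model served by OpenAI
backup = ChatOpenAI(model="gpt-4o", temperature=0)

# If the TogetherAI call raises, the same prompt is retried against OpenAI
llm = primary.with_fallbacks([backup])
reply = llm.invoke("Summarize the role of a fallback LLM in one sentence.")
print(reply.content)
```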
- Endpoint: `POST /api/docs/upload`
- Description: Uploads multiple PDF documents to be added to the system's knowledge base.
- Request Body: Accepts `multipart/form-data` with files under the key `files`.
- Response: JSON object detailing each file's upload status (successful, failed, pages processed), along with the total number of files processed.
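For example, a client could upload two documents like this (a minimal sketch assuming the API runs locally on port 8000; the file names are placeholders):

```python
import requests

# Each tuple uses the field name "files", matching the endpoint's expected key
files = [
    ("files", ("paper1.pdf", open("paper1.pdf", "rb"), "application/pdf")),
    ("files", ("paper2.pdf", open("paper2.pdf", "rb"), "application/pdf")),
]
resp = requests.post("http://localhost:8000/api/docs/upload", files=files)
print(resp.json())
```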
- Endpoint: `POST /api/query`
- Description: Handles user queries against loaded documents or general knowledge. Supports conversation continuity via `thread_id`.
- Request Body:
{
  "question": "Your question here",
  "thread_id": "Optional thread ID for conversation continuity",
  "config": {
    // Optional custom config options, such as temperature
  }
}
- Response: A JSON object containing the answer, a list of document references, and the thread ID.
{
"answer": "The answer to your question",
"references": [
{
"source": "paper1.pdf",
"relevance_score": 0.95,
"snippet": "A relevant excerpt from the paper"
},
...
],
"thread_id": "UUID of the conversation thread"
}
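A minimal client-side sketch (assuming a local server on port 8000; the questions are placeholders) showing how the returned `thread_id` carries a conversation across calls:

```python
import requests

payload = {"question": "What method does paper1.pdf propose?"}
data = requests.post("http://localhost:8000/api/query", json=payload).json()
print(data["answer"])

# Reuse the returned thread_id so the follow-up shares conversation context
follow_up = {"question": "How was it evaluated?", "thread_id": data["thread_id"]}
print(requests.post("http://localhost:8000/api/query", json=follow_up).json()["answer"])
```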
- Endpoint: `POST /api/thread/reset`
- Description: Clears the memory for a specific conversation thread.
- Request Params: Accepts `thread_id` as a query parameter.
- Response: Confirmation message and thread ID of the reset thread.
{
"status": "success",
"message": "Thread reset successfully",
"thread_id": "your-thread-id"
}
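Because `thread_id` is passed as a query parameter rather than in the body, a call might look like this sketch (assuming a local server):

```python
import requests

resp = requests.post(
    "http://localhost:8000/api/thread/reset",
    params={"thread_id": "your-thread-id"},  # sent as ?thread_id=...
)
print(resp.json()["message"])
```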
- Endpoint: `POST /api/vectordb/reset`
- Description: Clears the FAISS vector database, removing all document indexes and resetting the retrieval pipeline.
- Response: Confirmation message of the database reset, along with document status.
{
"status": "success",
"message": "Vector database reset successfully",
"document_count": 0,
"has_documents": false
}
- Endpoint: `GET /api/status`
- Description: Gets the status of documents currently loaded in the vector database.
- Response: JSON object containing a boolean indicating whether documents are available and the document count.
{
"has_documents": true,
"document_count": 4
}
- Endpoint: `GET /api/thread/status/{thread_id}`
- Description: Returns the status and history of a specific thread, including all messages exchanged.
- Response: JSON object with the message count, a boolean indicating whether history is available, and the list of messages exchanged.
{
"thread_id": "your-thread-id",
"message_count": 3,
"has_history": true,
"messages": [
{
"type": "human",
"content": "your question ?"
},
{
"type": "ai",
"content": "Your response to the question"
},
{
"type": "human",
"content": "second question ?"
}
]
}
├── docker-compose.yml
├── postman_test_cases.json
├── requirements.txt
├── tests/
│   ├── conftest.py
│   ├── unit/
│   │   ├── test_graph_service.py
│   │   ├── test_llm_service.py
│   │   ├── test_memory_service.py
│   │   └── test_retrieval_service.py
│   └── fixtures/
├── docs/
│   └── papers/          <--- PDF documents are stored here
├── scripts/
│   └── export_codebase.py
└── src/
    ├── main.py
    ├── routers/
    │   ├── docs_router.py
    │   └── query_router.py
    ├── utils/
    │   ├── config.py
    │   ├── logging.py
    │   └── pdf_utils.py
    ├── models/
    │   ├── request_models.py
    │   └── response_models.py
    └── services/
        ├── graph_service.py
        ├── llm_service.py
        ├── memory_service.py
        ├── retrieval_service.py
        └── tracing_service.py
src/main.py
The `src/main.py` file serves as the entry point for the FastAPI application. It initializes the app, setting up essential components such as CORS and routing. On startup, the application preloads PDF documents from the `docs` directory to make them immediately accessible. Additionally, it exposes a root path `/` that can be used for system health checks.

src/routers/docs_router.py
This module handles document upload and status retrieval functionality. It includes the `POST /api/docs/upload` endpoint, which allows users to upload PDF documents. Uploaded PDFs are processed and text is extracted using utilities from `src/utils/pdf_utils.py`. The router also provides the `GET /api/status` endpoint for checking the status of loaded documents.

src/routers/query_router.py
The `query_router` defines endpoints for query processing, conversation thread management, and resetting the vector database. The `POST /api/query` endpoint dynamically routes user queries to the memory service, vector store, or general LLM handler based on the characteristics of the question. It uses `src/services/memory_service.py` to maintain and manage conversation context. Additionally, it includes the `POST /api/thread/reset` endpoint for clearing conversation history and the `POST /api/vectordb/reset` endpoint for clearing all indexed documents from the vector store.

src/utils/config.py
This module manages configuration settings for the application by loading environment variables from `.env` files. It defines key settings for OpenAI and Langfuse integrations, as well as paths for default directories used throughout the application.
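A minimal sketch of what such `.env`-driven settings could look like, assuming `python-dotenv`; the variable names and default directory are illustrative, not the module's actual contents:

```python
import os
from dotenv import load_dotenv

# Load key/value pairs from a .env file into the process environment
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
TOGETHER_API_KEY = os.getenv("TOGETHER_API_KEY", "")
LANGFUSE_PUBLIC_KEY = os.getenv("LANGFUSE_PUBLIC_KEY", "")
LANGFUSE_SECRET_KEY = os.getenv("LANGFUSE_SECRET_KEY", "")
DOCS_DIR = os.getenv("DOCS_DIR", "docs/papers")  # assumed default directory
```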

src/utils/logging.py
The `logging` module configures the `loguru` library for logging. It sets up both console and file-based logging to ensure that system activity is appropriately recorded for debugging and monitoring purposes.

src/utils/pdf_utils.py
The `pdf_utils` module provides utility functions for loading and processing PDF files. It includes functions like `load_pdfs_from_directory` to load multiple PDFs from a specified directory and `load_pdf` to load a single PDF file. Additionally, it contains asynchronous functions for processing uploaded PDF files.
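To illustrate the kind of extraction these utilities perform, here is a minimal sketch using `pypdf`; the helper names are placeholders and the module's actual signatures may differ:

```python
from pathlib import Path
from pypdf import PdfReader

def extract_text(path: str) -> str:
    """Concatenate the text of every page in a PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def load_pdfs(directory: str) -> dict[str, str]:
    """Map each PDF file name in a directory to its extracted text."""
    return {p.name: extract_text(str(p)) for p in Path(directory).glob("*.pdf")}
```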

src/models/request_models.py
This module defines Pydantic models used for validating request payloads. It includes the `QueryRequest` model for queries, which incorporates fields for the question, user ID, and thread ID. Additionally, the `UploadRequest` model is used for specifying the file type during document uploads.
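Based on the `/api/query` payload documented above and the fields listed here, `QueryRequest` plausibly looks something like this sketch (the optional defaults are assumptions):

```python
from pydantic import BaseModel

class QueryRequest(BaseModel):
    question: str
    user_id: str | None = None     # per the field list above
    thread_id: str | None = None   # omit to start a new conversation
    config: dict | None = None     # optional overrides, e.g. temperature
```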

src/models/response_models.py
Response models for API interactions are defined in this module. It includes the `QueryResponse` model for returning answers to queries, as well as `UploadResponse` and `DocumentReference` models for document-related operations. An `ErrorResponse` model is also included to standardize error handling.

src/services/graph_service.py
The `graph_service` module implements the core logic for processing research questions. It manages the overall flow, including question rewriting, document grading, and answer generation. It leverages `src/services/llm_service.py` for LLM interactions and `src/services/retrieval_service.py` for document retrieval, ensuring a seamless question-answering pipeline.
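To give a feel for the graph architecture, here is a schematic `langgraph` sketch of a rewrite, retrieve, generate pipeline. The node functions are illustrative stubs, not the project's actual implementation.

```python
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class QAState(TypedDict):
    question: str
    documents: list[str]
    answer: str

def rewrite(state: QAState) -> dict:
    # Stand-in for LLM-based query rewriting
    return {"question": state["question"].strip().rstrip("?") + "?"}

def retrieve(state: QAState) -> dict:
    # Stand-in for the hybrid BM25 + FAISS retriever
    return {"documents": ["stub document matching: " + state["question"]]}

def generate(state: QAState) -> dict:
    # Stand-in for the LLM answer-generation and self-assessment step
    return {"answer": f"Answer based on {len(state['documents'])} document(s)."}

graph = StateGraph(QAState)
graph.add_node("rewrite", rewrite)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.add_edge(START, "rewrite")
graph.add_edge("rewrite", "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)

app = graph.compile()
print(app.invoke({"question": "what is hybrid retrieval"}))
```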

src/services/llm_service.py
This module provides methods for interacting with Large Language Models (LLMs). It manages OpenAI and TogetherAI integrations with fallback logic to guarantee responses using `gpt-4o-mini` if `Llama-3.3-70B` models fail. It includes system prompts to define LLM behavior and incorporates error handling and tracing using `langfuse`.

src/services/memory_service.py
The `memory_service` module is responsible for managing conversation history. It allows messages to be added to a thread, retrieves conversation messages when needed, and provides functionality to clear conversation threads entirely.

src/services/retrieval_service.py
This module manages the document retrieval pipeline. It sets up and indexes documents using both BM25 keyword-based retrieval and FAISS for semantic search. The `rebuild` method allows the indexes to be updated with new documents. The module also includes functionality for extracting relevant document snippets to improve the relevance of retrieved content.

src/services/tracing_service.py
The `tracing_service` module initializes the Langfuse client for observability. It provides methods to trace interactions and log events during API calls. Context-managed traces are implemented, allowing for detailed monitoring of system interactions, including quality assessments and scoring.
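A minimal sketch of decorator-based tracing with Langfuse's `@observe` (the service's actual context-managed traces are more involved, and the decorator's import path varies across SDK versions):

```python
from langfuse.decorators import observe

@observe()  # records inputs, outputs, and timing of the call as a trace
def answer_question(question: str) -> str:
    # Stand-in for routing the question through the graph pipeline
    return "stub answer"

print(answer_question("What is hybrid retrieval?"))
```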

Clone the repository and navigate to the project directory:
git clone https://github.com/RThaweewat/research-agent-api.git
cd research-agent-api
Copy the environment variables file and add your API keys:
cp .env.example .env
# Add your OpenAI, Together, and Langfuse API keys in the .env file
Install the required Python packages:
pip install -r requirements.txt
- Install Docker: If not already installed, follow the instructions for your OS in the Docker Installation Guide.
- Install Docker Compose: Usually included with Docker Desktop. For standalone installation, refer to the Docker Compose Installation Guide.
Run the application using Uvicorn:
uvicorn src.main:app --host 0.0.0.0 --port 8000 --reload
- Navigate to the project root where `docker-compose.yml` is located.
- Build and run the Docker containers in detached mode:
  docker-compose up -d --build
- To stop the Docker instance:
  docker-compose down

Open your browser and navigate to http://localhost:8000/docs to view the interactive API documentation.
Run unit tests using Pytest:
pytest tests/
These tests focus on the core logic of each module in the `src/services` directory.
- `test_graph_service.py`: Tests the end-to-end flow and error handling within the graph architecture.
- `test_llm_service.py`: Tests that the LLM is initialized correctly and handles prompts.
- `test_memory_service.py`: Tests the storage and retrieval functionality of the conversational memory system.
- `test_retrieval_service.py`: Tests that the document retrieval pipeline works correctly.
These tests are crucial for ensuring the reliability of individual components.
The `postman_test_cases.json` file provides comprehensive API tests which you can use with the Postman application. Import this collection into Postman to check the API functionality and confirm proper integration.