Table of Contents
- Introduction
- Tech Stack
- Features
- Architecture
- Getting Started
- Project Structure
- Miscellaneous
Check out the frontend code: Docugenie Frontend
DocuGenie is an AI-powered chat assistant for working with PDF documents and other file formats. It extracts text, analyzes and summarizes content, generates diagrams and visualizations, helps create comprehensive plans, performs agentic tasks, and generates AI-powered documents.
- Next.js - React framework for server-side rendering and routing
- React.js - Component-based UI library
- Tailwind CSS - Utility-first CSS framework for styling
- shadcn/ui - Accessible UI component library
- Python >=3.10 - Programming language
- FastAPI - Modern web framework with async support
- Pydantic - Data validation and settings management
- SQLAlchemy - Database ORM and management
- Alembic - Database migrations
- PostgreSQL - Primary database system
- FAISS - Vector similarity search and clustering
- LangChain - AI model integration and RAG pipelines
- Google Gemini - Primary LLM for text generation
- HuggingFace - Embedding models and transformers
- OpenAI - Alternative LLM provider
- Docker - Containerization
- Poetry - Python dependency management
- pnpm - JavaScript/TypeScript package manager
- Prettier - Code formatter for frontend codebase
- ESLint - JavaScript/TypeScript linting
- Jest - JavaScript/TypeScript testing framework
- Pytest - Python testing framework
- Ruff - Python code linting and formatting
- Multi-format document support (PDF, DOCX, TXT)
- Intelligent text extraction and parsing
- AI-powered content analysis and summarization
- Automated diagram and visualization generation (Coming Soon...)
- Context-aware document conversations
- Semantic search across document collections
- AI-powered document creation and editing (Coming Soon...)
- Agentic task execution (Coming Soon...)
- PostgreSQL with SQLAlchemy ORM
- Alembic database migrations
- FastAPI with high-performance endpoints
- Pydantic models for robust data validation
- FAISS vector database for efficient similarity search
- Next.js for fast server-side rendering and routing
- Modular component architecture with React.js
- shadcn/ui for accessible, modern UI components
- Tailwind CSS for utility-first, customizable styling
- TypeScript support for type-safe frontend development
- pnpm for efficient JavaScript/TypeScript package management
- Jest for robust frontend testing
- Prettier & ESLint for consistent code style and linting
The system combines LangChain AI models, a FAISS vector database, and a RAG (Retrieval-Augmented Generation) pipeline to provide intelligent document analysis and conversational interactions.
The document processing pipeline transforms raw documents into searchable, intelligent knowledge bases:
Key Components:
- Parser & Tokenizer: Extracts text content and breaks it into manageable chunks
- LangChain AI Models: Generates embeddings using state-of-the-art language models
- FAISS Vector Database: Stores and indexes vectors for efficient similarity search
- Metadata Database: Maintains document metadata, chunk references, and user sessions
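The ingestion steps above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the project's implementation: `split_into_chunks`, `embed`, and `InMemoryIndex` are hypothetical names, the hashed bag-of-words embedding stands in for a real embedding model, and the brute-force cosine search stands in for a FAISS index.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # metadata: which document the chunk came from
    index: int   # metadata: position of the chunk within the document
    text: str


def split_into_chunks(doc_id: str, text: str, size: int = 40, overlap: int = 10) -> list[Chunk]:
    """Split text into overlapping word windows (stand-in for the parser and tokenizer)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [Chunk(doc_id, i, " ".join(words[s:s + size])) for i, s in enumerate(starts)]


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy hashed bag-of-words vector, L2-normalised. Not a real language model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryIndex:
    """Brute-force cosine search; plays the role of the vector database here."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        self.vectors.append(embed(chunk.text))
        self.chunks.append(chunk)

    def search(self, query: str, k: int = 3) -> list[tuple[float, Chunk]]:
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), c)
            for v, c in zip(self.vectors, self.chunks)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = InMemoryIndex()
    for chunk in split_into_chunks("notes.txt", "FAISS indexes embedding vectors for similarity search"):
        index.add(chunk)
    for score, chunk in index.search("similarity search", k=1):
        print(f"{score:.2f} {chunk.doc_id}#{chunk.index}: {chunk.text}")
```

Keeping `doc_id` and `index` on every chunk mirrors the role of the metadata database: a search hit can be traced back to its source document and position.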
The chat system provides intelligent, context-aware responses using retrieved document knowledge:
Key Components:
- Chat Orchestrator: Manages conversation flow, session context, and query understanding
- RAG Pipeline: Retrieves relevant document chunks using vector similarity search
- FAISS Vector DB: Provides fast and accurate semantic search capabilities
- Prompt Manager: Optimizes prompts with context injection and formatting guidelines
- LangChain AI: Generates intelligent responses using retrieved context
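The query-time flow described above (retrieve, inject context, generate) can be sketched as follows. All names here (`ChatSession`, `build_prompt`, `answer`) are hypothetical, and the retriever and the model are passed in as plain callables, so the sketch makes no assumption about how the project wires LangChain or FAISS.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChatSession:
    """Keeps prior turns so follow-up questions have conversational context."""
    history: list[tuple[str, str]] = field(default_factory=list)


def build_prompt(question: str, chunks: list[str], history: list[tuple[str, str]]) -> str:
    """Inject retrieved chunks and the last few turns into one prompt string."""
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history[-3:])
    return (
        "Answer using only the numbered context. Say so if the answer is not there.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{turns or '(none)'}\n\n"
        f"Question: {question}"
    )


def answer(
    question: str,
    session: ChatSession,
    retrieve: Callable[[str, int], list[str]],  # e.g. a vector similarity search
    generate: Callable[[str], str],             # e.g. a call to an LLM
    k: int = 3,
) -> str:
    """Orchestrate one chat turn: retrieve, build the prompt, generate, remember."""
    chunks = retrieve(question, k)
    prompt = build_prompt(question, chunks, session.history)
    reply = generate(prompt)
    session.history.append((question, reply))
    return reply
```

Passing `retrieve` and `generate` as parameters keeps the orchestration testable with stubs and lets the LLM provider be swapped without touching the flow.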
- Python - >=v3.10
- PostgreSQL - >=v16
- Poetry - >=v2.2.0 (recommended) or pip
- Clone the repository
git clone https://github.com/imtiaj-007/docugenie-backend.git
cd docugenie-backend
- Set up a virtual environment
# Using Poetry (recommended):
eval $(poetry env activate) # Poetry >=2.0; `poetry shell` now requires the poetry-plugin-shell plugin
# Using virtualenv (alternative):
python -m venv venv
source venv/bin/activate # On Linux/MacOS
venv\Scripts\activate # On Windows
You can use the project in two ways:
- Using Poetry (recommended): Poetry is a modern Python package manager, similar to npm. It adds, installs, and manages dependencies and runs the application's scripts.
Along with Poetry, you can use the Poe the Poet commands:
poetry run poe dev
poetry run poe test
poetry run poe lint
poetry run poe format
Alternatively, use the Makefile commands:
make dev
make test
make lint
make format
Refer to the Available Commands section for a detailed overview of the commands available in this project.
- Using pip: pip is the default Python package installer, used to install and manage software packages.
- Install dependencies
# Using Poetry
poetry install
# Traditional Way
pip install -r requirements.txt # Install production dependencies
pip install -r requirements-dev.txt # Install with dev dependencies
- Run the application
# Using Poetry
poetry run docugenie
poetry run poe test
# Using Make commands
make dev
make test
# Traditional Way
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
pytest tests/ -v --cov=app --cov-report=html
docugenie-backend/
├── .github/                  # GitHub Actions and CI/CD workflows
│   └── workflows
├── .vscode/                  # VSCode editor configuration
│   ├── settings.json
│   └── extensions.json
├── public                    # Public assets (e.g., images, logos)
├── app/                      # Main backend application code
│   ├── ai                    # Modules for AI and LLM integrations
│   ├── api                   # API route definitions and endpoints
│   ├── aws                   # AWS utilities and integrations
│   ├── core                  # Core application logic and utility functions
│   ├── db                    # Database models and initialization
│   ├── middlewares           # FastAPI middleware definitions
│   ├── services              # Business logic and service layers
│   ├── schemas               # Pydantic schemas (request/response validation)
│   ├── repositories          # Database repositories and CRUD operations
│   └── main.py               # Main entry point for FastAPI app
├── tests                     # Unit and integration tests
├── .gitignore
├── .dockerignore
├── Dockerfile
├── docker-compose.yaml
├── .pre-commit-config.yaml   # Pre-commit hook configuration
├── pyproject.toml            # Python dependencies and configuration
├── poetry.lock               # Poetry lockfile for deterministic installs
├── README.md                 # Project documentation
└── LICENSE                   # License information
[tool.poetry.scripts]
docugenie = "app.main:run_server"
[tool.poe.tasks]
format = "black app tests"
format-check = "black --check app tests"
lint = "ruff check app tests"
lint-fix = "ruff check --fix app tests"
typecheck = "mypy app"
[tool.poe.tasks.check-all]
sequence = ["format-check", "lint", "typecheck"]
ignore_fail = false
Or, run these from the shell:
poetry run poe format # Format code (black)
poetry run poe format-check # Check formatting (black)
poetry run poe lint # Lint code (ruff)
poetry run poe lint-fix # Fix lint errors (ruff autofix)
poetry run poe typecheck # Type checks (mypy)
poetry run poe check-all # Run format check, lint, typecheck in sequence
poetry run docugenie # Start the FastAPI server
For full config, see pyproject.toml.
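The `docugenie` script entry points at `app.main:run_server`, which is not shown here. A minimal sketch of what such an entry point could look like follows, assuming the FastAPI instance is exposed as `app` in `app/main.py` (as the `uvicorn app.main:app` command suggests). The `HOST`, `PORT`, and `RELOAD` environment variable names and the `server_options` helper are hypothetical.

```python
import os


def server_options() -> dict:
    """Read server settings from environment variables (variable names are assumptions)."""
    return {
        "host": os.environ.get("HOST", "0.0.0.0"),
        "port": int(os.environ.get("PORT", "8000")),
        "reload": os.environ.get("RELOAD", "false").lower() == "true",
    }


def run_server() -> None:
    """Console-script entry point: start Uvicorn with the FastAPI app."""
    import uvicorn  # imported lazily so server_options() works without uvicorn installed

    # An import string (rather than the app object) is needed for reload to work.
    uvicorn.run("app.main:app", **server_options())
```

With an entry point like this, `poetry run docugenie` and `make dev` would start the same application, differing only in how the host, port, and reload settings are supplied.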
.PHONY: install dev test test-watch lint format migrate migrate-create clean docker-up docker-down
install:
	poetry install
dev:
	poetry run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
test:
	poetry run pytest tests/ -v --cov=app --cov-report=html
test-watch:
	poetry run ptw tests/ -- -v
lint:
	poetry run ruff check .
	poetry run mypy app/
format:
	poetry run black .
	poetry run ruff check --fix .
migrate:
	poetry run alembic upgrade head
migrate-create:
	poetry run alembic revision --autogenerate -m "$(msg)"
clean:
	find . -type d -name __pycache__ -exec rm -rf {} +
	find . -type f -name "*.pyc" -delete
	rm -rf .pytest_cache .coverage htmlcov
docker-up:
	docker-compose up -d
docker-down:
	docker-compose down
Once the application is running, access the interactive API docs:
- Swagger UI: https://docugenie-backend.up.railway.app/docs
- ReDoc: https://docugenie-backend.up.railway.app/redoc
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
CC BY-NC-SA 4.0: Non-commercial use only. Credit required. Derivatives must be shared alike. Check details here.