PDF to Visual Story Generator - Complete Implementation

🎯 Project Overview

A complete end-to-end system that transforms PDF documents into visual stories using AI. The pipeline extracts text from PDFs, processes it into scenes, and generates high-quality images using OpenAI's DALL·E API.

Complete Pipeline:

PDF Document → Text Extraction → Scene Analysis → Image Generation → Visual Story

📊 Project Status

Phase	Feature	Status	Version
Phase 1	PDF Upload & Text Extraction	✅ Complete	1.0.0
Phase 2	Text Processing & Scene Extraction	✅ Complete	2.0.0
Phase 3	Image Generation with DALL·E	✅ Complete	3.0.0

🏗️ Architecture Overview

System Components

┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (Optional)                      │
│                    React UI for visualization                    │
└────────────────────────────┬────────────────────────────────────┘
                             │ HTTP/REST
┌────────────────────────────▼────────────────────────────────────┐
│                      FastAPI Backend                             │
│  ┌────────────────┬─────────────────┬─────────────────────┐    │
│  │   Phase 1      │    Phase 2      │      Phase 3        │    │
│  │ PDF → Text     │  Text → Scenes  │  Scenes → Images    │    │
│  └────────────────┴─────────────────┴─────────────────────┘    │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────┐
│                      Service Layer                               │
│  ┌──────────────┬──────────────┬──────────────────────────┐    │
│  │ PDF Extract  │ Text Cleaner │  Prompt Generator        │    │
│  │              │ Summarizer   │  Image Generator         │    │
│  │              │ Scene Extract│  Image Storage           │    │
│  └──────────────┴──────────────┴──────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

✨ Features

Phase 1: PDF Upload & Text Extraction

✅ Upload PDF files via REST API
✅ Page-by-page text extraction
✅ Validates file type and content
✅ Handles errors (scanned PDFs, corrupted files)
✅ Returns structured JSON with page content
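The file-type and content validation above could be done in two cheap checks before any extraction: the extension, and the `%PDF-` signature that well-formed PDF files normally start with. The sketch below is illustrative only. `validate_pdf_upload`, `looks_scanned` and `InvalidPDFError` are hypothetical names, not the project's actual code. The scanned-PDF check is only a heuristic: a page with no extractable text usually means an image-only page.

```python
from pathlib import Path
from typing import List

# Well-formed PDF files normally begin with this signature.
PDF_MAGIC = b"%PDF-"


class InvalidPDFError(ValueError):
    """Hypothetical error type for uploads that are not usable PDFs."""


def validate_pdf_upload(filename: str, data: bytes) -> None:
    """Reject an upload before text extraction if it is clearly not a PDF."""
    if Path(filename).suffix.lower() != ".pdf":
        raise InvalidPDFError(f"{filename!r} does not have a .pdf extension")
    if not data:
        raise InvalidPDFError(f"{filename!r} is empty")
    # The extension can be wrong, so check the content signature too.
    if not data.startswith(PDF_MAGIC):
        raise InvalidPDFError(f"{filename!r} does not start with a PDF header")


def looks_scanned(page_texts: List[str]) -> bool:
    """Heuristic: no extractable text on any page usually means a scanned PDF."""
    return all(not text.strip() for text in page_texts)
```

In a FastAPI endpoint, this could run on `file.filename` and the bytes from `await file.read()`, with `InvalidPDFError` mapped to an HTTP 400 response.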

Phase 2: Text Processing & Scene Extraction

✅ Cleans and normalizes extracted text
✅ Removes PDF artifacts (headers, footers, page numbers)
✅ Fixes broken sentences from page breaks
✅ Generates visual-focused summaries
✅ Extracts scenes with subjects, settings, and moods
✅ Identifies visual elements for image generation
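Here is a minimal sketch of the cleaning steps listed above, assuming Phase 1 hands over one string per page. It treats lines that repeat on most pages as headers and footers. It also drops lines that contain only a page number and rejoins words hyphenated at line ends. Finally, it merges lines broken mid-sentence, which includes sentences split across a page break. The function names are hypothetical, and the project's `text_cleaner.py` may work differently. These rules are heuristics. For example, a standalone year such as "1984" would be dropped as a page number, and "well-\nknown" would be joined as "wellknown".

```python
import re
from collections import Counter
from typing import List, Set

# A line holding only a page number: "12", "Page 12", "- 12 -".
PAGE_NUMBER_LINE = re.compile(r"^\s*(?:page\s+)?-?\s*\d+\s*-?\s*$", re.IGNORECASE)


def repeated_lines(pages: List[str]) -> Set[str]:
    """Lines found on more than half of the pages (at least two): likely headers/footers."""
    counts: Counter = Counter()
    for page in pages:
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    return {line for line, n in counts.items() if n >= 2 and n > len(pages) / 2}


def clean_pages(pages: List[str]) -> str:
    """Join page texts into one string with common PDF artifacts removed."""
    boilerplate = repeated_lines(pages)
    kept = []
    for page in pages:
        for line in page.splitlines():
            stripped = line.strip()
            if stripped in boilerplate or PAGE_NUMBER_LINE.match(stripped):
                continue
            kept.append(stripped)
    text = "\n".join(kept)
    # Re-join words hyphenated across a line break: "begin-\nning" -> "beginning".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # A single newline not after sentence-ending punctuation is a broken sentence.
    # Blank lines (paragraph breaks) are left alone.
    text = re.sub(r'(?<![.!?:"\n])\n(?!\n)', " ", text)
    return re.sub(r"[ \t]+", " ", text).strip()
```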

Phase 3: Image Generation with DALL·E

✅ DALL·E-optimized prompt engineering
✅ Generates high-quality 1024×1024 images
✅ Organized file system storage
✅ URL-based image access
✅ Automatic retry and error handling
✅ Cost tracking and management
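Automatic retry usually means exponential backoff with jitter around the image API call. The helper below is a generic sketch of that pattern. `with_retries` is a hypothetical name and is not taken from `image_generator.py`. The `sleep` parameter is injectable so the helper can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return call()
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... plus up to 50% random jitter to avoid synchronized retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
            attempt += 1
```

With the openai v1.x Python SDK, `retry_on` might be `(openai.RateLimitError, openai.APIConnectionError)`, wrapping a lambda around `client.images.generate(...)`. That client also retries some errors on its own through its `max_retries` option, so the two mechanisms should not be stacked carelessly.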

🚀 Quick Start

Prerequisites

# Required
Python 3.8+
pip (Python package manager)
OpenAI API key (for Phase 3)

# Recommended
Virtual environment (venv or conda)
Git (for version control)

Installation

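# 1. Clone or download the project, then move into the backend directory
#    (requirements.txt, .env.example and main.py live in backend/)
cd pdf-story-generator/backend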

# 2. Create virtual environment
python -m venv venv

# Activate (Linux/Mac)
source venv/bin/activate

# Activate (Windows)
venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

# 5. Start the server
python main.py

Access Points

API Server: http://localhost:8000
API Documentation: http://localhost:8000/docs
Health Check: http://localhost:8000/health

🔄 Complete Pipeline Example

# Step 1: Upload PDF (Phase 1)
curl -X POST "http://localhost:8000/upload-pdf" \
  -F "file=@story.pdf" > phase1_output.json

# Step 2: Process Text (Phase 2); requires jq to extract the "pages" field
jq '.pages' phase1_output.json | \
curl -X POST "http://localhost:8000/process-text" \
  -H "Content-Type: application/json" \
  -d @- > phase2_output.json

# Step 3: Generate Images (Phase 3)
curl -X POST "http://localhost:8000/generate-images" \
  -H "Content-Type: application/json" \
  -d @phase2_output.json > phase3_output.json

# Step 4: View results
cat phase3_output.json | python -m json.tool

# Step 5: Access generated images
# Images are available at: http://localhost:8000/images/page_X/scene_Y.png
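Steps 2 and 3 can also be chained from Python with only the standard library. This sketch reproduces the curl flow: it sends the `pages` field of the Phase 1 output to `/process-text`, then forwards the Phase 2 response unchanged to `/generate-images`. The request and response shapes are inferred from the curl commands and are assumptions. Step 1 is a multipart upload, so it is left to curl or an HTTP library such as requests. `run_text_to_images` is a hypothetical helper, and its `post` argument can be swapped out for testing.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "http://localhost:8000"
PostJSON = Callable[[str, Any], Any]


def post_json(url: str, payload: Any) -> Any:
    """POST a JSON body and decode the JSON reply."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_text_to_images(
    phase1_output: Dict[str, Any], post: PostJSON = post_json, base_url: str = BASE_URL
) -> Any:
    # Step 2 of the curl example: only the "pages" field is sent.
    phase2_output = post(f"{base_url}/process-text", phase1_output["pages"])
    # Step 3: the whole Phase 2 response is forwarded unchanged.
    return post(f"{base_url}/generate-images", phase2_output)
```

Usage: `run_text_to_images(json.load(open("phase1_output.json")))` with the server running locally.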

📁 Project Structure

pdf-story-generator/
├── backend/
│   ├── main.py                      # FastAPI application
│   ├── requirements.txt             # Python dependencies
│   ├── .env                         # Environment configuration
│   ├── .env.example                 # Configuration template
│   │
│   ├── services/                    # Business logic layer
│   │   ├── __init__.py
│   │   │
│   │   # Phase 2 Services
│   │   ├── text_cleaner.py         # Text cleaning & normalization
│   │   ├── summarizer.py           # Visual-focused summarization
│   │   ├── scene_extractor.py      # Scene detection & extraction
│   │   │
│   │   # Phase 3 Services
│   │   ├── prompt_generator.py     # DALL·E prompt engineering
│   │   ├── image_generator.py      # OpenAI API integration
│   │   └── image_storage.py        # File system storage
│   │
│   ├── generated_images/            # Generated image storage
│   │   ├── .gitkeep
│   │   ├── page_1/
│   │   │   ├── scene_1.png
│   │   │   └── scene_2.png
│   │   └── page_2/
│   │       └── scene_1.png
│   │
│   ├── test_phase2.py              # Phase 2 tests
│   └── test_phase3.py              # Phase 3 tests
│
├── frontend/                        # React UI (optional)
│   ├── src/
│   ├── public/
│   └── package.json
│
├── docs/
│   ├── PHASE1_README.md
│   ├── PHASE2_README.md
│   ├── PHASE3_README.md
│   ├── INTEGRATION_GUIDE.md
│   └── API_DOCUMENTATION.md
│
├── README.md                        # This file
└── .gitignore
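The `generated_images/page_N/scene_M.png` layout in the tree maps directly to the `/images/page_X/scene_Y.png` URLs from the pipeline example. Here is a minimal sketch of a storage helper that keeps the two in sync. `save_scene_image` is a hypothetical name, and the project's `image_storage.py` may differ.

```python
from pathlib import Path


def save_scene_image(root: Path, page: int, scene: int, png_bytes: bytes) -> str:
    """Write one image to <root>/page_<page>/scene_<scene>.png and return its URL path."""
    if page < 1 or scene < 1:
        raise ValueError("page and scene numbers are 1-based")
    page_dir = root / f"page_{page}"
    page_dir.mkdir(parents=True, exist_ok=True)
    (page_dir / f"scene_{scene}.png").write_bytes(png_bytes)
    # Same layout as the URLs served under /images/.
    return f"/images/page_{page}/scene_{scene}.png"
```

One way to serve these files in FastAPI is to mount the directory: `app.mount("/images", StaticFiles(directory="generated_images"), name="images")`, with `StaticFiles` imported from `fastapi.staticfiles`. Whether `main.py` does exactly this is an assumption.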

🔧 Configuration

Environment Variables

Create a .env file in the backend directory:

# OpenAI Configuration (Required for Phase 3)
OPENAI_API_KEY=sk-your-openai-api-key-here

# Server Configuration
PORT=8000
HOST=0.0.0.0
ENVIRONMENT=development

# DALL·E Settings (Optional)
DALLE_MODEL=dall-e-3
DALLE_SIZE=1024x1024
DALLE_QUALITY=standard
DALLE_STYLE=vivid
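For `dall-e-3`, the Images API accepts sizes `1024x1024`, `1792x1024` and `1024x1792`, quality `standard` or `hd`, and style `vivid` or `natural`. Validating these settings at startup catches typos before any paid API call. This is a sketch: `load_dalle_settings` is a hypothetical name, and the defaults simply mirror the example `.env`. Because a `.env` file is not read into the environment automatically, this also assumes something like python-dotenv's `load_dotenv()` runs first.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Values the Images API accepts for dall-e-3.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
DALLE3_QUALITIES = {"standard", "hd"}
DALLE3_STYLES = {"vivid", "natural"}


@dataclass(frozen=True)
class DalleSettings:
    model: str
    size: str
    quality: str
    style: str


def load_dalle_settings(env: Mapping[str, str] = os.environ) -> DalleSettings:
    """Read DALLE_* variables (defaults mirror .env.example) and validate them."""
    settings = DalleSettings(
        model=env.get("DALLE_MODEL", "dall-e-3"),
        size=env.get("DALLE_SIZE", "1024x1024"),
        quality=env.get("DALLE_QUALITY", "standard"),
        style=env.get("DALLE_STYLE", "vivid"),
    )
    # Only dall-e-3 is checked; other models accept different values.
    if settings.model == "dall-e-3":
        checks = (
            ("DALLE_SIZE", settings.size, DALLE3_SIZES),
            ("DALLE_QUALITY", settings.quality, DALLE3_QUALITIES),
            ("DALLE_STYLE", settings.style, DALLE3_STYLES),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return settings
```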

🎨 How It Works

Phase 1: Text Extraction

# Upload PDF → Extract text page-by-page
# Uses pdfplumber for high-quality extraction
# Validates file type and handles errors

Input: PDF file
Output: Structured text by page
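Page-by-page extraction with pdfplumber mostly amounts to iterating `pdf.pages` and calling `page.extract_text()`. That method may return `None` or an empty string for image-only pages. This sketch assumes the output is a list of per-page records. The field names `page`, `text` and `has_text` are hypothetical, and the real response schema may differ. The pdfplumber import is deferred so the record-building step can be used and tested on its own.

```python
from typing import Dict, List, Optional


def build_page_records(page_texts: List[Optional[str]]) -> List[Dict[str, object]]:
    """Turn raw per-page text into numbered records, flagging pages without text."""
    records = []
    for number, text in enumerate(page_texts, start=1):
        text = text or ""  # extract_text() may yield None for image-only pages
        records.append({"page": number, "text": text, "has_text": bool(text.strip())})
    return records


def extract_pages(pdf_path: str) -> List[Dict[str, object]]:
    """Extract text from every page of a PDF with pdfplumber."""
    import pdfplumber  # deferred import: only needed for real PDF files

    with pdfplumber.open(pdf_path) as pdf:
        return build_page_records([page.extract_text() for page in pdf.pages])
```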

Phase 2: Scene Processing

# Clean text → Summarize → Extract scenes
# Removes PDF artifacts
# Identifies visual elements (subjects, settings, moods)

Input: Raw extracted text
Output: Structured scenes with descriptions
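To make "structured scenes" concrete, the sketch below shows one possible shape: each blank-line-separated paragraph becomes a `Scene` with subjects, a setting and a mood. It uses deliberately crude keyword heuristics, and the word tables are made up for illustration. The project's `scene_extractor.py` may use a different technique entirely, such as an LLM. `Scene` and `extract_scenes` are hypothetical names. The subject guess takes capitalized words that do not open a sentence, so a name that appears only at the start of a sentence is missed.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical keyword tables, for illustration only.
SETTING_WORDS = {"forest", "castle", "city", "village", "sea", "room", "garden"}
MOOD_WORDS = {"dark": "ominous", "storm": "tense", "laughed": "joyful", "wept": "somber"}


@dataclass
class Scene:
    text: str
    subjects: List[str] = field(default_factory=list)
    setting: Optional[str] = None
    mood: str = "neutral"


def capitalized_non_initial(sentence: str) -> List[str]:
    """Capitalized words that do not open the sentence: a crude proper-noun guess."""
    words = [w.strip(".,;:!?\"'") for w in sentence.split()[1:]]
    return [w for w in words if w[:1].isupper()]


def extract_scenes(cleaned_text: str) -> List[Scene]:
    scenes = []
    # Treat each blank-line-separated paragraph as one candidate scene.
    for paragraph in re.split(r"\n\s*\n", cleaned_text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        words = re.findall(r"[a-z]+", paragraph.lower())
        subjects: List[str] = []
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            for name in capitalized_non_initial(sentence):
                if name not in subjects:
                    subjects.append(name)
        scenes.append(Scene(
            text=paragraph,
            subjects=subjects,
            setting=next((w for w in words if w in SETTING_WORDS), None),
            mood=next((MOOD_WORDS[w] for w in words if w in MOOD_WORDS), "neutral"),
        ))
    return scenes
```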

Phase 3: Image Generation

# Generate prompt → Call DALL·E → Save image
# Optimized prompts for DALL·E
# Mood-to-lighting mapping
# Organized file storage

Input: Scene descriptions
Output: High-quality PNG images
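The mood-to-lighting mapping can be a simple lookup table applied while assembling the prompt. In this sketch, the table entries, the default style, and the "no text or lettering" suffix are illustrative assumptions, not the project's real values. `build_dalle_prompt` is a hypothetical name. The prompt is truncated to 4,000 characters, which is the documented limit for `dall-e-3`.

```python
from typing import Iterable, Optional

# Hypothetical mood-to-lighting table; the project's actual mapping may differ.
MOOD_LIGHTING = {
    "ominous": "low-key lighting with deep shadows",
    "tense": "harsh, high-contrast lighting",
    "joyful": "warm, bright golden-hour light",
    "somber": "cool, muted overcast light",
}
DEFAULT_LIGHTING = "soft natural lighting"
MAX_PROMPT_CHARS = 4000  # dall-e-3 prompt length limit


def build_dalle_prompt(
    subjects: Iterable[str],
    setting: Optional[str],
    mood: str,
    style: str = "digital illustration",
) -> str:
    """Compose a single-image prompt from a scene's subjects, setting and mood."""
    parts = [f"A {style}"]
    subject_list = list(subjects)
    if subject_list:
        parts.append("showing " + ", ".join(subject_list))
    if setting:
        parts.append(f"in a {setting}")
    lighting = MOOD_LIGHTING.get(mood, DEFAULT_LIGHTING)
    prompt = " ".join(parts) + f", {lighting}, {mood} atmosphere, no text or lettering"
    return prompt[:MAX_PROMPT_CHARS]
```

With the openai v1.x SDK, the result could be passed as `client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024", quality="standard", style="vivid", n=1)`. The image is then available at `response.data[0].url`, or as base64 if `response_format="b64_json"` is requested.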

🚀 Deployment

Development

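# Start the server (auto-reload depends on how main.py calls uvicorn)
python main.py

# Or run uvicorn directly with auto-reload enabled
uvicorn main:app --reload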

Production

# Use Gunicorn as the process manager with Uvicorn workers (Gunicorn runs on Linux/macOS, not Windows)
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

# Or with uvicorn
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
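Instead of passing every flag on the command line, the Gunicorn settings can be kept in a `gunicorn.conf.py` file, which is plain Python. Gunicorn loads `./gunicorn.conf.py` by default. This file is a hypothetical addition to the project. Its values mirror the commands above, except `timeout`, which is an assumption chosen because image generation requests can take a while.

```python
# gunicorn.conf.py (hypothetical): picked up automatically by `gunicorn main:app`
# when it sits in the directory the command is run from.
bind = "0.0.0.0:8000"                           # same host/port as the uvicorn command
workers = 4                                     # same as -w 4
worker_class = "uvicorn.workers.UvicornWorker"  # same as -k
timeout = 120                                   # seconds; assumption, tune to DALL·E latency
```

Each worker is a separate process, so any in-memory state, such as running cost totals, is not shared between workers.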

🎉 Summary

The PDF-to-Visual Story Generator is a complete, production-ready system that:

✅ Extracts text from PDF documents
✅ Processes text into visual scenes
✅ Generates high-quality images with DALL·E
✅ Provides a clean REST API
✅ Includes comprehensive documentation
✅ Includes tests for Phases 2 and 3 (test_phase2.py, test_phase3.py)

All three phases are complete and integrated!

Project Version: 3.0.0
Last Updated: January 2026
Status: ✅ Production Ready

Complete Pipeline: PDF → Text → Scenes → Images ✨

Repository: ananya-kkk/Book2Vision

Folders and files

Latest commit

History

Repository files navigation

PDF to Visual Story Generator - Complete Implementation

🎯 Project Overview

📊 Project Status

🏗️ Architecture Overview

System Components

✨ Features

Phase 1: PDF Upload & Text Extraction

Phase 2: Text Processing & Scene Extraction

Phase 3: Image Generation with DALL·E

🚀 Quick Start

Prerequisites

Installation

Access Points

🔄 Complete Pipeline Example

📁 Project Structure

🔧 Configuration

Environment Variables

🎨 How It Works

Phase 1: Text Extraction

Phase 2: Scene Processing

Phase 3: Image Generation

🚀 Deployment

Development

Production

🎉 Summary

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages