
arXiv Paper Outline and Summarization API

A TensorLake application that processes research papers (PDFs) using Google's Gemini AI to create structured outlines and detailed section expansions, storing the metadata and outline in Postgres tables.

Features

  • PDF ingestion: Fetches and processes PDFs from URLs
  • Outline generation: Extracts title, authors, abstract, keywords, and full section hierarchy
  • Section expansion: Produces detailed per-section summaries, key findings, methods, results, and notable references
  • Database storage: Saves all structured output into PostgreSQL (tested with Neon; works with Supabase or any Postgres)

Why TensorLake?

  • Write code like a monolith, get a distributed system for free
    Your app is just Python functions calling each other. TensorLake runs each function in its own container, scales them independently, and parallelizes requests without any orchestration code.
  • Automatic queueing, scaling, and backpressure
    You don’t need Celery, Kafka, Kubernetes, autoscalers, or job runners. The runtime queues requests, spins up more containers for bottleneck functions, and processes workloads at whatever concurrency the code can handle.
  • Durable, restartable execution
    If a long-running request crashes halfway (PDF too large, LLM timeout, network blip), it resumes from the last function boundary instead of restarting from scratch.

Architecture

The application consists of five main functions; a minimal sketch of how they chain together follows the list:

  1. create_outline(pdf_url): Downloads PDF and creates structured outline using Gemini
  2. expand_section(pdf_url, section_title, section_description): Expands a single section with detailed structured data
  3. expand_all_sections(outline): Orchestrates parallel expansion of all sections
  4. write_to_postgres(outline, expanded_sections): Stores all data in PostgreSQL
  5. process_paper(pdf_url): Main orchestration function that chains all steps
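
Below is a minimal sketch of that call chain in plain Python. The TensorLake decorators and the actual Gemini and Postgres calls are omitted (shown as placeholders), and the dictionary keys (sections, title, description, pdf_url) are illustrative assumptions rather than the app's actual schema.

# Sketch only: plain Python standing in for the TensorLake-decorated functions.

def create_outline(pdf_url: str) -> dict:
    # Download the PDF and ask Gemini for title, authors, abstract,
    # keywords, and the section hierarchy. (Placeholder.)
    ...

def expand_section(pdf_url: str, section_title: str, section_description: str) -> dict:
    # Ask Gemini for a detailed summary, key findings, methods, results,
    # and notable references for a single section. (Placeholder.)
    ...

def expand_all_sections(outline: dict) -> list[dict]:
    # Fan out one expand_section call per section; TensorLake can run
    # these in parallel containers. Assumes the outline carries the
    # source pdf_url.
    return [
        expand_section(outline["pdf_url"], s["title"], s["description"])
        for s in outline["sections"]
    ]

def write_to_postgres(outline: dict, expanded_sections: list[dict]) -> None:
    # Insert the outline and per-section rows into Postgres. (Placeholder.)
    ...

def process_paper(pdf_url: str) -> None:
    # Main entry point: chain the steps end to end.
    outline = create_outline(pdf_url)
    expanded = expand_all_sections(outline)
    write_to_postgres(outline, expanded)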

Setup

Prerequisites

  • TensorLake CLI installed (pip install tensorlake)
  • Gemini API key
  • PostgreSQL database

Configuration

  1. Authenticate with TensorLake:

    tensorlake login
    tensorlake whoami
  2. Set up secrets:

    # Gemini API key
    tensorlake secrets set GEMINI_API_KEY=your_gemini_api_key
    
    # PostgreSQL connection string
    tensorlake secrets set POSTGRES_CONNECTION_STRING="postgresql://user:password@host:port/database"

Deployment

Deploy the application to TensorLake:

tensorlake deploy paper_outline_app.py

Once deployed, the application is available as an HTTP API at:

https://api.tensorlake.ai/applications/process_paper

Usage

Via HTTP

curl https://api.tensorlake.ai/applications/process_paper \
-H "Authorization: Bearer $TENSORLAKE_API_KEY" \
--json '"https://www.arxiv.org/pdf/2510.18234"'
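
The same call from Python, as a minimal sketch (assumes the requests library; the endpoint, header, and JSON-string payload mirror the curl example above, and the shape of the returned JSON is an assumption):

import os
import requests

# Submit a paper for processing. The request body is the PDF URL as a bare
# JSON string, matching the curl --json example above.
resp = requests.post(
    "https://api.tensorlake.ai/applications/process_paper",
    headers={"Authorization": f"Bearer {os.environ['TENSORLAKE_API_KEY']}"},
    json="https://www.arxiv.org/pdf/2510.18234",
)
resp.raise_for_status()
# The response is assumed to include the request id used for status polling below.
print(resp.json())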

Status

The application doesn't return any data when the request finishes; it writes the processed data to the database. You can poll the request ID to check the status of the request.

curl https://api.tensorlake.ai/applications/process_paper/requests/h-0XJD_eE1JTH90ylW4f- \
-H "Authorization: Bearer $TENSORLAKE_API_KEY"
#{"id":"h-0XJD_eE1JTH90ylW4f-","outcome":"success", ... }

Output

The outputs from the application are written to Postgres. We used Neon for testing; you can choose any other Postgres database (for example, Supabase).

(Screenshot: structured output rows in the Postgres database.)
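
One quick way to inspect the results from Python, as a sketch (assumes psycopg2 is installed; the table and column names below are purely hypothetical, so substitute whatever the app actually creates):

import os
import psycopg2

# Connect with the same connection string the app uses.
conn = psycopg2.connect(os.environ["POSTGRES_CONNECTION_STRING"])
with conn, conn.cursor() as cur:
    # "paper_outlines" and its columns are hypothetical names for illustration.
    cur.execute("SELECT title, authors FROM paper_outlines ORDER BY created_at DESC LIMIT 5")
    for row in cur.fetchall():
        print(row)
conn.close()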

Dashboard

You can also observe the state of the request in TensorLake's dashboard UI.

(Screenshot: request state in the TensorLake dashboard.)

Local Development

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export GEMINI_API_KEY=your_key
export POSTGRES_CONNECTION_STRING=your_connection_string

# Test locally
python paper_outline_app.py
