A TensorLake application that processes research papers (PDFs) using Google's Gemini AI to create structured outlines and detailed section expansions, storing the metadata and outline in Postgres tables.
- PDF ingestion: Fetches and processes PDFs from URLs
- Outline generation: Extracts title, authors, abstract, keywords, and full section hierarchy
- Section expansion: Produces detailed per-section summaries, key findings, methods, results, and notable references
- Database storage: Saves all structured output into PostgreSQL (tested with Neon; works with Supabase or any Postgres)
- Write code like a monolith, get a distributed system for free

  Your app is just Python functions calling each other. TensorLake runs each function in its own container, scales them independently, and parallelizes requests without any orchestration code.

- Automatic queueing, scaling, and backpressure

  You don't need Celery, Kafka, Kubernetes, autoscalers, or job runners. The runtime queues requests, spins up more containers for bottleneck functions, and processes workloads at whatever concurrency the code can handle.

- Durable, restartable execution

  If a long-running request crashes halfway (PDF too large, LLM timeout, network blip), it resumes from the last function boundary instead of restarting from scratch.
The application consists of five main functions:

- `create_outline(pdf_url)`: Downloads the PDF and creates a structured outline using Gemini
- `expand_section(pdf_url, section_title, section_description)`: Expands a single section with detailed structured data
- `expand_all_sections(outline)`: Orchestrates parallel expansion of all sections
- `write_to_postgres(outline, expanded_sections)`: Stores all data in PostgreSQL
- `process_paper(pdf_url)`: Main orchestration function that chains all steps
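The pipeline above can be sketched as plain Python. The function bodies below are placeholders (the real implementations call Gemini and PostgreSQL), and the return shapes are illustrative assumptions, not the actual Gemini or TensorLake output:

```python
# Minimal sketch of the orchestration flow; bodies are placeholders.
# Return shapes here are assumptions for illustration only.

def create_outline(pdf_url: str) -> dict:
    # Real version downloads the PDF and asks Gemini for a structured outline.
    return {"pdf_url": pdf_url,
            "title": "Example Paper",
            "sections": [{"title": "Introduction", "description": "Motivation"}]}

def expand_section(pdf_url: str, section_title: str, section_description: str) -> dict:
    # Real version expands one section: summary, key findings, methods,
    # results, and notable references.
    return {"section": section_title, "summary": f"Summary of {section_title}"}

def expand_all_sections(outline: dict) -> list[dict]:
    # On TensorLake each expand_section call can run in parallel;
    # shown sequentially here.
    return [expand_section(outline["pdf_url"], s["title"], s["description"])
            for s in outline["sections"]]

def write_to_postgres(outline: dict, expanded_sections: list[dict]) -> None:
    # Real version inserts the structured output into PostgreSQL tables.
    pass

def process_paper(pdf_url: str) -> None:
    # Main orchestration: chains all steps.
    outline = create_outline(pdf_url)
    expanded = expand_all_sections(outline)
    write_to_postgres(outline, expanded)
```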
Prerequisites:

- TensorLake CLI installed (`pip install tensorlake`)
- Gemini API key
- PostgreSQL database

Authenticate with TensorLake:

```
tensorlake login
tensorlake whoami
```

Set up secrets:

```
# Gemini API key
tensorlake secrets set GEMINI_API_KEY=your_gemini_api_key

# PostgreSQL connection string
tensorlake secrets set POSTGRES_CONNECTION_STRING="postgresql://user:password@host:port/database"
```
Deploy the application to TensorLake:

```
tensorlake deploy paper_outline_app.py
```

Once the application is deployed, it's available as an HTTP API at `https://api.tensorlake.ai/applications/process_paper`.
```
curl https://api.tensorlake.ai/applications/process_paper \
  -H "Authorization: Bearer $TENSORLAKE_API_KEY" \
  --json '"https://www.arxiv.org/pdf/2510.18234"'
```

The application doesn't return any data when the request finishes; it writes the processed data to the database. You can poll the request ID to check the status of the request:

```
curl https://api.tensorlake.ai/applications/process_paper/requests/h-0XJD_eE1JTH90ylW4f- \
  -H "Authorization: Bearer $TENSORLAKE_API_KEY"
# {"id":"h-0XJD_eE1JTH90ylW4f-","outcome":"success", ... }
```

The outputs from the application are written to Postgres. We used Neon for testing; you can choose any other Postgres database.
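Polling can also be done from a script. The sketch below assumes only the response shape shown above (`id` and `outcome` fields); treating `"failure"` as a second terminal outcome is an assumption, not confirmed TensorLake behavior:

```python
import json
from urllib.request import Request, urlopen

REQUESTS_URL = "https://api.tensorlake.ai/applications/process_paper/requests"

def is_finished(status: dict) -> bool:
    # The sample response shows outcome == "success" when done; we assume
    # "failure" is the other terminal value and anything else means pending.
    return status.get("outcome") in ("success", "failure")

def fetch_status(request_id: str, api_key: str) -> dict:
    # One status fetch; wrap in a loop with a sleep to poll until finished.
    req = Request(f"{REQUESTS_URL}/{request_id}",
                  headers={"Authorization": f"Bearer {api_key}"})
    with urlopen(req) as resp:
        return json.loads(resp.read())
```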
You can observe the state of the request in TensorLake's UI as well.
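This README doesn't show the exact table layout `write_to_postgres` creates. As a purely hypothetical illustration of the kind of schema the outline and section data could map to, with sqlite3 used as an in-memory stand-in for Postgres so the sketch runs anywhere:

```python
import sqlite3

# Hypothetical schema, NOT the app's actual tables; the real app writes to
# Postgres via POSTGRES_CONNECTION_STRING. Columns mirror the fields the
# README says are extracted (title, abstract, keywords, per-section data).
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    pdf_url   TEXT PRIMARY KEY,
    title     TEXT,
    abstract  TEXT,
    keywords  TEXT
);
CREATE TABLE IF NOT EXISTS sections (
    pdf_url       TEXT,
    section_title TEXT,
    summary       TEXT,
    key_findings  TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO papers (pdf_url, title) VALUES (?, ?)",
             ("https://www.arxiv.org/pdf/2510.18234", "Example Paper"))
title = conn.execute("SELECT title FROM papers").fetchone()[0]
```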
To run locally:

```
# Install dependencies
pip install -r requirements.txt

# Set environment variables
export GEMINI_API_KEY=your_key
export POSTGRES_CONNECTION_STRING=your_connection_string

# Test locally
python paper_outline_app.py
```