
longtermrisk/google-docs-bfs-export


Google Docs BFS Exporter

Export Google Docs by following links (BFS crawl). Perfect for discovering and archiving scattered documentation.

Setup (5 minutes)

Run the setup script to install everything automatically:

  • Mac/Linux: ./setup.sh
  • Windows: setup.bat (double-click it or run it in cmd)

The script will:

  1. Install uv (Python package manager) if needed
  2. Install all dependencies via uv sync
  3. Prompt you to add credentials.json (see below)

Getting credentials.json

Option A: A colleague shares it with you via Slack or email. This is safe to share: it identifies the OAuth app, not anyone's personal credentials.

Option B: Create your own (2 min):

  1. Go to Google Cloud Console
  2. Create project → Enable "Google Docs API" and "Google Drive API"
  3. Credentials → Create OAuth client ID → Desktop app
  4. Download the JSON and save it as credentials.json

First run

uv run python main.py --seed-id YOUR_DOC_ID

On the first run, a browser window opens for authorization. You authorize with YOUR own Google account, so each person gets their own access. After that, the script uses the cached credentials in token.pickle.
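Caching tokens in token.pickle matches the usual pattern from Google's Python quickstarts. A minimal sketch of that pattern, assuming the google-auth and google-auth-oauthlib packages; the SCOPES list and the load_credentials name are illustrative assumptions, not taken from main.py:

```python
import os
import pickle

# Assumed scopes: read Docs, plus Drive access (write access is needed for --drive).
SCOPES = [
    "https://www.googleapis.com/auth/documents.readonly",
    "https://www.googleapis.com/auth/drive",
]


def load_credentials(token_path="token.pickle", client_secrets="credentials.json"):
    """Return cached credentials if usable, otherwise run the browser OAuth flow."""
    creds = None
    if os.path.exists(token_path):
        with open(token_path, "rb") as f:
            creds = pickle.load(f)
    if creds and creds.valid:
        return creds  # cached token still works: no browser needed
    if creds and creds.expired and creds.refresh_token:
        # Expired but refreshable: get a new access token silently.
        from google.auth.transport.requests import Request

        creds.refresh(Request())
    else:
        # First run (or no usable token): open the browser for consent.
        from google_auth_oauthlib.flow import InstalledAppFlow

        flow = InstalledAppFlow.from_client_secrets_file(client_secrets, SCOPES)
        creds = flow.run_local_server(port=0)
    with open(token_path, "wb") as f:
        pickle.dump(creds, f)  # cache for the next run
    return creds
```

In this sketch a still-valid cached token is reused even if SCOPES changes, which is why deleting token.pickle is the way to pick up new permissions.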

Usage Examples

# Find your doc ID from the URL:
# https://docs.google.com/document/d/YOUR_DOC_ID_HERE/edit

# Export as Word docs locally (default)
uv run python main.py --seed-id YOUR_DOC_ID

# Export as markdown locally
uv run python main.py --seed-id YOUR_DOC_ID --format md

# Save copies to Google Drive folder (preserves as Google Docs)
uv run python main.py --seed-id YOUR_DOC_ID --drive YOUR_FOLDER_ID

# Save markdown to Drive with localized links (links point to other Drive files)
uv run python main.py --seed-id YOUR_DOC_ID --drive YOUR_FOLDER_ID --format md --localize-links

# Auto-request access to documents you don't have permission for
uv run python main.py --seed-id YOUR_DOC_ID --request-access

# Limit to 20 docs
uv run python main.py --seed-id YOUR_DOC_ID --max-docs 20
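If you have a full URL rather than the ID, the ID is the path segment after /document/d/ (and, for the --drive folder, after /folders/). A small helper sketch, assuming IDs consist only of letters, digits, underscores and hyphens; extract_id is hypothetical and not part of main.py:

```python
import re

# Document URLs: https://docs.google.com/document/d/<ID>/edit
# Folder URLs:   https://drive.google.com/drive/folders/<ID>
_ID_PATTERNS = [
    re.compile(r"/document/d/([A-Za-z0-9_-]+)"),
    re.compile(r"/folders/([A-Za-z0-9_-]+)"),
]
_BARE_ID = re.compile(r"[A-Za-z0-9_-]+")


def extract_id(url_or_id: str) -> str:
    """Return the doc or folder ID from a Docs/Drive URL; a bare ID is returned unchanged."""
    text = url_or_id.strip()
    for pattern in _ID_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    if _BARE_ID.fullmatch(text):
        return text  # already looks like an ID
    raise ValueError(f"no document or folder ID found in {url_or_id!r}")
```

For example, extract_id("https://docs.google.com/document/d/1AbC_d-E/edit") gives "1AbC_d-E", which can be passed to --seed-id.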

How It Works

BFS crawl: starts from the seed doc → exports it → extracts Google Docs links → queues them → repeats until the queue is empty or --max-docs is reached. Files are saved to exported_docs/ (or to Drive with --drive).
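The crawl loop above is a standard breadth-first search with a visited set. A minimal sketch, assuming links are found by matching docs.google.com/document/d/<ID> in each exported doc's text; bfs_crawl and the fetch_text callback are hypothetical stand-ins for main.py's API calls:

```python
import re
from collections import deque
from typing import Callable, List

# Assumed link shape: any docs.google.com/document/d/<ID> URL in the text.
DOC_LINK = re.compile(r"docs\.google\.com/document/d/([A-Za-z0-9_-]+)")


def bfs_crawl(seed_id: str, fetch_text: Callable[[str], str], max_docs: int = 100) -> List[str]:
    """Return doc IDs in breadth-first export order, stopping after max_docs.

    fetch_text(doc_id) is a hypothetical callback that exports one doc and
    returns text containing its links.
    """
    queue = deque([seed_id])
    seen = {seed_id}  # every ID ever queued, so each doc is exported once
    order = []
    while queue and len(order) < max_docs:
        doc_id = queue.popleft()
        text = fetch_text(doc_id)  # export it
        order.append(doc_id)
        for linked_id in DOC_LINK.findall(text):  # extract Google Docs links
            if linked_id not in seen:
                seen.add(linked_id)
                queue.append(linked_id)  # queue them
    return order
```

Marking IDs as seen when they are queued, not when they are exported, keeps a doc linked from many places from being queued more than once.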

Markdown support: bold, italic, strikethrough, inline code, headings, links, nested lists, and basic tables. Not supported: images, comments, footnotes (see Known Limitations). Use --format docx to keep the full formatting.
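In the Google Docs API's JSON, each paragraph holds textRun elements whose textStyle carries flags such as bold, italic, strikethrough and link. A simplified sketch of turning those into Markdown, covering only those styles plus headings (inline code, lists and tables are left out); run_to_markdown and paragraph_to_markdown are hypothetical names, not main.py's converter:

```python
# HEADING_1..HEADING_6 are the Docs API's namedStyleType values for headings.
HEADINGS = {f"HEADING_{n}": "#" * n + " " for n in range(1, 7)}


def run_to_markdown(text_run: dict) -> str:
    """Convert one textRun to inline Markdown (bold, italic, strikethrough, links)."""
    content = text_run.get("content", "")
    style = text_run.get("textStyle", {})
    core = content.strip()
    if not core:
        return content  # whitespace or newline only: nothing to style
    # Markdown markers must hug the text, so keep surrounding whitespace outside.
    lead = content[: len(content) - len(content.lstrip())]
    trail = content[len(content.rstrip()):]
    if style.get("strikethrough"):
        core = f"~~{core}~~"
    if style.get("italic"):
        core = f"*{core}*"
    if style.get("bold"):
        core = f"**{core}**"
    url = style.get("link", {}).get("url")  # links may also target headings/bookmarks
    if url:
        core = f"[{core}]({url})"
    return lead + core + trail


def paragraph_to_markdown(paragraph: dict) -> str:
    """Join a paragraph's text runs and add a heading prefix if it is a heading."""
    text = "".join(
        run_to_markdown(el["textRun"]) for el in paragraph.get("elements", []) if "textRun" in el
    )
    style_name = paragraph.get("paragraphStyle", {}).get("namedStyleType")
    return HEADINGS.get(style_name, "") + text
```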

Index CSV: exported_docs/index.csv is generated automatically and maps each doc ID to its exported filename (or Drive file ID).
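An index like this is a natural fit for the standard csv module. A sketch, assuming a hypothetical column layout (doc_id, title, filename, drive_file_id); the real index.csv may name or order its columns differently:

```python
import csv
import io

# Hypothetical columns; check the generated index.csv for the real header.
FIELDS = ["doc_id", "title", "filename", "drive_file_id"]


def index_to_csv(rows):
    """Serialize one dict per exported doc to CSV text; missing fields become empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # A local export has no Drive file ID (and vice versa), so fill gaps with "".
        writer.writerow({field: row.get(field, "") for field in FIELDS})
    return buffer.getvalue()


def read_index(text):
    """Parse the CSV back into {doc_id: row}, e.g. to find where a doc was exported."""
    return {row["doc_id"]: row for row in csv.DictReader(io.StringIO(text))}
```

To write the file itself, open exported_docs/index.csv with newline="" (as the csv module recommends) and write index_to_csv(rows) into it.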

Link Localization (--localize-links): Converts Google Docs URLs into links to the exported .md files, so the export can be browsed offline.
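Given a mapping from doc ID to export target (for example, built from index.csv), localization can be a single regex substitution. A sketch under that assumption; the URL pattern and localize_links are illustrative, and links to docs that were not exported are left unchanged:

```python
import re

# Matches a Google Docs URL including any /edit, ?query or #fragment suffix,
# stopping at whitespace or the closing ")" / "]" of a Markdown link.
DOC_URL = re.compile(r"https?://docs\.google\.com/document/d/([A-Za-z0-9_-]+)[^\s)\]]*")


def localize_links(markdown: str, id_to_target: dict) -> str:
    """Point links at exported docs; keep the original URL for unexported ones."""

    def replace(match):
        target = id_to_target.get(match.group(1))
        return target if target else match.group(0)

    return DOC_URL.sub(replace, markdown)
```

For local exports the target would be a relative .md filename; with --drive it would presumably be a link to the uploaded Drive file instead.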

Options

uv run python main.py --help
  • --seed-id: Document ID to start from (required)
  • --format: md or docx (default: docx)
  • --drive FOLDER_ID: Save to a Google Drive folder instead of exporting locally
  • --localize-links: Rewrite Google Docs links to point to the exported documents (also works with --drive in markdown mode)
  • --request-access: Auto-request access to documents you can't view
  • --max-docs: Maximum number of documents to export, as a safety limit (default: 100)
  • --setup: Show OAuth setup instructions
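For reference, the options listed above map onto argparse roughly as follows. This is a sketch assuming main.py uses argparse; build_parser is a hypothetical name, and only the flags, defaults and choices come from the list:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Export Google Docs by following links (BFS crawl)."
    )
    parser.add_argument("--seed-id", required=True, help="Document ID to start from")
    parser.add_argument("--format", choices=["md", "docx"], default="docx")
    parser.add_argument(
        "--drive", metavar="FOLDER_ID",
        help="Save to this Google Drive folder instead of exporting locally",
    )
    parser.add_argument("--localize-links", action="store_true",
                        help="Point Google Docs links at the exported documents")
    parser.add_argument("--request-access", action="store_true",
                        help="Auto-request access to documents you can't view")
    parser.add_argument("--max-docs", type=int, default=100,
                        help="Safety limit on the number of exported documents")
    parser.add_argument("--setup", action="store_true",
                        help="Show OAuth setup instructions")
    return parser
```

With required=True as written here, argparse would reject --setup on its own; the actual script may handle that case differently.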

Google Drive Export

Use --drive FOLDER_ID to save directly to a Google Drive folder instead of exporting locally. Useful for team archiving.

Get the folder ID: open the folder in Drive and copy the ID from the URL: https://drive.google.com/drive/folders/YOUR_FOLDER_ID

What happens:

  • Markdown mode: Uploads .md files
  • Docx mode (default): Copies original Google Docs (preserves all formatting)
  • Index CSV saved locally with Drive file IDs
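The two Drive modes correspond to two Drive API v3 calls: files.copy for docx mode and files.create with a media upload for markdown mode. A sketch, assuming a Drive v3 client built with google-api-python-client; save_to_drive is a hypothetical helper, and main.py may do this differently:

```python
import io


def save_to_drive(service, folder_id, doc_id, title, fmt, markdown_text=""):
    """Store one crawled doc in a Drive folder and return the new file's ID.

    `service` is assumed to be a Drive v3 client, e.g. from
    googleapiclient.discovery.build("drive", "v3", credentials=creds).
    """
    files = service.files()
    if fmt == "docx":
        # Server-side copy: the result is still a native Google Doc,
        # so all of its formatting is preserved.
        request = files.copy(
            fileId=doc_id,
            body={"name": title, "parents": [folder_id]},
            fields="id",
        )
    else:
        # Markdown mode: upload the converted text as a new .md file.
        from googleapiclient.http import MediaIoBaseUpload

        media = MediaIoBaseUpload(
            io.BytesIO(markdown_text.encode("utf-8")), mimetype="text/markdown"
        )
        request = files.create(
            body={"name": f"{title}.md", "parents": [folder_id]},
            media_body=media,
            fields="id",
        )
    return request.execute()["id"]
```

The returned ID is what would go into the index CSV's Drive column.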

First --drive run: delete token.pickle so you re-authenticate with Drive write permissions.

Troubleshooting

  • "Missing credentials.json": Run setup script again
  • "Access denied": Use --request-access to auto-request access from doc owners
  • Reset auth: Delete token.pickle and run again
  • "uv: command not found": Close and reopen the terminal, then run the setup script again

Files

  • setup.sh / setup.bat - One-click setup scripts
  • main.py - The script
  • credentials.json - OAuth app credentials (you provide, safe to share with colleagues)
  • token.pickle - Your personal auth tokens (auto-generated, never share)
  • exported_docs/ - Output folder
  • exported_docs/index.csv - Document index (auto-generated)

Known Limitations

Markdown export does not support images, comments, complex table formatting, footnotes, or drawings. Use --format docx to keep the full formatting.

Development

See LLM.md for technical overview and architecture.

About

Export a Google Doc and, recursively, all docs linked from it
