Export Google Docs by following links (BFS crawl). Perfect for discovering and archiving scattered documentation.
Run the setup script to install everything automatically:
Mac/Linux: ./setup.sh | Windows: setup.bat (double-click or run in cmd)
The script will:
- Install uv (Python package manager) if needed
- Install all dependencies via uv sync
- Prompt you to add credentials.json (see below)
Option A: Colleague shares it with you via Slack/email (safe to share - it's just your OAuth app ID, not personal credentials)
Option B: Create your own (2 min):
- Go to Google Cloud Console
- Create project → Enable "Google Docs API" and "Google Drive API"
- Credentials → Create OAuth client ID → Desktop app
- Download JSON as credentials.json
uv run python main.py --seed-id YOUR_DOC_ID
The first time you run it, a browser opens for authorization. You authorize with your own Google account (each person gets their own access). After that, the script uses the cached credentials in token.pickle.
# Find your doc ID from the URL:
# https://docs.google.com/document/d/YOUR_DOC_ID_HERE/edit
# Export as Word docs locally (default)
uv run python main.py --seed-id YOUR_DOC_ID
# Export as markdown locally
uv run python main.py --seed-id YOUR_DOC_ID --format md
# Save copies to Google Drive folder (preserves as Google Docs)
uv run python main.py --seed-id YOUR_DOC_ID --drive YOUR_FOLDER_ID
# Save markdown to Drive with localized links (links point to other Drive files)
uv run python main.py --seed-id YOUR_DOC_ID --drive YOUR_FOLDER_ID --format md --localize-links
# Auto-request access to documents you don't have permission for
uv run python main.py --seed-id YOUR_DOC_ID --request-access
# Limit to 20 docs
uv run python main.py --seed-id YOUR_DOC_ID --max-docs 20
BFS crawl: starts from the seed doc → exports it → extracts Google Docs links → queues them → repeats. Files are saved to exported_docs/ (or Drive with --drive).
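The crawl loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: `crawl`, `fetch_text`, and the `DOC_LINK` pattern are hypothetical names, and the toy `docs` dictionary stands in for real Google Docs API calls.

```python
import re
from collections import deque

# Assumed link shape: https://docs.google.com/document/d/<ID>/...
DOC_LINK = re.compile(r"https://docs\.google\.com/document/d/([A-Za-z0-9_-]+)")


def crawl(seed_id, fetch_text, max_docs=100):
    """Return doc IDs in breadth-first order, starting at seed_id.

    fetch_text is a hypothetical callable mapping a doc ID to its text,
    or None when the document cannot be read.
    """
    queue = deque([seed_id])
    seen = {seed_id}          # IDs already queued, so cycles are not revisited
    exported = []
    while queue and len(exported) < max_docs:
        doc_id = queue.popleft()
        text = fetch_text(doc_id)
        if text is None:      # inaccessible document: skip it
            continue
        exported.append(doc_id)
        for linked in DOC_LINK.findall(text):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return exported


# Toy in-memory "documents" used only for this illustration.
docs = {
    "A": "See https://docs.google.com/document/d/B/edit and https://docs.google.com/document/d/C/edit",
    "B": "Back to https://docs.google.com/document/d/A/edit, on to https://docs.google.com/document/d/D/edit",
    "C": "No links here.",
    "D": "Leaf.",
}
print(crawl("A", docs.get, max_docs=3))
```

The `seen` set is what keeps mutually linked documents from being exported twice, and the `max_docs` check plays the role of the --max-docs safety limit.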
Markdown support: Bold, italic, strikethrough, inline code, headings, links, lists (nested), basic tables. Missing: images, comments, footnotes. Use --format docx for perfect formatting.
Index CSV: Auto-generated at exported_docs/index.csv mapping doc IDs to filenames/Drive IDs.
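If you want to look up exported files programmatically, the index can be read with Python's standard csv module. The sketch below is only an assumption about usage: the column names `doc_id` and `filename` are hypothetical, so check the header row of your own index.csv and adjust `key_column` accordingly.

```python
import csv
import io


def load_index(fileobj, key_column):
    """Read an index CSV into a dict keyed by the given column."""
    reader = csv.DictReader(fileobj)
    return {row[key_column]: row for row in reader}


# In-memory sample with hypothetical column names; for the real file use
# open("exported_docs/index.csv", newline="") instead of io.StringIO.
sample = io.StringIO("doc_id,filename\nabc123,Plan.md\nxyz789,Notes.md\n")
index = load_index(sample, "doc_id")
print(index["abc123"]["filename"])
```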
Link Localization (--localize-links): Converts Google Docs URLs to local .md file links for offline browsing.
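Conceptually, link localization is a search-and-replace over the exported markdown. The following sketch shows one way it could work, assuming a mapping from doc ID to exported filename; `localize_links`, `DOC_URL`, and the mapping are hypothetical and not taken from main.py.

```python
import re

# Matches a Google Docs URL, capturing the ID and consuming any trailing
# path or fragment up to whitespace or a closing bracket.
DOC_URL = re.compile(r"https://docs\.google\.com/document/d/([A-Za-z0-9_-]+)[^\s)\]]*")


def localize_links(markdown, id_to_file):
    """Replace Google Docs URLs with local filenames where the doc was exported."""
    def swap(match):
        # Leave the URL untouched when the linked doc was not exported.
        return id_to_file.get(match.group(1), match.group(0))
    return DOC_URL.sub(swap, markdown)


text = (
    "See [Plan](https://docs.google.com/document/d/abc123/edit#heading=h.1) "
    "and [Other](https://docs.google.com/document/d/zzz/edit)."
)
print(localize_links(text, {"abc123": "Plan.md"}))
```

Links to documents outside the crawl stay as regular Google Docs URLs, so they still work when you are online.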
uv run python main.py --help
- --seed-id: Document ID to start from (required)
- --format: md or docx (default: docx)
- --drive FOLDER_ID: Save to Google Drive folder instead of local export
- --localize-links: Convert Google Docs links to point to exported documents (works with Drive for markdown)
- --request-access: Auto-request access to documents you can't view
- --max-docs: Safety limit (default: 100)
- --setup: Show OAuth setup instructions
Use --drive FOLDER_ID to save directly to a Google Drive folder instead of local export. Perfect for team archiving.
Get folder ID: Open folder in Drive, copy ID from URL: https://drive.google.com/drive/folders/YOUR_FOLDER_ID
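If you have many URLs to process, the IDs can be pulled out with a small helper rather than copied by hand. This is a sketch under the assumption that URLs follow the two shapes shown in this README (plus the `/drive/u/0/folders/` variant Drive uses for multi-account sessions); `extract_id` is a hypothetical helper, not part of main.py.

```python
import re

DOC_ID = re.compile(r"/document/d/([A-Za-z0-9_-]+)")
FOLDER_ID = re.compile(r"/drive/(?:u/\d+/)?folders/([A-Za-z0-9_-]+)")


def extract_id(url):
    """Return the doc or folder ID embedded in a Google URL, or None."""
    for pattern in (DOC_ID, FOLDER_ID):
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None


print(extract_id("https://docs.google.com/document/d/YOUR_DOC_ID_HERE/edit"))
print(extract_id("https://drive.google.com/drive/folders/YOUR_FOLDER_ID"))
```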
What happens:
- Markdown mode: Uploads .md files
- Docx mode (default): Copies original Google Docs (preserves all formatting)
- Index CSV saved locally with Drive file IDs
First time using --drive: Delete token.pickle so that you re-authenticate with Drive write permissions.
- "Missing credentials.json": Run setup script again
- "Access denied": Use --request-access to auto-request access from doc owners
- Reset auth: Delete token.pickle and run again
- "uv: command not found": Close/reopen terminal, run setup again
- setup.sh / setup.bat - One-click setup scripts
- main.py - The script
- credentials.json - OAuth app credentials (you provide, safe to share with colleagues)
- token.pickle - Your personal auth tokens (auto-generated, never share)
- exported_docs/ - Output folder
- exported_docs/index.csv - Document index (auto-generated)
Markdown export doesn't support: images, comments, complex table formatting, footnotes, drawings. Use --format docx for perfect formatting.
See LLM.md for technical overview and architecture.