Automatically download bank and financial statement PDFs from Gmail: config-driven, with built-in deduplication and dual IMAP/OAuth support.
Requires Python 3.9+ · Part of the notoriouslab open-source toolkit.
Any AI agent framework can call it from the shell, and the bundled SKILL.md lets OpenClaw integrate it directly.
Most Gmail-based statement tools are one-off scripts tied to a single bank. This one is config-driven: add any bank without touching code.
| Feature | gmail-statement-fetcher |
|---|---|
| Multi-bank | ✅ JSON config, no code changes |
| Deduplication | ✅ UID-based, never re-downloads |
| IMAP (headless) | ✅ stdlib only, zero install |
| OAuth 2.0 | ✅ `gmail.readonly` scope |
| ZIP extraction | ✅ stdlib, ZIP bomb-protected |
| PDF decryption | ✅ optional pikepdf |
| Normalized filenames | ✅ `永豐銀行_信用卡對帳單_2026_02.pdf` |
| Dry-run preview | ✅ `--dry-run` |
| Atomic PDF writes | ✅ temp file + `os.replace()` |
| Privacy-safe dedup | ✅ subject hashes, no PII stored |
| Security hardened | ✅ token `0o600`, log sanitisation, username masked |
- Dual auth: IMAP + App Password (headless servers) or OAuth 2.0 `gmail.readonly` (personal use)
- Config-driven rules: add any bank without touching code; sender keywords, subject keywords, and doc type rules all live in JSON
- Normalized filenames: `永豐銀行_信用卡對帳單_2026_02.pdf` is readable, sortable, and dedup-friendly
- Smart date extraction: extracts the statement period from the subject line (e.g. `2026年2月`) before falling back to the email Date header
- Deduplication: UID-based, never downloads the same email twice; records are pruned automatically after `retention_days`
- ZIP extraction: stdlib `zipfile`, supports per-bank `zip_password`, ZIP bomb-protected (100 MB cap)
- PDF decryption: optional `pikepdf`; supports per-bank `pdf_password`; skips gracefully if pikepdf is not installed
- Dry-run mode: `--dry-run` shows matched emails and filenames without writing anything
- Atomic writes: all PDF saves use `tempfile.mkstemp` + `os.replace()`, so an interruption leaves no partial files
- Privacy-safe dedup: `.processed_uids.json` stores subject hashes (SHA-256), not raw email subjects
- Security hardened: `token.json` saved at `0o600`, Gmail username masked in logs, sender domain boundary matching, log injection stripped, ZIP decompression capped at 100 MB, IMAP socket timeout (300s), config password warning
- Zero dependencies for IMAP mode: stdlib only, no `pip install` needed for basic use
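The deduplication and privacy bullets above can be pictured with a small sketch. This is only an illustration under assumptions: the record layout (`subject_sha256`, `seen_at`) and the function names are hypothetical, not the fetcher's actual state format.

```python
import hashlib
import json
from pathlib import Path

SECONDS_PER_DAY = 86_400


def subject_hash(subject: str) -> str:
    """Return the SHA-256 hex digest so the raw subject is never stored."""
    return hashlib.sha256(subject.encode("utf-8")).hexdigest()


def load_state(path: Path) -> dict:
    """Load the dedup store, or start empty if the file does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def mark_processed(state: dict, uid: str, subject: str, now: float) -> None:
    """Record a UID together with the hashed subject and a timestamp."""
    state[uid] = {"subject_sha256": subject_hash(subject), "seen_at": now}


def prune(state: dict, retention_days: int, now: float) -> dict:
    """Keep only records that are at most retention_days old."""
    cutoff = now - retention_days * SECONDS_PER_DAY
    return {uid: rec for uid, rec in state.items() if rec["seen_at"] >= cutoff}
```

A UID already present in the store would be skipped on the next run, and `prune` keeps the file from growing without bound.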
# 1. Clone
git clone https://github.com/notoriouslab/gmail-statement-fetcher.git
cd gmail-statement-fetcher
# 2. Copy and edit config
cp config.example.json config.json
# Edit config.json: add your bank's sender domain and subject keywords
# 3. Set credentials
cp .env.example .env
# Edit .env: fill in GMAIL_USER and GMAIL_APP_PASSWORD
# 4a. IMAP mode: no extra install needed
pip install python-dotenv  # optional but recommended; once installed, no manual export is needed
python3 fetcher.py
# 4b. OAuth mode: install dependencies first
pip install google-auth-oauthlib google-api-python-client python-dotenv
# Then set AUTH_METHOD=oauth in .env and place credentials.json in the project root
python3 fetcher.py
# Output: ./downloads/永豐銀行_銀行對帳單_2026_02.pdf

Preview matched emails without downloading:

python3 fetcher.py --dry-run --verbose

IMAP mode suits cron jobs and headless servers: no browser needed, standard library only.
- Enable 2FA on your Google account
- Go to Google Account → Security → App Passwords
- Create an App Password for "Mail"
- Set in `.env`: `AUTH_METHOD=imap`, `GMAIL_USER`, `GMAIL_APP_PASSWORD`
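Once those variables are set, an IMAP session could look roughly like the sketch below. It is an assumption-laden illustration, not the fetcher's code: the mailbox name, the `SINCE`/`FROM` search criteria, and the helper names `imap_since` and `search_uids` are all hypothetical. The 300-second timeout mirrors the value this README mentions.

```python
import datetime
import imaplib
import os

# IMAP date strings use English month abbreviations regardless of locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_since(today: datetime.date, lookback_days: int) -> str:
    """Format the start of the scan window as an IMAP date (DD-Mon-YYYY)."""
    start = today - datetime.timedelta(days=lookback_days)
    return f"{start.day:02d}-{_MONTHS[start.month - 1]}-{start.year}"


def search_uids(sender_keyword: str, lookback_days: int = 60) -> list:
    """Return UIDs of recent messages from a sender (needs real credentials)."""
    user = os.environ["GMAIL_USER"]
    password = os.environ["GMAIL_APP_PASSWORD"]
    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993, timeout=300)
    try:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)  # mailbox choice is an assumption
        since = imap_since(datetime.date.today(), lookback_days)
        status, data = conn.uid(
            "SEARCH", None, "SINCE", since, "FROM", f'"{sender_keyword}"'
        )
        return data[0].split() if status == "OK" else []
    finally:
        conn.logout()
```

Opening the mailbox read-only means the search never changes message flags.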
OAuth mode uses the minimal `gmail.readonly` scope. It is more secure but requires a one-time browser authorization.
- Create a project in Google Cloud Console
- Enable the Gmail API
- Create OAuth credentials (Desktop app) → download `credentials.json` → place it in the project root
- Install dependencies: `pip install google-auth-oauthlib google-api-python-client`
- Set `AUTH_METHOD=oauth` in `.env`
- First run opens a browser for authorization → generates `token.json`
Headless servers: After the first OAuth run on a local machine, copy `token.json` to your server and set `OAUTH_TOKEN=/path/to/token.json`. Keep this file backed up; losing it requires re-authorization.
Bank key naming: Keys starting with `_` (e.g. `_example_en`) are ignored by the fetcher; use this for disabled or template entries. See `config.example.json` for ready-to-use Taiwan bank configs.
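The underscore convention is simple to picture in code. The following is a minimal sketch, assuming the config has a top-level `banks` object; `active_banks` is a hypothetical helper name, not necessarily what the fetcher calls it.

```python
import json


def active_banks(config: dict) -> dict:
    """Return only the bank entries whose key does not start with '_'."""
    return {
        key: bank
        for key, bank in config.get("banks", {}).items()
        if not key.startswith("_")
    }


sample = json.loads("""
{
  "banks": {
    "my_bank": {"short_name": "MyBank"},
    "_example_en": {"short_name": "Template"}
  }
}
""")
print(sorted(active_banks(sample)))
```

Renaming a key from `my_bank` to `_my_bank` is therefore enough to disable an entry without deleting it.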
Filename format: `{short_name}_{doc_type}_{YYYY}_{MM}.pdf`
Month is always zero-padded (`_02_` not `_2_`). `subject_date_pattern` captures raw digits; the fetcher normalises them automatically.
Examples:
- `永豐銀行_銀行對帳單_2026_02.pdf`
- `永豐銀行_信用卡對帳單_2026_02.pdf`
- `MyBank_CreditCard_2026_02.pdf`
Note: Each `short_name` must be unique across banks, as it is used as the filename prefix. If a bank re-sends a statement (e.g. a correction), the replacement email has a different UID and will be downloaded again; this is intentional.
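The filename rule and the date fallback can be sketched as follows. This assumes that the first capture group of `subject_date_pattern` is the year and the second is the month, and that the fallback reads the email's `Date` header; the function names are hypothetical.

```python
import re
from email.utils import parsedate_to_datetime


def statement_period(subject: str, pattern: str, date_header: str) -> tuple:
    """Return (year, month) from the subject, else from the Date header."""
    match = re.search(pattern, subject)
    if match:
        return int(match.group(1)), int(match.group(2))
    sent = parsedate_to_datetime(date_header)
    return sent.year, sent.month


def build_filename(short_name: str, doc_type: str, year: int, month: int) -> str:
    """Build {short_name}_{doc_type}_{YYYY}_{MM}.pdf with a zero-padded month."""
    return f"{short_name}_{doc_type}_{year:04d}_{month:02d}.pdf"


pattern = r"(\d{4})年(\d{1,2})月"
header = "Mon, 02 Mar 2026 09:00:00 +0800"
year, month = statement_period("永豐銀行 2026年2月 信用卡對帳單", pattern, header)
print(build_filename("永豐銀行", "信用卡對帳單", year, month))
```

Here the subject wins over the header, so a February statement emailed in March is still filed under `2026_02`.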
python3 fetcher.py [options]
--config       path to config JSON (default: <script dir>/config.json)
--output-dir   directory to save PDFs (default: <script dir>/downloads)
--state-file   path to UID dedup store JSON (default: <output-dir>/.processed_uids.json); use this when output-dir is read-only
--auth         imap | oauth (overrides the AUTH_METHOD env var)
--dry-run      preview matched emails/filenames without downloading anything
--verbose      enable debug logging
--version      print version and exit
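For readers who want to see how such an interface is typically wired up, here is a sketch using `argparse`. It mirrors the options listed above but is not the fetcher's source: the version string is a placeholder, and defaulting to `imap` when `AUTH_METHOD` is unset is an assumption.

```python
import argparse
import os
from pathlib import Path


def build_parser(script_dir: Path) -> argparse.ArgumentParser:
    """Declare the documented options with defaults relative to script_dir."""
    parser = argparse.ArgumentParser(prog="fetcher.py")
    parser.add_argument("--config", type=Path, default=script_dir / "config.json")
    parser.add_argument("--output-dir", type=Path, default=script_dir / "downloads")
    parser.add_argument("--state-file", type=Path, default=None)
    parser.add_argument("--auth", choices=["imap", "oauth"], default=None)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    # Placeholder version; the real value is not stated in this README.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


def parse_args(argv: list, script_dir: Path) -> argparse.Namespace:
    """Parse argv and fill in defaults that depend on other options."""
    args = build_parser(script_dir).parse_args(argv)
    if args.state_file is None:
        # The state file lives next to the PDFs unless overridden.
        args.state_file = args.output_dir / ".processed_uids.json"
    if args.auth is None:
        # Assumed fallback when neither --auth nor AUTH_METHOD is given.
        args.auth = os.environ.get("AUTH_METHOD", "imap")
    return args
```

Resolving `--state-file` after parsing is what lets its default follow a custom `--output-dir`.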
Recommended: install python-dotenv
pip install python-dotenv
The fetcher calls load_dotenv() automatically, so no manual export is needed in cron.
# Run daily at 09:00 (Linux/macOS)
0 9 * * * cd /path/to/gmail-statement-fetcher && python3 fetcher.py
Without python-dotenv
⚠️ `export $(cat .env | xargs)` breaks when values contain spaces, `$`, or `#`. Use a wrapper script instead:
#!/bin/bash
# run_fetcher.sh
set -a
# shellcheck source=.env
source "$(dirname "$0")/.env"
set +a
exec python3 "$(dirname "$0")/fetcher.py" "$@"
# crontab
0 9 * * * /path/to/gmail-statement-fetcher/run_fetcher.sh
`source .env` honours shell quoting, so passwords containing `$`, spaces, or `#` are safe as long as the value is quoted in `.env`.
For Oracle/headless servers using OAuth, set `OAUTH_TOKEN` to the full path of `token.json`.
Some banks deliver statements as password-protected ZIPs or PDFs.
ZIP (stdlib, no install needed):
"zip_password": "your-zip-password"
PDF decryption (requires pikepdf):
pip install pikepdf~=9.0
"pdf_password": "your-pdf-password"
If pikepdf is not installed and a `pdf_password` is set, the encrypted PDF is saved as-is with a warning.
⚠️ `config.json` contains passwords; do NOT commit it to git. It is already listed in `.gitignore`. Only `config.example.json` (no real passwords) should be version-controlled.
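For the ZIP case, a capped extraction with the standard library could look like the sketch below. It is one possible way to enforce a 100 MB limit, not necessarily how the fetcher does it: the function name and the choice to count bytes actually read (rather than trusting the sizes declared in the archive) are assumptions.

```python
import io
import zipfile
from typing import Optional

MAX_UNCOMPRESSED_BYTES = 100 * 1024 * 1024  # the 100 MB cap this README mentions


def extract_pdfs(zip_bytes: bytes, password: Optional[str] = None,
                 limit: int = MAX_UNCOMPRESSED_BYTES) -> dict:
    """Return {name: bytes} for PDF members, refusing oversized archives."""
    pwd = password.encode("utf-8") if password else None
    pdfs = {}
    total = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".pdf"):
                continue
            with archive.open(info, pwd=pwd) as member:
                # Read one byte past the remaining budget to detect overflow.
                data = member.read(limit - total + 1)
            total += len(data)
            if total > limit:
                raise ValueError("ZIP contents exceed the decompression cap")
            pdfs[info.filename] = data
    return pdfs
```

Counting decompressed bytes as they are read guards against archives whose headers understate the real size.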
- `token.json` is saved with `0o600` permissions
- PDF writes are atomic (temp file + `os.replace`), so no partial files are left behind
- `.processed_uids.json` stores subject hashes (SHA-256), not raw subjects
- Gmail username masked in logs (first 3 chars + `***`)
- Sender matching uses `@`/`.` domain boundary to reduce false positives
- IMAP socket timeout (300s) prevents indefinite hangs
- ZIP decompression capped at 100 MB to prevent ZIP bombs
- Email subjects are sanitised before logging to prevent log injection
- Startup warns if `pdf_password`/`zip_password` found in `config.json`
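Several of the measures above are small enough to sketch in a few lines each. The helpers below are illustrations under assumptions (all names are hypothetical, and the exact rules the fetcher applies may differ): an atomic write, username masking, domain-boundary sender matching, and control-character stripping for log lines.

```python
import os
import re
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file in the target directory, then rename into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def mask_username(user: str) -> str:
    """Show only the first three characters of the account name."""
    return user[:3] + "***"


def sender_matches(from_address: str, keyword: str) -> bool:
    """Match keyword against the sender domain at an '@' or '.' boundary."""
    domain = from_address.rsplit("@", 1)[-1].strip("> ").lower()
    keyword = keyword.lower()
    return domain == keyword or domain.endswith("." + keyword)


def sanitise_for_log(text: str) -> str:
    """Replace control characters so a subject cannot forge extra log lines."""
    return re.sub(r"[\x00-\x1f\x7f]", " ", text)
```

With boundary matching, `mybank.com` accepts `mail.mybank.com` but rejects a look-alike such as `notmybank.com`.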
See SECURITY.md for the full security policy.
gmail-statement-fetcher → PDF downloads
        ↓
doc-cleaner → PDF/DOCX/XLSX → structured Markdown
        ↓
personal-cfo → monthly audit + retirement glide path
Each tool works standalone. Together they form a full personal finance automation pipeline.
The easiest contribution is adding a bank config entry. See CONTRIBUTING.md.
MIT
{
  "banks": {
    "my_bank": {
      "name": "My Bank",                                // display name
      "short_name": "MyBank",                           // used as the filename prefix
      "imap_search": {
        "sender_keywords": ["mybank.com"],              // match From header
        "subject_keywords": ["e-Statement"],            // AND logic with sender
        "exclude_attachment_patterns": ["terms"]        // skip matching attachments
      },
      "doc_type_rules": [                               // first match wins
        {"keyword": "credit card", "type": "CreditCard"},
        {"keyword": "e-Statement", "type": "BankStatement"}
      ],
      "default_doc_type": "Statement",                  // fallback
      "subject_date_pattern": "(\\d{4})年(\\d{1,2})月",  // regex for YYYY/MM
      "pdf_password": "",                               // optional PDF password
      "zip_password": ""                                // optional ZIP password
    }
  },
  "global_settings": {
    "lookback_days": 60,                                // scan window in days
    "retention_days": 180                               // dedup record lifetime in days
  }
}
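To make the rule semantics concrete, here is a rough sketch of how an entry like the one above might be evaluated. It rests on assumptions: keywords are compared as case-insensitive substrings, any keyword within a list is enough, and plain substring checks stand in for the fetcher's domain-boundary sender matching. `matches_bank` and `classify` are hypothetical names.

```python
def matches_bank(bank: dict, sender: str, subject: str) -> bool:
    """Sender AND subject must each contain at least one configured keyword."""
    search = bank["imap_search"]
    sender_ok = any(k.lower() in sender.lower() for k in search["sender_keywords"])
    subject_ok = any(k.lower() in subject.lower() for k in search["subject_keywords"])
    return sender_ok and subject_ok


def classify(bank: dict, subject: str) -> str:
    """Return the type of the first matching rule, else the default type."""
    for rule in bank.get("doc_type_rules", []):
        if rule["keyword"].lower() in subject.lower():
            return rule["type"]
    return bank.get("default_doc_type", "Statement")


bank = {
    "imap_search": {
        "sender_keywords": ["mybank.com"],
        "subject_keywords": ["e-Statement"],
    },
    "doc_type_rules": [
        {"keyword": "credit card", "type": "CreditCard"},
        {"keyword": "e-Statement", "type": "BankStatement"},
    ],
    "default_doc_type": "Statement",
}
print(classify(bank, "Your Credit Card e-Statement"))
```

Because the first match wins, more specific rules such as `credit card` belong before broader ones such as `e-Statement`.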