A robust Python application that syncs Gmail messages to a local SQLite database for analysis and archival purposes.
- Incremental Sync: Only downloads new messages by default
- Full Sync: Option to download all messages and detect deletions
- Parallel Processing: Multi-threaded message fetching for improved performance
- Robust Error Handling: Automatic retries with exponential backoff
- Graceful Shutdown: Handles interruption signals cleanly
- Type Safety: Comprehensive type hints throughout the codebase
- Python 3.8 or higher
- Google Cloud Project with Gmail API enabled
- OAuth 2.0 credentials file (
credentials.json)
-
Clone the repository:
git clone https://github.com/marcboeker/gmail-to-sqlite.git cd gmail-to-sqlite -
Install dependencies:
# Using uv uv sync -
Set up Gmail API credentials:
- Go to the Google Cloud Console
- Create a new project or select an existing one
- Enable the Gmail API
- Create OAuth 2.0 credentials (Desktop application)
- Download the credentials file and save it as
credentials.jsonin the project root
You can run the application using either python directly or via uv:
# Incremental sync (default)
python main.py sync --data-dir ./data
# or: uv run main.py sync --data-dir ./data
# Full sync with deletion detection
python main.py sync --data-dir ./data --full-sync
# Sync a specific message
python main.py sync-message --data-dir ./data --message-id MESSAGE_ID
# Detect and mark deleted messages only
python main.py sync-deleted-messages --data-dir ./data
# Use custom number of worker threads
python main.py sync --data-dir ./data --workers 8
# Get help
python main.py --help
python main.py sync --helpcommand: Required. One ofsync,sync-message, orsync-deleted-messages--data-dir: Required. Directory where the SQLite database will be stored--full-sync: Optional. Forces a complete sync of all messages--message-id: Required forsync-message. The ID of a specific message to sync--workers: Optional. Number of worker threads (default: number of CPU cores)--help: Show help information for commands and options
The application supports graceful shutdown when you press CTRL+C:
- Stops accepting new tasks
- Waits for currently running tasks to complete
- Saves progress of completed work
- Exits cleanly
Pressing CTRL+C a second time will force an immediate exit.
The application creates a SQLite database with the following schema:
| Field | Type | Description |
|---|---|---|
| message_id | TEXT | Unique Gmail message ID |
| thread_id | TEXT | Gmail thread ID |
| sender | JSON | Sender information (name, email) |
| recipients | JSON | Recipients by type (to, cc, bcc) |
| labels | JSON | Array of Gmail labels |
| subject | TEXT | Message subject |
| body | TEXT | Message body (plain text) |
| size | INTEGER | Message size in bytes |
| timestamp | DATETIME | Message timestamp |
| is_read | BOOLEAN | Read status |
| is_outgoing | BOOLEAN | Whether sent by user |
| is_deleted | BOOLEAN | Whether deleted from Gmail |
| last_indexed | DATETIME | Last sync timestamp |
SELECT sender->>'$.email', COUNT(*) AS count
FROM messages
GROUP BY sender->>'$.email'
ORDER BY count DESCThis is great to determine who is spamming you the most with uninteresting emails.
SELECT sender->>'$.email', COUNT(*) AS count
FROM messages
WHERE is_read = 0
GROUP BY sender->>'$.email'
ORDER BY count DESC- For years:
strftime('%Y', timestamp) - For months in a year:
strftime('%m', timestamp) - For days in a month:
strftime('%d', timestamp) - For weekdays:
strftime('%w', timestamp) - For hours in a day:
strftime('%H', timestamp)
SELECT strftime('%Y', timestamp) AS period, COUNT(*) AS count
FROM messages
GROUP BY period
ORDER BY count DESCThis is an amateurish way to find all newsletters and group them by sender. It's not perfect, but it's a start. You could also use
SELECT sender->>'$.email', COUNT(*) AS count
FROM messages
WHERE body LIKE '%newsletter%' OR body LIKE '%unsubscribe%'
GROUP BY sender->>'$.email'
ORDER BY count DESCSELECT sender->>'$.email', sum(size)/1024/1024 AS size
FROM messages
GROUP BY sender->>'$.email'
ORDER BY size DESCSELECT count(*)
FROM messages
WHERE EXISTS (
SELECT 1
FROM json_each(messages.recipients->'$.to')
WHERE json_extract(value, '$.email') = 'foo@example.com'
)
AND sender->>'$.email' = 'foo@example.com'SELECT sender->>'$.email', sum(size)/1024/1024 as total_size
FROM messages
WHERE is_outgoing=false
GROUP BY sender->>'$.email'
ORDER BY total_size DESCSELECT message_id, subject, timestamp
FROM messages
WHERE is_deleted=1
ORDER BY timestamp DESC