get_doc() in Loop for Sync and Daily Tasks


## Metadata
- **File(s)**: 
  - `frappe_gmail_thread/tasks/sync.py:12`
  - `frappe_gmail_thread/tasks/daily.py:20`
- **Category**: Database / Scheduled Tasks
- **Severity**: Medium
- **Effort to Fix**: Low
- **Estimated Performance Gain**: 20-40%

## Problem Description

Both scheduled tasks (`sync_emails` and `enable_pubsub_everyday`) fetch Gmail Account names with `get_all()`, then immediately load the full document with `get_doc()` in a loop. This creates unnecessary database queries when only specific fields are needed.

The `sync_emails` task runs every 30 minutes, and `enable_pubsub_everyday` runs daily. With many Gmail accounts, these N+1 query patterns add up.

## Code Location

**sync.py (line 12):**
```python
def sync_emails():
    gmail_accounts = frappe.get_all(
        "Gmail Account",
        filters={"gmail_enabled": 1},
        fields=["name"],  # Only fetches name
    )
    for gmail_account in gmail_accounts:
        gaccount = frappe.get_doc("Gmail Account", gmail_account.name)  # Full doc load
        if gaccount.refresh_token:
            user = gaccount.linked_user
            # ...
```

**daily.py (line 20):**
```python
def enable_pubsub_everyday():
    # ...
    gmail_accounts = frappe.get_all(
        "Gmail Account", filters={"gmail_enabled": 1}, fields=["name"]
    )
    for gmail_account in gmail_accounts:
        gaccount = frappe.get_doc("Gmail Account", gmail_account.name)  # Full doc load
        try:
            enable_pubsub(gaccount)
        # ...
```

## Root Cause

The pattern of:
1. Fetch list with minimal fields
2. Loop and load full document for each item

This is inefficient because:
- Initial query returns only `name`
- Each `get_doc()` requires a separate database round-trip
- Full document loads all fields, child tables, and runs controller logic

## Proposed Solution

### For sync.py - Only need `refresh_token` and `linked_user`:

```python
def sync_emails():
    gmail_accounts = frappe.get_all(
        "Gmail Account",
        filters={"gmail_enabled": 1},
        fields=["name", "refresh_token", "linked_user"],  # Fetch needed fields
    )
    for gmail_account in gmail_accounts:
        if gmail_account.refresh_token:
            user = gmail_account.linked_user
            job_name = f"gmail_thread_sync_{user}"
            if not is_job_enqueued(job_name):
                frappe.enqueue(
                    "frappe_gmail_thread.frappe_gmail_thread.doctype.gmail_thread.gmail_thread.sync",
                    user=user,
                    queue="long",
                    job_name=job_name,
                    job_id=job_name,
                )
```

### For daily.py - `enable_pubsub` needs full doc, but can batch check first:

If `enable_pubsub()` truly needs the full document, at minimum filter accounts that have the necessary tokens:

```python
def enable_pubsub_everyday():
    google_settings = frappe.get_single("Google Settings")
    if not google_settings.enable:
        return
    if (
        not google_settings.custom_gmail_sync_in_realtime
        or not google_settings.custom_gmail_pubsub_topic
    ):
        return

    # Only fetch accounts with refresh tokens (they're the only ones that can use pubsub)
    gmail_accounts = frappe.get_all(
        "Gmail Account", 
        filters={
            "gmail_enabled": 1,
            "refresh_token": ["is", "set"]  # Filter out accounts without tokens
        }, 
        fields=["name"]
    )
    for gmail_account in gmail_accounts:
        gaccount = frappe.get_doc("Gmail Account", gmail_account.name)
        try:
            enable_pubsub(gaccount)
        except Exception as e:
            frappe.log_error(title="PubSub Error", message=e)
            continue
```

## Implementation Steps

1. **sync.py**:
   - Update `fields` to include `["name", "refresh_token", "linked_user"]`
   - Remove `get_doc()` call
   - Access fields directly from the dict returned by `get_all()`

2. **daily.py**:
   - Add filter for `refresh_token` to reduce iterations
   - If `enable_pubsub()` can be refactored to accept specific fields, do so
   - Otherwise, the filter optimization still reduces unnecessary iterations

3. Test both scheduled tasks with multiple Gmail accounts

## Additional Notes

- The `sync()` function called by `enqueue()` does need the full document, but that's inside the background job
- The scheduler only needs to check if tokens exist and get the user email
- This pattern appears across multiple rtCamp repos (frappe-slack-connector, frappe-github-connector)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get_doc() in Loop for Sync and Daily Tasks #45

Metadata

Problem Description

Code Location

Root Cause

Proposed Solution

For sync.py - Only need `refresh_token` and `linked_user`:

For daily.py - `enable_pubsub` needs full doc, but can batch check first:

Implementation Steps

Additional Notes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

get_doc() in Loop for Sync and Daily Tasks #45

Description

Metadata

Problem Description

Code Location

Root Cause

Proposed Solution

For sync.py - Only need refresh_token and linked_user:

For daily.py - enable_pubsub needs full doc, but can batch check first:

Implementation Steps

Additional Notes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

For sync.py - Only need `refresh_token` and `linked_user`:

For daily.py - `enable_pubsub` needs full doc, but can batch check first: