-
Notifications
You must be signed in to change notification settings - Fork 8
get_doc() in Loop for Sync and Daily Tasks #45
Copy link
Copy link
Open
Labels
Description
Metadata
- File(s):
frappe_gmail_thread/tasks/sync.py:12frappe_gmail_thread/tasks/daily.py:20
- Category: Database / Scheduled Tasks
- Severity: Medium
- Effort to Fix: Low
- Estimated Performance Gain: 20-40%
Problem Description
Both scheduled tasks (sync_emails and enable_pubsub_everyday) fetch Gmail Account names with get_all(), then immediately load the full document with get_doc() in a loop. This creates unnecessary database queries when only specific fields are needed.
The sync_emails task runs every 30 minutes, and enable_pubsub_everyday runs daily. With many Gmail accounts, these N+1 query patterns add up.
Code Location
sync.py (line 12):
def sync_emails():
gmail_accounts = frappe.get_all(
"Gmail Account",
filters={"gmail_enabled": 1},
fields=["name"], # Only fetches name
)
for gmail_account in gmail_accounts:
gaccount = frappe.get_doc("Gmail Account", gmail_account.name) # Full doc load
if gaccount.refresh_token:
user = gaccount.linked_user
# ...daily.py (line 20):
def enable_pubsub_everyday():
# ...
gmail_accounts = frappe.get_all(
"Gmail Account", filters={"gmail_enabled": 1}, fields=["name"]
)
for gmail_account in gmail_accounts:
gaccount = frappe.get_doc("Gmail Account", gmail_account.name) # Full doc load
try:
enable_pubsub(gaccount)
# ...Root Cause
The pattern of:
- Fetch list with minimal fields
- Loop and load full document for each item
This is inefficient because:
- Initial query returns only
name - Each
get_doc()requires a separate database round-trip - Full document loads all fields, child tables, and runs controller logic
Proposed Solution
For sync.py - Only need refresh_token and linked_user:
def sync_emails():
gmail_accounts = frappe.get_all(
"Gmail Account",
filters={"gmail_enabled": 1},
fields=["name", "refresh_token", "linked_user"], # Fetch needed fields
)
for gmail_account in gmail_accounts:
if gmail_account.refresh_token:
user = gmail_account.linked_user
job_name = f"gmail_thread_sync_{user}"
if not is_job_enqueued(job_name):
frappe.enqueue(
"frappe_gmail_thread.frappe_gmail_thread.doctype.gmail_thread.gmail_thread.sync",
user=user,
queue="long",
job_name=job_name,
job_id=job_name,
)For daily.py - enable_pubsub needs full doc, but can batch check first:
If enable_pubsub() truly needs the full document, at minimum filter accounts that have the necessary tokens:
def enable_pubsub_everyday():
google_settings = frappe.get_single("Google Settings")
if not google_settings.enable:
return
if (
not google_settings.custom_gmail_sync_in_realtime
or not google_settings.custom_gmail_pubsub_topic
):
return
# Only fetch accounts with refresh tokens (they're the only ones that can use pubsub)
gmail_accounts = frappe.get_all(
"Gmail Account",
filters={
"gmail_enabled": 1,
"refresh_token": ["is", "set"] # Filter out accounts without tokens
},
fields=["name"]
)
for gmail_account in gmail_accounts:
gaccount = frappe.get_doc("Gmail Account", gmail_account.name)
try:
enable_pubsub(gaccount)
except Exception as e:
frappe.log_error(title="PubSub Error", message=e)
continueImplementation Steps
-
sync.py:
- Update
fieldsto include["name", "refresh_token", "linked_user"] - Remove
get_doc()call - Access fields directly from the dict returned by
get_all()
- Update
-
daily.py:
- Add filter for
refresh_tokento reduce iterations - If
enable_pubsub()can be refactored to accept specific fields, do so - Otherwise, the filter optimization still reduces unnecessary iterations
- Add filter for
-
Test both scheduled tasks with multiple Gmail accounts
Additional Notes
- The
sync()function called byenqueue()does need the full document, but that's inside the background job - The scheduler only needs to check if tokens exist and get the user email
- This pattern appears across multiple rtCamp repos (frappe-slack-connector, frappe-github-connector)
Reactions are currently unavailable