Skip to content

further improve overhead of database calls from dequeue() #262

@carmichaelong

Description

@carmichaelong

Some of the calls to the database could be fetching extra info from the database that is not necessary when initially filtering just a few trials from the larger database. We could consider using only() to initially only pull relevant columns for the larger filter, then let django do lazy loading of other info needed later (at the cost of extra, but smaller calls to the DB).

Example

for instance, this django code

# find trials with some videos not uploaded
not_uploaded = Video.objects.filter(video='',
                                    updated_at__gte=datetime.now() + timedelta(minutes=-15)).values_list("trial__id", flat=True)


if isMonoQuery == 'False':
    uploaded_trials = Trial.objects.filter(updated_at__gte=datetime.now() + timedelta(days=-7)).exclude(
                                            id__in=not_uploaded).exclude(session__isMono=True)
else:
    uploaded_trials = Trial.objects.filter(updated_at__gte=datetime.now() + timedelta(days=-7)).exclude(
                                            id__in=not_uploaded).filter(session__isMono=True)

leads to the SQL call

SELECT "mcserver_trial"."id", "mcserver_trial"."session_id", "mcserver_trial"."status", 
       "mcserver_trial"."name", "mcserver_trial"."meta", "mcserver_trial"."created_at", 
       "mcserver_trial"."updated_at", "mcserver_trial"."server", "mcserver_trial"."is_docker", 
       "mcserver_trial"."hostname", "mcserver_trial"."processed_duration", 
       "mcserver_trial"."processed_count", "mcserver_trial"."git_commit", 
       "mcserver_trial"."trashed", "mcserver_trial"."trashed_at"
FROM "mcserver_trial"
INNER JOIN "mcserver_session" 
    ON ("mcserver_trial"."session_id" = "mcserver_session"."id")
WHERE ("mcserver_trial"."updated_at" >= '2025-06-02T00:00:00+00:00'::timestamptz 
       AND NOT ("mcserver_trial"."id" IN (
            SELECT U0."trial_id" FROM "mcserver_video" U0 
            WHERE (U0."updated_at" >= '2025-06-09T00:00:00+00:00'::timestamptz 
                   AND U0."video" = '')
       )) 
       AND NOT ("mcserver_session"."isMono"));

The long SELECT call is likely due to some of the fields needed after a single trial is selected to update these fields.

Steps to try:

  • use only() to reduce the first SELECT call (e.g., id, updated_at, isMono)
  • let django lazy load some fields (e.g., status, server, processed_count)
  • check the new SQL calls from django to see if it reduces overhead using EXPLAIN ANALYZE
  • see if extra calls from the database are worth reducing the larger initial filter

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions