Skip to content

Fix queue_jobs fetch performance: expression index for ORDER BY and drop unused GIN index #496

@nicolasnoble

Description

@nicolasnoble
  • I have checked the existing issues to avoid duplicates
  • I have redacted any info hashes and content metadata from any logs or screenshots attached to this issue

Describe the bug

This is a follow-up to #390, which identified the slow queue_jobs fetch query but was closed because the proposed fix changed the ORDER BY semantics. This fix preserves the exact ordering while achieving the same performance improvement, plus addresses a second issue with an unused GIN index.

Analysis

1. Queue fetch scans all pending rows due to unindexable ORDER BY

The queue server's handleJobs query (server.go:181) orders by status = 'retry' DESC, priority, run_after, id. The computed boolean expression status = 'retry' prevents Postgres from using any standard B-tree index for ordered retrieval, forcing a full scan + sort of all pending/retry rows.

Before (using queue_jobs_queue_status_idx):

 Limit  (cost=9586.08..9586.09 rows=1 width=724) (actual time=422.197..422.199 rows=1 loops=1)
   Buffers: shared hit=8968
   ->  LockRows  (cost=9586.08..10349.32 rows=61059 width=724) (actual time=422.195..422.197 rows=1 loops=1)
         ->  Sort  (cost=9586.08..9738.73 rows=61059 width=724) (actual time=422.131..422.132 rows=1 loops=1)
               Sort Key: ((status = 'retry'::queue_job_status)) DESC, priority, run_after, id
               Sort Method: quicksort  Memory: 54644kB
               ->  Index Scan using queue_jobs_queue_status_idx on queue_jobs
                     (actual time=0.123..75.084 rows=60384 loops=1)
 Execution Time: 424.739 ms

447ms, 8,970 buffer hits, 54MB in-memory sort of 60k rows. Multiplied by processor.concurrency, this creates significant contention.

The fix is an expression index that includes the computed boolean:

CREATE INDEX queue_jobs_fetch_order_idx
ON queue_jobs (queue, (status = 'retry') DESC, priority, run_after, id)
WHERE status IN ('pending', 'retry');

After:

 Limit  (cost=0.42..0.59 rows=1 width=724) (actual time=0.154..0.155 rows=1 loops=1)
   Buffers: shared hit=5
   ->  LockRows  (cost=0.42..10458.37 rows=60802 width=724) (actual time=0.153..0.153 rows=1 loops=1)
         ->  Index Scan using queue_jobs_fetch_order_idx on queue_jobs
               (actual time=0.114..0.115 rows=1 loops=1)
               Index Cond: ((queue = 'process_torrent'::text) AND (run_after <= now()))
 Execution Time: 0.233 ms

0.23ms, 5 buffer hits. ~2000x improvement. The ordering is preserved exactly.

2. Unused GIN index on (queue, payload)

pg_stat_user_indexes shows the GIN index created in migration 00012 has never been read:

                     indexname                     |  size   | scans
---------------------------------------------------+---------+-------
 queue_jobs_queue_payload_idx                      | 678 MB  |     0
 queue_jobs_id_queue_status_priority_run_after_idx | 9480 kB | 12258
 queue_jobs_pkey                                   | 5896 kB |    29
 queue_jobs_fingerprint_idx                        | 5648 kB |    29
 queue_jobs_queue_status_idx                       | 728 kB  | 29285

678MB, 0 scans. Every queue INSERT/UPDATE/DELETE maintains this index for no benefit.

Impact

After both changes, slow queries (30s+ logged by gorm) dropped to zero and queue processing throughput roughly doubled. I can submit a PR with a goose migration implementing both fixes.

Environment Information (Required)

  • Bitmagnet version: v0.10.0
  • OS and version: Debian Linux (Docker, PostgreSQL 16-alpine)
  • Config overrides: processor.concurrency: 16, 10 DHT crawler nodes, ~6.5M torrents indexed, ~60k pending queue jobs

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions